I am building a tool that needs random access to the sequences and qualities of various read IDs in a large FASTQ file (~350 GB); the index for that file is ~1 GB. It is a multi-threaded program with roughly the following structure:
void thread_fn() {
    faidx_t* index = ... // create index for fastq file
    while (true) {
        // logic
        // use faidx index APIs fai_fetch and fai_fetchqual to get sequence and qualities from random read ids
        // more logic
        if (end_condition) {
            break;
        }
    }
    fai_destroy(index);
}
This thread function is run in 12-16 threads.
However, I'm noticing a huge memory footprint when I use faidx: it grows to >300 GB and then eats into swap space before the process is killed.
I am freeing the seq and qual pointers returned by the APIs as well. What could be causing this? Am I using an anti-pattern for the faidx API? Is there any way to limit the faidx memory footprint?
My apologies, the issue turned out to be in another part of the code, which was causing a large number of reads to be loaded and made memory balloon. Nothing on the faidx side.
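For anyone chasing a similar problem, one way to narrow down which part of a program is responsible for memory growth is to log the process's peak resident set size at checkpoints around the suspect sections. The following is only a minimal sketch, assuming a POSIX system with getrusage; the helper names peak_rss_kb and log_peak_rss are hypothetical and not part of htslib. Because threads share one address space, the value is for the whole process, not per thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: peak resident set size of the whole process.
 * On Linux ru_maxrss is reported in kilobytes (macOS reports bytes).
 * Returns -1 if getrusage fails. */
static long peak_rss_kb(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    return ru.ru_maxrss;
}

/* Hypothetical helper: print a labelled checkpoint to stderr. */
static void log_peak_rss(const char *label) {
    fprintf(stderr, "[mem] %s: peak RSS = %ld kB\n", label, peak_rss_kb());
}
```

Calling log_peak_rss("after index load") once the index is created, and again after the other stages of the loop, should show whether the growth tracks the faidx calls or some other stage. The peak value never decreases, so the first checkpoint at which it jumps is the one to look at.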