I've experienced a memory leak when using the Python `mappy.Aligner` initialized with a sequence file/FASTA (this is within Megalodon). The same memory leak does not occur when the `Aligner` is initialized with a minimap2 index. The problem seems exacerbated when the reference sequence is quite large (e.g. the human genome) and when the reference is accessed from many aligners (in different threads) at the same time.
I suspect the memory leak may be due to extensive use of the `.seq` method of `mappy.Aligner` within Megalodon, which extracts the reference sequence for each mapping.
Ideally the source of the memory leak could be identified and fixed, but as a stop gap I was hoping to warn users that using a sequence/FASTA reference instead of a minimap2 index could lead to a memory error. When I use the `mappy.fastx_read` function on a minimap2 index file, I get a single "contig" with an empty string for both the contig name and the sequence (`list(mappy.fastx_read('ref.fa.mmi'))` gives `[('', '', None)]`). I could check that this sequence is empty to determine whether the input file is a FASTA or a minimap2 index, but I was wondering if there might be a more robust way to check this?
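One possibly more robust check is to read the file header directly instead of parsing the file with `mappy.fastx_read`. The sketch below assumes that minimap2 index files begin with the 4-byte magic `MMI\x02` (the `MM_IDX_MAGIC` value in minimap2's `index.c`). `is_minimap2_index` and `warn_if_fasta_reference` are hypothetical helper names, not part of mappy or Megalodon.

```python
import warnings

# Assumption: minimap2 index (.mmi) files start with this 4-byte magic
# (MM_IDX_MAGIC in minimap2's index.c).
MMI_MAGIC = b"MMI\x02"


def is_minimap2_index(path):
    """Return True if the file at `path` starts with the minimap2 index magic."""
    with open(path, "rb") as fh:
        # read(4) returns fewer bytes for short/empty files, which simply won't match
        return fh.read(4) == MMI_MAGIC


def warn_if_fasta_reference(path):
    """Hypothetical stop-gap: warn when the reference is not a minimap2 index.

    Returns True if a warning was issued, False otherwise.
    """
    if is_minimap2_index(path):
        return False
    warnings.warn(
        f"{path} does not look like a minimap2 index; loading a FASTA "
        "reference into mappy.Aligner may use excessive memory. Consider "
        "building an index first with `minimap2 -d ref.fa.mmi ref.fa`.",
        RuntimeWarning,
    )
    return True
```

Because the check only looks at the leading bytes, a gzip-compressed FASTA (which starts with the gzip magic) is also reported as "not an index", which is the case the warning is meant to catch.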
Thanks for any input and especially for continued development on this project!