Open clementhelsens opened 1 year ago
I don't think minimap2 should have a memory problem with duplicates. It deals with each read separately. If this is a human or model organism, you could skip remapping and use a common variants file: --skip_remap TRUE and --common_variants filename
For human, I provide a common variants file on the GitHub page.
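For anyone scripting this, here is a minimal sketch of how the suggested options could be added to a souporcell call. The helper name `build_souporcell_command` and all file names are hypothetical; the short options (`-i`, `-b`, `-f`, `-t`, `-o`, `-k`) are the ones I recall from the souporcell README, so check them against `souporcell_pipeline.py --help` before relying on this.

```python
def build_souporcell_command(bam, barcodes, fasta, out_dir, clusters,
                             threads, common_variants=None):
    """Build an argument list for souporcell_pipeline.py (hypothetical helper).

    If a common variants file is given, remapping is skipped, as suggested
    in the comment above. Otherwise the default remapping is kept.
    """
    cmd = [
        "souporcell_pipeline.py",
        "-i", bam,
        "-b", barcodes,
        "-f", fasta,
        "-t", str(threads),
        "-o", out_dir,
        "-k", str(clusters),
    ]
    if common_variants is not None:
        # Value written exactly as in the comment above (assumption: accepted as-is).
        cmd += ["--skip_remap", "TRUE", "--common_variants", common_variants]
    return cmd


# Placeholder file names, for illustration only.
example = build_souporcell_command(
    "sample.bam", "barcodes.tsv", "genome.fa", "out", clusters=4,
    threads=20, common_variants="common_variants.vcf",
)
print(" ".join(example))
```

The list can be passed to `subprocess.run(example, check=True)` once the paths point at real files.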
Thanks @wheaton5! Sorry for not providing more details, I thought those were not relevant here. I'm working with zebrafish, and I have a common variants file available. But I do not want to skip remapping, because we would like to be more inclusive in our gene expressions. I was able to find a workaround by reading this issue, https://github.com/lh3/minimap2/issues/764, and allocating a large swap space. It worked well for our sequencing samples processed with Cell Ranger, but for some public data we need to process (as a benchmark) I just cannot do the same. I pushed the swap to more than 400GB, so ~525GB total memory, and still no luck, all because of one of the 20 threads. My hope was that I could find a solution by removing the duplicated reads, but your comment seems to indicate the opposite.
That minimap2 issue is about mapping long contigs, not short reads. I'm not sure what the memory blow-up is there, but I haven't seen this with minimap2 on short reads.
Hello @wheaton5, I posted an issue on minimap2 here, and after following the suggestion to try the latest version (2.26), everything worked fine. I was using version 2.17, and I see that your Singularity image is using 2.7, which is even older. Is there a reason not to move to more recent versions?
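In case it helps others check which minimap2 they are running, here is a small sketch. It assumes `minimap2 --version` prints something like `2.26-r1175` on stdout. The function names are made up for this example. Versions are compared as numbers because a plain string comparison would rank 2.7 above 2.17.

```python
import re
import subprocess


def parse_version(text):
    """Turn a string such as '2.26-r1175' or '2.7' into a tuple like (2, 26)."""
    match = re.match(r"\s*v?(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(version_text, minimum_text):
    """True if version_text is the same as or newer than minimum_text."""
    return parse_version(version_text) >= parse_version(minimum_text)


def installed_minimap2_version(executable="minimap2"):
    """Ask the minimap2 binary on PATH for its version string."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# The three versions mentioned in this thread, checked against 2.26.
for version in ("2.7", "2.17", "2.26"):
    print(version, is_at_least(version, "2.26"))
```

Calling `installed_minimap2_version()` inside the container and passing the result to `is_at_least` would show whether the image still needs updating.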
I can update to a new minimap2 version. I will test and get back to you.
There seem to have been many modifications since version 2.0 in 2019. Can we expect a new version soon? It would improve clarity and research reproducibility if a precise version could be stated in a Methods section of a research article, for instance.
Yes, I am trying to get hisat2 to work in a Singularity container. I've been very busy but now have time for this, and I'll make sure to update minimap2 at the same time. You could of course do a manual install and give the git commit as the version for the purposes of publications, but I understand that isn't ideal.
OK, I have a new image and am now trying to figure out how to update Singularity Hub. For now you can build your own image from souporcell.def, but you need sudo.
Hello! During the alignment process with minimap2, I see very large memory consumption for only one of the 20 threads I use. I investigated what could be different in this thread, and the only thing I found is that, for this particular thread, there are about twice as many bases with a large number of duplicated reads as usual. The attached plots show the number of reads per base. In the left one we have ~20k bases with ~8000 reads, but we have ~45k in the right plot, which is the problematic thread. If the memory problem is caused by duplicated reads, would you have suggestions on how to mitigate it? If this is unlikely to be the problem, would you have ideas on how to find out what it is? Many thanks in advance!
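For reference, the reads-per-base comparison described above can be sketched roughly as follows. This is not the script used for the plots. It is a toy version that assumes alignments are already available as half-open (start, end) intervals for one chunk of reads, and the function names are invented for the example.

```python
from collections import Counter


def depth_per_base(alignments):
    """Count how many reads cover each position.

    alignments: iterable of (start, end) half-open intervals on one contig.
    """
    depth = Counter()
    for start, end in alignments:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


def count_high_depth_bases(depth, threshold):
    """Number of positions covered by at least `threshold` reads."""
    return sum(1 for d in depth.values() if d >= threshold)


# Toy input: three reads, two of them exact duplicates.
reads = [(0, 3), (1, 4), (1, 4)]
depth = depth_per_base(reads)
print(count_high_depth_bases(depth, threshold=3))
```

Running the same count on each thread's share of the reads, with a threshold near 8000, gives one number per thread that can be compared directly.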