RvV1979 closed 1 year ago
The memory options don't affect assembly quality in any way. The statement about 30% degradation refers to performance in terms of assembly time only.
I apologize for the lack of clarity in the documentation on this point, and I will make some changes.
Thanks for the reassuring clarifications. Just for your information: my worry about degradation also stemmed from the warning from stdout, below:
This run used options "--memoryBacking 4K --memoryMode anonymous".
This could have resulted in performance degradation.
For full performance, use "--memoryBacking 2M --memoryMode filesystem"
(root privilege via sudo required).
Therefore the results of this run should not be used
for benchmarking purposes.
In that case too, "performance" refers to assembly time only. I changed that message this morning to clarify this point. The new wording of the message will be in the next release. Thank you for reporting this - you had a valid point because the term "performance" in genomics is often used to really mean "quality" (in computer science, it typically just means "speed").
The commit with the message change is here.
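For readers comparing the two configurations named in the warning, here is a minimal Python sketch that builds the command line for each. The option values are copied from the message quoted in this thread; the executable name `shasta`, the `--input` option, the file name `reads.fasta`, and the helper `build_command` are assumptions made for illustration and should be checked against your installed release. As explained above, the two variants are expected to differ in assembly time only, not in assembly results.

```python
import shlex

# Option pairs quoted verbatim in the warning message in this thread.
DEFAULT_MEMORY_OPTIONS = ["--memoryBacking", "4K", "--memoryMode", "anonymous"]
FULL_SPEED_MEMORY_OPTIONS = ["--memoryBacking", "2M", "--memoryMode", "filesystem"]


def build_command(reads_path, have_root, executable="shasta"):
    """Return the argument list for one assembly run (hypothetical helper).

    Assumptions: the binary is called `shasta` and reads are passed
    with `--input`; adjust both to match your installation.
    """
    if have_root:
        # The warning states that the 2M/filesystem settings need root via sudo.
        return ["sudo", executable, "--input", reads_path] + FULL_SPEED_MEMORY_OPTIONS
    # Without root, fall back to the settings the warning calls slower.
    return [executable, "--input", reads_path] + DEFAULT_MEMORY_OPTIONS


if __name__ == "__main__":
    # Print both variants as copy-pasteable shell commands.
    for have_root in (False, True):
        print(shlex.join(build_command("reads.fasta", have_root)))
```

On a shared machine without root access, only the first printed command applies.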
From the documentation, I understand that for optimal performance, access to a single machine with a large amount of memory is required. I have access to a shared machine with 755G memory but, for obvious reasons, do not have root access. Using the default suboptimal memory settings, my assembly of a heterozygous plant genome runs very fast and contig size is good enough for my purposes. However, I want to avoid assembly errors.
Therefore, I would like to ask how the default mode --memoryMode anonymous --memoryBacking 4K affects assembly results. I read "typically 30% degradation" but am unsure what that means, exactly. Will contigs just be shorter, or will there be errors?
I hope you can clarify and give me some advice.
Thanks