genomewalker closed this issue 5 years ago.
Hi Antonio, the files from the previous steps are important in case something goes wrong: Plass can then be restarted and continue where it left off. One thing we are currently working on in MMseqs2, which will therefore end up in Plass very soon, is compressed databases. These should also help with this issue. A more general solution will take some time, since we want to introduce this feature to all workflows (of MMseqs2 and Plass) at the same time. Best regards, Milot
Hi Milot, compressed databases sound awesome, and I look forward to them! Compressed DBs will be very useful for the mapping step too, since the prefiltering DBs are also huge :-)
For now I will take the risk and remove the files from previous steps: our nodes have limited scratch space (2 TB), and several assemblies die because they run out of space.
Many thanks! Antonio
I've been having the same issue: a large but not massive dataset needs more than 4 TB of tmp space, regardless of whether I run a co-assembly or assemble each sample separately. @genomewalker, did your temporary fix work correctly?
Yes, it does solve the problem, but as @milot-mirdita pointed out, you will not be able to restart if something fails.
Plass now removes temporary files on the fly. This should reduce disk consumption by roughly a factor of 12.
Hi Martin, would it be possible for PLASS to have an option to remove the intermediate files (i.e. pref_, aln_, assembly_) of iterations that are no longer needed in the following steps? For some assemblies, disk usage explodes and reaches several terabytes. As a temporary solution I added the following lines to assembler.sh to remove the files from previous steps:
Many thanks, Antonio