Open savvas-paragkamian opened 11 months ago
Hi @savvas-paragkamian. Thanks for the points.
The `all_samples.fasta` file should be removed from the top output folder.
In general, a feature could be added so that files no longer needed after a given step are removed on the fly.
At the moment PEMA returns everything so the user can validate the filtering parameters and their effect.
However, optionally removing intermediate files might be a good option for such cases.
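Until something like that exists, a cleanup step can be run by hand after a finished run. The following is only a minimal sketch, not a PEMA feature: the function name `remove_intermediate_dirs` is hypothetical, the folder names are the ones mentioned in this issue, and it assumes those folders sit directly under the output folder you pass in. It defaults to a dry run so nothing is deleted until you have checked the list.

```python
import shutil
from pathlib import Path

# Folder names taken from this issue; adjust them to what your run produced.
INTERMEDIATE_DIRS = ("linearizedSequences", "mergedSequences")


def remove_intermediate_dirs(output_dir, names=INTERMEDIATE_DIRS, dry_run=True):
    """Return the intermediate folders found under output_dir.

    With dry_run=True (the default) nothing is deleted; the folders are
    only reported. With dry_run=False each folder found is removed
    together with its contents.
    """
    found = []
    for name in names:
        path = Path(output_dir) / name
        if path.is_dir():
            if not dry_run:
                shutil.rmtree(path)
            found.append(path)
    return found
```

Calling `remove_intermediate_dirs("path/to/output")` first and inspecting the returned paths, then repeating with `dry_run=False`, keeps the deletion explicit. Only do this once you no longer need those folders to validate the filtering parameters.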
This is more of a question about how PEMA uses storage for each run. For my project I have 140 samples with PE sequences, resulting in 14 GB of data.
Is it possible to reduce the storage needed for a PEMA run, or is all of the output required?
For example, I have two `all_samples.fasta` files (one in mainOutput and one in the PEMA folder) and one `final_all_samples.fasta`. Are all of them necessary?
Also, some intermediate folders like `linearizedSequences` and `mergedSequences` take up similar space as the `mydata` folder.
The reason for this issue is that in large-scale projects this can lead to exceeding disk quota.
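To show which folders account for the space, here is a small sketch that sums file sizes per sub-folder of an output directory. It is generic Python and does not depend on PEMA; the function names `dir_size_bytes` and `subfolder_sizes` are hypothetical, and the path you pass in is whatever your run wrote to.

```python
from pathlib import Path


def dir_size_bytes(path):
    """Total size in bytes of all regular files below path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def subfolder_sizes(output_dir):
    """Map each immediate sub-folder name to its size, largest first."""
    sizes = {
        p.name: dir_size_bytes(p)
        for p in Path(output_dir).iterdir()
        if p.is_dir()
    }
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))
```

Printing the result for the output folder of a run gives a quick per-folder breakdown to compare against the disk quota.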