Storage usage of PEMA

hariszaf / pema

PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S rRNA, ITS and COI marker genes


This is more of a question about how PEMA uses storage for each run. My project has 140 samples with paired-end (PE) sequences, amounting to 14 GB of data.

14G ./my data
196G /pema215_otu
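To see which output folders account for most of the 196G, a per-folder breakdown helps. The following is a minimal Python sketch, roughly equivalent to running du -sh on each subfolder. It assumes the output directory is pema215_otu, as in the listing above, and sums apparent file sizes (st_size), so its totals can differ somewhat from du, which counts disk blocks. The function names are hypothetical.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum apparent sizes of all regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow directory symlinks by default
        for name in files:
            fp = Path(root) / name
            if not fp.is_symlink():
                total += fp.stat().st_size
    return total


def subdir_sizes(output_dir: str) -> list[tuple[str, int]]:
    """Return (subfolder name, size in bytes) for each immediate subfolder, largest first."""
    base = Path(output_dir)
    sizes = [(d.name, dir_size_bytes(d)) for d in base.iterdir() if d.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def human(n: float) -> str:
    """Format a byte count with binary (1024-based) units, similar in spirit to `du -h`."""
    for unit in ("B", "K", "M", "G"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}T"
```

Example usage: for name, size in subdir_sizes("pema215_otu"): print(human(size), name). This lists the largest intermediate folders first, such as linearizedSequences or mergedSequences.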

Is it possible to reduce the storage needed for a PEMA run, or is all of the output required?

For example, I have two all_samples.fasta files (one in mainOutput and one in the PEMA folder) and one final_all_samples.fasta. Are all of them necessary?
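Before deciding whether the two all_samples.fasta files are redundant, it helps to check whether they are byte-for-byte identical. The following is a minimal Python sketch that compares sizes first and then SHA-256 digests, reading in chunks so large FASTA files are not loaded into memory. The exact file paths are not given in the issue, so any paths passed in are hypothetical. Identical content only shows that one file is a copy of the other. It does not show whether PEMA still expects both to be present.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large FASTA files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files have the same size and the same SHA-256 digest."""
    # Cheap size check first; hashing is only needed when sizes match.
    if Path(path_a).stat().st_size != Path(path_b).stat().st_size:
        return False
    return sha256_of(path_a) == sha256_of(path_b)
```

The standard library's filecmp.cmp(a, b, shallow=False) performs a similar byte-level comparison. Hashing has the advantage that the digests can be recorded and compared across runs.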

Also, some intermediate folders, such as linearizedSequences and mergedSequences, take up about as much space as the mydata folder.

I am raising this because, in large-scale projects, it can lead to exceeding the disk quota.
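As a rough way to plan around a quota, the two du figures above imply that the output folder was about 14 times the size of the input (196G / 14G = 14). The following is a minimal Python sketch built on two unverified assumptions: output grows linearly with input size, and input and output are stored in separate directories, as in the listing. Real PEMA output will likely depend on more than input size (number of samples, marker gene, settings), so treat the result as a planning estimate only. The constant and function names are hypothetical.

```python
# Hypothetical planning helper: linear extrapolation from the single run reported in this issue.
# Sizes use the same units du reported (G); the linear-scaling assumption is NOT verified.

OBSERVED_INPUT = 14.0    # size of the input data folder, as reported by du
OBSERVED_OUTPUT = 196.0  # size of /pema215_otu, as reported by du


def projected_output(input_size: float) -> float:
    """Estimate output folder size by scaling input with the observed 196/14 = 14x ratio."""
    return input_size * (OBSERVED_OUTPUT / OBSERVED_INPUT)


def fits_quota(input_size: float, quota: float) -> bool:
    """True if input plus projected output stays within the quota (same units throughout)."""
    return input_size + projected_output(input_size) <= quota
```

Under these assumptions, a 50G dataset would need about 700G of output space plus the 50G of input, or roughly 750G in total.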
