Open DLBPointon opened 3 months ago
If the FASTA size is > 10 GB, split the fasta.gz into N chunks, with N = round(size_of_fasta_in_GB / 10), e.g. `pyfasta split -n N {sample}.fasta.gz`
@yumisims @DLBPointon. Maybe use https://nf-co.re/modules/seqkit_split2 ?
Or just `zcat {sample}.fasta.gz | awk -v N=<seqs_per_chunk> '/^>/{n++} { print > ("chunk_" int((n-1)/N) ".fasta") }'` — note N has to be passed in with `-v` (it is otherwise undefined inside awk), and here it means sequences per chunk rather than number of chunks; `int((n-1)/N)` keeps exactly the first N records in chunk_0. Let's see.
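A self-contained sketch of that awk approach, with a tiny throwaway FASTA so it can be run end to end (the file name, data, and chunk size are all illustrative):

```shell
# Create a tiny gzipped FASTA to demonstrate (illustrative data).
printf '>s1\nACGT\n>s2\nGGCC\n>s3\nTTAA\n>s4\nCATG\n' | gzip > sample.fasta.gz

# N = sequences per chunk; pass it into awk with -v so it is defined there.
N=2
gzip -dc sample.fasta.gz | awk -v N="$N" '
  /^>/ { n++ }                                  # bump the record counter at each header
  { print > ("chunk_" int((n-1)/N) ".fasta") }  # route every line to its chunk file
'
ls chunk_*.fasta   # chunk_0.fasta chunk_1.fasta
```

Each chunk file is then a valid standalone FASTA that can be mapped independently.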
`seqkit split2` is multithreaded and will output gzipped chunks.
Description of feature
The Revio data is huge; it needs to be split into n = (reads / 10 million) files, each chunk mapped independently, and the outputs then merged.
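The chunk-count arithmetic above is worth pinning down: dividing reads by 10 million should round up, otherwise the remainder reads end up with no chunk. A minimal sketch (the read count is illustrative, not from real Revio data):

```shell
# Hypothetical sketch: derive the number of chunks from the read count,
# rounding up so no reads are dropped. 10 million reads per chunk assumed.
reads=43000000                              # illustrative read count
per_chunk=10000000                          # reads per chunk
n=$(( (reads + per_chunk - 1) / per_chunk ))   # ceiling division in pure shell
echo "$n"                                   # 5
```

With plain integer division the same input would give 4 chunks and strand 3 million reads, hence the ceiling.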