hoelzer closed this issue 4 years ago.
Ah yes, that's a good solution!
On Wed, 20 May 2020, 19:55, MarieLataretu <notifications@github.com> wrote:
> I'd add an extra parameter `--illumina-single-end` (like `--nano` and `--illumina`), so that one can clean single- and paired-end reads in one `clean` run.
I just scrolled by: is the renaming of the reads also applicable to the single-end reads?
renaming of the reads? do you have an example?
We do this before mapping:
```bash
# Works for ENA reads whose read IDs end in '/1' or '/2'
# (runs inside a Nextflow script block, hence the escaped \$ for Bash variables)
EXAMPLE_ID=\$(zcat -f ${reads[0]} | head -1)   # -f: also accept uncompressed input
if [[ \$EXAMPLE_ID == */1 ]]; then
    # Replace spaces in the headers with the placeholder DECONTAMINATE and
    # count the reads (one FASTQ record = 4 lines)
    if [[ ${reads[0]} =~ \\.gz\$ ]]; then
        zcat ${reads[0]} | sed 's/ /DECONTAMINATE/g' > ${name}.R1.id.fastq
        TOTALREADS_1=\$(( \$(zcat ${reads[0]} | wc -l) / 4 ))
    else
        sed 's/ /DECONTAMINATE/g' ${reads[0]} > ${name}.R1.id.fastq
        TOTALREADS_1=\$(( \$(wc -l < ${reads[0]}) / 4 ))
    fi
    if [[ ${reads[1]} =~ \\.gz\$ ]]; then
        zcat ${reads[1]} | sed 's/ /DECONTAMINATE/g' > ${name}.R2.id.fastq
        TOTALREADS_2=\$(( \$(zcat ${reads[1]} | wc -l) / 4 ))
    else
        sed 's/ /DECONTAMINATE/g' ${reads[1]} > ${name}.R2.id.fastq
        TOTALREADS_2=\$(( \$(wc -l < ${reads[1]}) / 4 ))
    fi
else
    [....]
```
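The gzip/plain branches above repeat the same two steps (replace spaces, count reads) for every input file. A minimal Bash sketch of one helper that could be called once per file (R1, R2, single-end or ONT), assuming GNU `gzip`, whose `-cdf` flags decompress `.gz` input and pass plain files through unchanged. The function name `rename_fastq` is hypothetical; only the `DECONTAMINATE` placeholder comes from the snippet above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every space in the FASTQ with the placeholder
# DECONTAMINATE, write the result to OUT and print the number of reads.
# gzip -cdf handles gzipped and plain input alike (assumes GNU gzip).
rename_fastq() {
    local in="$1" out="$2"
    gzip -cdf "$in" | sed 's/ /DECONTAMINATE/g' > "$out"
    # One FASTQ record spans 4 lines
    echo $(( $(wc -l < "$out") / 4 ))
}
```

Inside a Nextflow script block the Bash `$` would still need escaping as in the original snippet, e.g. `TOTALREADS_1=\$(rename_fastq ${reads[0]} ${name}.R1.id.fastq)`.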
But I just saw that we also do this for the ONT data, so I'll implement it for the Illumina single-end data as well!
Ah sorry, I got confused with the rnaseq pipeline ;)
Yeah, I introduced this renaming because I ran into problems with some FASTQ headers. We could also have a more convenient renaming script, e.g. in Python;
see: https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/bin/rename_fasta.py
So we could have a separate rename step for any FASTQ, then the filtering happens, and then we have a restore module... maybe that's cleaner?
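A rename → filter → restore split could look roughly like the Bash sketch below. It is not a port of the linked `rename_fasta.py`; it assumes a different, hypothetical scheme in which each header is swapped for a short ID (`r1`, `r2`, ...) and the original header is kept in a tab-separated map file, so the restore step only needs the IDs to survive filtering unchanged. The function names `rename_ids` and `restore_ids` are hypothetical, and the input is assumed to be uncompressed FASTQ with 4 lines per record:

```bash
#!/usr/bin/env bash
# Hypothetical rename/restore pair for uncompressed FASTQ (4 lines per record).

# rename_ids IN OUT MAP: replace each header with a short ID (r1, r2, ...),
# drop the optional repeated header on the '+' line, and write
# "ID<TAB>original header" (without the leading '@') to MAP.
rename_ids() {
    awk -v map="$3" '
        NR % 4 == 1 { id = "r" (++n); print id "\t" substr($0, 2) > map; print "@" id; next }
        NR % 4 == 3 { print "+"; next }
        { print }
    ' "$1" > "$2"
}

# restore_ids IN OUT MAP: put the original headers back by looking up each ID
# in MAP; reads removed by the filtering step are simply absent from IN.
# Assumes the filtering step leaves the IDs unchanged.
restore_ids() {
    awk '
        FILENAME == ARGV[1] { orig[$1] = substr($0, length($1) + 2); next }
        FNR % 4 == 1 { print "@" orig[substr($0, 2)]; next }
        { print }
    ' "$3" "$1" > "$2"
}
```

Short IDs avoid relying on a placeholder string that could, in principle, already occur in a header; the cost is an extra map file that has to be passed from the rename step to the restore step.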
But I am also happy with any other simple solution
yeah, an extra process for renaming would definitely reduce code redundancy!
I'll go for the copy-paste solution for the moment and open a new issue
I'd add an extra parameter `--illumina-single-end` (like `--nano` and `--illumina`), so that one can clean single- and paired-end reads in one `clean` run.