Closed (dalilasss closed 4 years ago)
pipe to gzip
Hello! Just to clarify: I can use a fastq.gz file as input for seqtk sample, but the output is in plain (uncompressed) .fastq format?
Thank you!
That's what @lh3 meant: seqtk sample -s seed=75 SRR7.fastq.gz 0.8 | gzip > SRR7.fastq.gz
Btw, for me, if I do not rename the output gzipped file to something new, it clobbers the original file with an empty file. e.g., if I use:
seqtk seq -L 50 300_S300_L001_R1_001.fastq.gz | gzip > 300_S300_L001_R1_001.fastq.gz
the final fastq.gz file is empty. However, if I use:
seqtk seq -L 50 300_S300_L001_R1_001.fastq.gz | gzip > 300_S300_L001_R1_001.fastq_noSmalls.gz
the new file (300_S300_L001_R1_001.fastq_noSmalls.gz) contains the desired reads.
I'm using seqtk 1.3-r106.
seqtk seq -L 50 300_S300_L001_R1_001.fastq.gz | gzip > 300_S300_L001_R1_001.fastq.gz
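You overwrote the input file, so it became empty: the `>` redirection truncates the target as soon as the pipeline starts, typically before seqtk has read anything from it. This is a dangerous operation. The correct way is to 1) write the filtered data to a new file, then 2) rename the new file to the old name.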
But I would choose to keep the original files, for safety.
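For anyone who does want to replace the original, the two steps above can be sketched as a small shell function. This is only an illustration: `rewrite_gz` is a made-up name and not part of seqtk, and it assumes the filter command takes the input file as its last argument and writes uncompressed records to stdout.

```shell
# Hypothetical helper (not part of seqtk): run a filter on a gzipped file and
# replace the file only after the new gzipped output has been written.
# Usage: rewrite_gz FILE CMD [ARGS...]    (the filter runs as: CMD ARGS... FILE)
rewrite_gz() {
    src=$1
    shift
    tmp="$src.tmp.$$"
    # Step 1: write to a temporary file. Redirecting straight to "$src" would
    # truncate it before the filter had a chance to read it.
    if "$@" "$src" | gzip > "$tmp"; then
        # Step 2: rename the new file over the old one.
        mv "$tmp" "$src"
    else
        rm -f "$tmp"
        echo "rewrite_gz: failed, $src left unchanged" >&2
        return 1
    fi
}
```

With this, `rewrite_gz 300_S300_L001_R1_001.fastq.gz seqtk seq -L 50` would run the same filter as above without emptying the input. Note that a plain pipeline only reports gzip's exit status; in bash, running `set -o pipefail` beforehand should also stop the rename when the filter itself fails.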
@shenwei356 Agree. I'm just noting this because @fconstancias's solution as of this writing does the same - it clobbers the original file. I thought other folks should know.
I wanted to know if it is possible to use seqtk sample on a gzipped .fastq.gz file and get the output in the same .fastq.gz format. The issue is that I tried it with the command
[guest@u]$ seqtk sample -s seed=75 SRR7.fastq.gz 0.8 > SRR7.fastq.gz
With an input file of 6.3 GB, the output is 16.8 GB, and zcat doesn't recognize its format, so I believe the output is regular (uncompressed) FASTQ. The number of sequences in the output file is correct. Could you please tell me if there is a way to change this? Thank you!
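A quick way to confirm that suspicion is to ask gzip to test the file instead of guessing from its size. The sketch below is only an illustration; `is_gzip` is a made-up helper name, and it relies on `gzip -t` exiting with a non-zero status for data that is not gzip-compressed.

```shell
# Hypothetical helper: succeeds only if the file is a valid gzip stream.
is_gzip() {
    # gzip -t checks integrity without writing any output.
    gzip -t "$1" 2>/dev/null
}
```

For a file written by `seqtk sample ... > out.fastq.gz` this check should fail, because the `.gz` suffix does not make the contents compressed. After piping through gzip to a new name, for example `seqtk sample -s 75 SRR7.fastq.gz 0.8 | gzip > SRR7.sub.fastq.gz` (the output name here is just an example), it should succeed.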