Closed antoine4ucsd closed 1 year ago
What's the size of the original file? And also, check the number of reads with seqkit stats.
seqkit stats -j 10 *.gz
PS: The command below does not output gzip format.
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 > BA922J_10000.gz
# this does
seqtk sample -s100 BA922J_barcode16_run5_merged.fastq.gz 10000 | pigz -c > BA922J_10000.gz
Thank you. Good catch on the typo in the command line.
Hello, I am trying to subsample a fastq.gz file, but I am not sure it works as expected above a certain sample size.
My source file contains 150k reads,
but when I try to subset it:
the output file size plateaus...
I also need to make sure this is not resampling the same reads. Can you confirm (for example, if I set the sample size to 200k)?
I am not sure what I am doing wrong... Thank you!
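On the resampling question above, one way to check is to count the reads and the repeated read IDs in the subsampled file. As far as I know, seqtk sample draws without replacement, so asking for 200k reads from a 150k-read file should simply return every read once, which would also explain the plateau. Treat that as an assumption to verify, not a documented guarantee. The sketch below uses only gzip, awk, sort and uniq. The function names count_reads and count_duplicate_ids are made up for illustration, and it assumes standard 4-line FASTQ records (no wrapped sequences).

```shell
# Count reads in a gzipped FASTQ file.
# Assumes exactly 4 lines per record (header, sequence, '+', quality).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Count read IDs that appear more than once.
# 0 means no read was sampled twice.
count_duplicate_ids() {
    gzip -dc "$1" \
        | awk 'NR % 4 == 1 { print $1 }' \
        | sort \
        | uniq -d \
        | wc -l \
        | tr -d ' '
}
```

For example, `count_reads BA922J_10000.gz` should print 10000 for the pigz-compressed output above, and `count_duplicate_ids BA922J_10000.gz` should print 0 if no read was drawn twice. This assumes the read IDs in the source file are unique to begin with.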