I have been creating mock community samples using `seqtk sample` on some single-isolate inputs, something like this:
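```bash
# Start from empty files: the loop appends with >>, and gzip will not
# silently overwrite an existing .gz file
rm -f tempR1.fastq tempR2.fastq tempR1.fastq.gz tempR2.fastq.gz
for sample in A B C; do
    # The same seed (-s 123) for R1 and R2 keeps the subsampled reads paired
    seqtk sample -s 123 "input${sample}_R1.fastq.gz" 10000 >> tempR1.fastq
    seqtk sample -s 123 "input${sample}_R2.fastq.gz" 10000 >> tempR2.fastq
done
gzip tempR1.fastq tempR2.fastq
```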
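Because everything downstream relies on R1 and R2 staying in step, it can be worth checking that after the loop. This is a minimal bash sketch; `read_names` and `check_pairing` are hypothetical helper names (not part of seqtk), and it assumes mate names differ at most by a trailing `/1` or `/2` or by text after the first whitespace, as in Illumina-style headers.

```bash
# Hypothetical helpers (not part of seqtk): check that two gzipped FASTQ files
# are still paired, i.e. record i in R1 and record i in R2 share a read name.

# Print one read name per record. Every 4th line starting at line 1 is a header;
# the leading '@', anything after the first space/tab, and a trailing /1 or /2
# are removed so that mates from R1 and R2 compare equal.
read_names() {
  gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/[ \t].*$/, ""); sub(/\/[12]$/, ""); print }'
}

# Returns 0 if the files are paired, 1 if not, 2 if either file is unreadable.
check_pairing() {
  [[ -r $1 && -r $2 ]] || { echo "cannot read $1 or $2" >&2; return 2; }
  cmp -s <(read_names "$1") <(read_names "$2")
}
```

For example, `check_pairing tempR1.fastq.gz tempR2.fastq.gz && echo "still paired"`.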
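In this example my combined FASTQ files will contain all the reads from sample A, then all those from sample B, and finally all those from sample C. This block ordering may introduce biases in downstream analysis; for instance, any step that only reads the first N records of a file will see mostly sample A.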
What I would like to do is finish with something like this:
Here I am assuming `-s` would set the random number seed, as it does in `seqtk sample`, so that R1 and R2 are randomised in the same way and the output remains correctly paired.
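One way to get that effect without relying on a shuffling tool treating two files identically is to join each R1 record with its R2 mate on a single line, shuffle those lines, and split them back apart. The following is a minimal bash sketch using only `paste`, `awk`, `sort` and `cut`. `shuffle_pairs` is a hypothetical name; it assumes both inputs list mates in the same order and that no FASTQ line contains a tab, and the seeded order is only reproducible with the same `awk` implementation.

```bash
# Hypothetical helper: shuffle two gzipped FASTQ files in lock-step.
# Usage: shuffle_pairs SEED IN_R1.fastq.gz IN_R2.fastq.gz OUT_R1.fastq OUT_R2.fastq
# Outputs are written uncompressed. Assumes no FASTQ line contains a tab.
shuffle_pairs() {
  local seed=$1 in1=$2 in2=$3 out1=$4 out2=$5
  # 1. `paste - - - -` joins each 4-line FASTQ record into one tab-separated line.
  # 2. The outer `paste` puts record i of R1 and record i of R2 on the same line,
  #    so the shuffle can never separate a read from its mate.
  # 3. awk prefixes each line with a random key from srand(seed); sorting on the
  #    key permutes the pairs (same seed + same awk => same order).
  # 4. cut drops the key; the last awk writes fields 1-4 to R1 and 5-8 to R2,
  #    and fails if a line does not hold exactly two complete records.
  paste <(gzip -dc "$in1" | paste - - - -) <(gzip -dc "$in2" | paste - - - -) \
    | awk -v seed="$seed" 'BEGIN { srand(seed) } { printf "%.17f\t%s\n", rand(), $0 }' \
    | LC_ALL=C sort -t $'\t' -k1,1n \
    | cut -f 2- \
    | awk -F '\t' -v o1="$out1" -v o2="$out2" '
        NF != 8 { print "malformed or unpaired record" > "/dev/stderr"; exit 1 }
        { printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > o1
          printf "%s\n%s\n%s\n%s\n", $5, $6, $7, $8 > o2 }'
}
```

For example, `shuffle_pairs 123 tempR1.fastq.gz tempR2.fastq.gz mockR1.fastq mockR2.fastq` followed by `gzip mockR1.fastq mockR2.fastq`. Sorting on a random key leaves the heavy lifting to `sort`, which spills to temporary files for large inputs, so the reads never have to fit in memory at once.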