This PR allows you to request a subsample of the data.
If you have very large sequence files, you may not want, or be able, to process every read, so you can "subsample" the data instead.
Note that the approach is to take the first n reads, where n is the value given to the --subsample option. This was chosen over a random subsample so that, if you have R1 and R2 files, the same reads are kept from each file and the outputs stay correctly paired (it is also easier to implement).
If you request a --subsample larger than the number of reads in your sequence file, you will get all the sequences.
The subsampled file is written to the temporary directory, which is cleaned up before exiting, so it is not saved.
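
As a rough illustration of the behaviour described above (not the code in this PR), here is a minimal Python sketch. It assumes uncompressed FASTQ input with exactly four lines per read. The names `take_first_reads`, `read_headers` and `process_subsample`, and the output file names, are hypothetical and chosen only for this example.

```python
import itertools
import os
import tempfile

LINES_PER_READ = 4  # assumption: uncompressed FASTQ, four lines per record


def take_first_reads(src_path, dest_path, n_reads):
    """Copy the first n_reads records from src_path to dest_path.

    Returns the number of records written, which is smaller than
    n_reads when the input holds fewer reads than requested.
    """
    lines_written = 0
    with open(src_path) as src, open(dest_path, "w") as dest:
        # islice stops early at end of file, so asking for too many
        # reads simply copies the whole input.
        for line in itertools.islice(src, n_reads * LINES_PER_READ):
            dest.write(line)
            lines_written += 1
    return lines_written // LINES_PER_READ


def read_headers(path):
    """Return the header line (first line) of every record in a FASTQ file."""
    with open(path) as handle:
        return [line.rstrip("\n")
                for line in itertools.islice(handle, 0, None, LINES_PER_READ)]


def process_subsample(r1_path, r2_path, n_reads):
    """Subsample both files with the same n and return paired headers.

    Reading the headers stands in for the real processing. The
    subsampled files live in a temporary directory that is deleted
    when the with-block exits, so they are not kept.
    """
    with tempfile.TemporaryDirectory() as work_dir:
        out_r1 = os.path.join(work_dir, "subsample_R1.fastq")
        out_r2 = os.path.join(work_dir, "subsample_R2.fastq")
        take_first_reads(r1_path, out_r1, n_reads)
        take_first_reads(r2_path, out_r2, n_reads)
        return list(zip(read_headers(out_r1), read_headers(out_r2)))
```

Because both files are cut at the same read count, read i in the R1 output still corresponds to read i in the R2 output, provided the inputs were in matching order to begin with. A random subsample would need to share its selection between the two files to give the same guarantee.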