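This is an issue when using maxSpotId (-X) to make sure no more than N spots are downloaded (e.g., if there are some very large RNA-Seq experiments I want to ignore).
For example, in the run below there are only about 5.5 million spots, but the blocks are computed from the -X value rather than the actual spot count, so the third block lies entirely past the last spot and the third thread does not do anything.
This makes the download slower than not using -X 10000000, because the work is effectively split across two threads instead of three.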
$ parallel-fastq-dump -X 10000000 -t 3 -s SRR868679
SRR ids: ['SRR868679']
extra args: []
tempdir: /tmp/pfd_k2htn18j
SRR868679 spots: 5487730
blocks: [[1, 3333333], [3333334, 6666666], [6666667, 10000000]]
Read 2154397 spots for SRR868679
Written 2154397 spots for SRR868679
Read 3333333 spots for SRR868679
Written 3333333 spots for SRR868679
I believe the fix is just:
end = min(n_spots, args.maxSpotId) if args.maxSpotId is not None else n_spots
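To illustrate the effect of that one-line change, here is a minimal sketch of the block computation. The helpers `clamp_end` and `split_blocks` are hypothetical names written for this example, not the tool's actual functions; the splitting logic is only assumed to match the tool because it reproduces the blocks printed in the log above.

```python
def clamp_end(n_spots, max_spot_id=None):
    """Proposed fix: never let the range end past the real spot count."""
    return min(n_spots, max_spot_id) if max_spot_id is not None else n_spots


def split_blocks(start, end, n_threads):
    """Split the inclusive range [start, end] into n_threads contiguous blocks.

    Hypothetical helper; assumes end - start + 1 >= n_threads.
    """
    avg = (end - start + 1) // n_threads
    blocks = []
    first = start
    for i in range(n_threads):
        # The last block absorbs any remainder left by integer division.
        last = end if i == n_threads - 1 else first + avg - 1
        blocks.append([first, last])
        first = last + 1
    return blocks


n_spots = 5487730       # spot count reported for SRR868679
max_spot_id = 10000000  # value passed with -X

# Current behaviour: the range ends at the -X value.
print(split_blocks(1, max_spot_id, 3))
# With the fix: the range ends at the real spot count.
print(split_blocks(1, clamp_end(n_spots, max_spot_id), 3))
```

With the clamped end, all three blocks fall inside the 5487730 available spots, so every thread gets roughly a third of the work.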
Thanks for the useful tool!