Open standage opened 7 years ago
After having run the pipeline a few more times, I'm less confident in the first bullet point now. If there were a way to write N banded count tables to disk with a single pass over the data, it could potentially make a big difference.
Suggestion from @drtamermansour: in a single pass, write count tables to N files (one for each band). Then running `kevlar find` in N bands would not require N passes over the entire data set, just loading the count tables from disk N times.

I just wanted to capture this suggestion. I have some concerns and I'm not sure it would yield much benefit.
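To make the suggestion concrete, here is a minimal sketch of single-pass banded counting in plain Python. It is not kevlar's or khmer's implementation: the function names (`band_of`, `count_banded`, `write_tables`), the MD5-based band assignment, and the JSON output format are all assumptions chosen for illustration.

```python
import hashlib
import json
import os
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer in a read sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def band_of(kmer, numbands):
    """Assign a k-mer to a band by hashing it (hypothetical scheme)."""
    digest = hashlib.md5(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big") % numbands


def count_banded(reads, k, numbands):
    """Populate one count table per band in a single pass over the reads."""
    tables = [Counter() for _ in range(numbands)]
    for read in reads:
        for kmer in kmers(read, k):
            tables[band_of(kmer, numbands)][kmer] += 1
    return tables


def write_tables(tables, outdir):
    """Write each band's table to its own file and return the paths."""
    paths = []
    for band, table in enumerate(tables):
        path = os.path.join(outdir, f"band{band}.json")
        with open(path, "w") as fh:
            json.dump(table, fh)
        paths.append(path)
    return paths
```

One caveat with this sketch: it holds all N tables in memory at once during the pass, which may undercut the memory savings that banding is meant to provide.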
`kevlar find`. Loading from count table files rather than the Fastq files again probably won't make a huge difference in overall runtime.

And in any case, this is all optimization: there's still work to do to get reliable results first!
[1] There are ways we could investigate to do this in a streaming fashion, but for now I'm happy with saying we have to do a second pass over the reads. :-)