mhoban / rainbow_bridge

Improve parallelism of `unzip` process #93

Closed mhoban closed 2 months ago

mhoban commented 2 months ago

The `unzip` process currently uses a for loop internally for paired-end reads, so the two files of each pair are unzipped one after the other inside a single task. This could be improved using `transpose` and `groupTuple`. The reads channel looks like this:

[ key1, [f1, f2] ]
[ key2, [f1, f2] ]
[ key3, [f1, f2] ]
...

`transpose` will turn it into this, emitting one item per file:

[ key1, f1 ]
[ key1, f2 ]
[ key2, f1 ]
[ key2, f2 ]
...

The transposed channel can be passed to `unzip` so that every read file is unzipped in its own task, in parallel, and the results can then be regrouped into the original format using `groupTuple`.
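To make the reshaping concrete, here is a minimal sketch in plain Python that models what the two operators do to the data. It is not Nextflow code and not the change from the commit: the helper names `transpose` and `group_tuple`, the file names, and the stand-in for the unzip step are all assumptions made for illustration.

```python
from collections import defaultdict


def transpose(channel):
    """Model of `transpose`: emit one (key, file) tuple per nested file."""
    for key, files in channel:
        for f in files:
            yield (key, f)


def group_tuple(channel, sort=False):
    """Model of `groupTuple`: collect items that share the same key."""
    groups = defaultdict(list)
    for key, item in channel:
        groups[key].append(item)
    return [(key, sorted(items) if sort else items) for key, items in groups.items()]


# Hypothetical paired-end reads channel: [ key, [f1, f2] ]
reads = [
    ("key1", ["key1_R1.fastq.gz", "key1_R2.fastq.gz"]),
    ("key2", ["key2_R1.fastq.gz", "key2_R2.fastq.gz"]),
]

# One tuple per file, so each file could be handled by its own task.
flat = list(transpose(reads))

# Stand-in for the unzip step: only the file name changes in this sketch.
unzipped = [(key, f[: -len(".gz")]) for key, f in flat]

# Parallel tasks may finish in any order; reversing simulates that.
# Sorting inside each group restores the R1, R2 order of the pair.
regrouped = group_tuple(reversed(unzipped), sort=True)

for item in regrouped:
    print(item)
```

Because tasks finish independently, neither the order of the groups nor the order of files within a group should be relied on after regrouping. In Nextflow, `groupTuple` accepts a `sort` option for the within-group order and a `size` option (here it would be 2) so that a group can be emitted as soon as both files have arrived.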

mhoban commented 2 months ago

done in 3e14dc151c0fac88eb0c545ae04361f8f3b69ee6