isugifNF / polishCLR

A nextflow pipeline for polishing CLR assemblies
https://isugifnf.github.io/polishCLR/

Using parallel to speed up grep #11

Closed molikd closed 2 years ago

molikd commented 3 years ago

This pull request, once tested and ready, will use GNU parallel (already in the conda environment) to multithread the greps on the VCF files. It parallelizes at the pipe level rather than the file level: each VCF file is split into chunks of lines, and the chunks are grepped simultaneously. GNU parallel also has the nice feature of not interrupting line writes, so output lines stay intact.
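The chunk-level idea described above can be sketched in a few lines of Python. This is only an illustration of the structure (split the stream into blocks of lines, filter each block independently, reassemble in input order), not the pipeline's actual code. The names `chunks`, `grep_chunk`, and `parallel_grep`, the chunk size, and the pattern are all hypothetical. Threads are used here for simplicity; in CPython they show the shape of the approach rather than a real speedup, which is what an external tool such as GNU parallel would provide.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(lines, size):
    """Yield successive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def grep_chunk(args):
    """Return the lines of one chunk that match the compiled pattern."""
    regex, block = args
    return [line for line in block if regex.search(line)]


def parallel_grep(lines, pattern, chunk_size=1000, workers=4):
    """Filter lines chunk by chunk, returning matches in input order."""
    regex = re.compile(pattern)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so the output
        # order matches the input even if chunks finish out of order.
        results = pool.map(grep_chunk, ((regex, b) for b in chunks(lines, chunk_size)))
        return [line for block in results for line in block]
```

Calling `parallel_grep(records, r"^chr1\t", chunk_size=2)` on a list of tab-separated record lines would return only the lines whose first column is `chr1`, in their original order.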

molikd commented 3 years ago

This should be in working order now.

j23414 commented 2 years ago

Sorry I couldn't merge this earlier. Using xargs was a good suggestion; the only issue is that the VCF headers must keep the same order, or later steps will break.
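One reading of the header-ordering concern is that any filtering step has to pass the header block through untouched while it filters the records. A minimal Python sketch of that constraint follows. It assumes standard VCF layout, where header lines begin with `#`. The function name `filter_vcf` and the pattern are hypothetical, since the thread does not show the actual grep commands.

```python
import re


def filter_vcf(lines, pattern):
    """Yield header lines untouched and only the records matching `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        if line.startswith("#"):
            # Header line: emitted verbatim, in its original position.
            yield line
        elif regex.search(line):
            # Record line: kept only if it matches.
            yield line
```

Because the input is read in a single pass and header lines are never reordered or filtered, the header of the output is identical to the header of the input.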

I'll close this since there's been some major refactoring. I'll be more responsive to pull requests in the future. Happy to discuss.