jhuapl-bio / taxtriage

TaxTriage is a Nextflow workflow designed to agnostically identify and classify microbial organisms within short- or long-read metagenomic NGS data. This flexible tool was developed with various use-cases of mNGS in mind.
MIT License

Improvement: Speed of SPLIT_VCF #20

Closed Merritt-Brian closed 11 months ago

Merritt-Brian commented 1 year ago

Description of feature

bin/split_vcf.py decompresses the single vcf.gz file emitted by the channel, then splits it into individual files by the Kraken taxid in the accession column (second index after splitting on "|"). Python3 is too slow for this step; consider switching to gawk.
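For reference, the splitting step described above could be sketched roughly as follows. This is not the actual contents of bin/split_vcf.py. The function name `split_vcf_by_taxid`, the `{taxid}.vcf` output naming, and the assumption that "second index" means zero-based index 1 of the accession (e.g. `kraken:taxid|562|NC_1`) are all hypothetical. It streams the gzipped file line by line rather than decompressing it to disk first.

```python
import gzip
import os


def split_vcf_by_taxid(vcf_gz_path, out_dir, taxid_index=1):
    """Stream a gzipped VCF and write one plain-text VCF per taxid.

    taxid_index is the position of the taxid after splitting the first
    (accession) column on "|". The default of 1 is an assumption.
    """
    os.makedirs(out_dir, exist_ok=True)
    header = []   # header lines seen so far, copied into every output file
    handles = {}  # taxid -> open output file
    try:
        with gzip.open(vcf_gz_path, "rt") as vcf:
            for line in vcf:
                if line.startswith("#"):
                    header.append(line)
                    continue
                accession = line.split("\t", 1)[0]
                parts = accession.split("|")
                if len(parts) <= taxid_index:
                    continue  # no taxid in this accession; skipped in this sketch
                taxid = parts[taxid_index]
                out = handles.get(taxid)
                if out is None:
                    # Hypothetical naming scheme: one file per taxid
                    out = open(os.path.join(out_dir, f"{taxid}.vcf"), "w")
                    out.writelines(header)
                    handles[taxid] = out
                out.write(line)
    finally:
        for out in handles.values():
            out.close()
    return sorted(handles)
```

Keeping one open handle per taxid avoids reopening files for every record, but a sample with a very large number of taxids could hit the open-file limit, so a real implementation may need to batch or buffer.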

Merritt-Brian commented 11 months ago

Tested with gawk; the speed improvement was negligible. Python3 allows more flexibility, so staying with the Python implementation.