jhuapl-bio / taxtriage

TaxTriage is a Nextflow workflow designed to agnostically identify and classify microbial organisms within short- or long-read metagenomic NGS data. This flexible tool was developed with various use-cases of mNGS in mind.
MIT License

Use local RefSeq mirror for post-kraken2 accession querying #61

Open Merritt-Brian opened 4 months ago

Merritt-Brian commented 4 months ago

Description of feature

Currently, you have to either pull taxa from NCBI after kraken2 (internet access required), pull them manually (also requires internet access), AND/OR skip kraken2 and use a local reference FASTA file. However, some users would rather point the workflow at a full local RefSeq set of FASTA files (spread across many directories/subdirectories) and extract the needed references from it for the realignment step.

Considerations: Reading the headers of all files is too unwieldy. Consider supplying a mapping file of accession to file, OR extracting the accession from the filename itself, like GCF....fasta
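
To make the filename option concrete, here is a rough sketch of how an accession-to-file mapping could be built without opening any FASTA file. It is not existing TaxTriage code. It assumes filenames begin with an assembly accession of the form `GCF_` (or `GCA_`) plus nine digits and a version, and that files end in one of a few common FASTA extensions. The names `accession_from_filename`, `build_accession_index` and `write_mapping_file` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: filenames start with an assembly accession such as
# GCF_000005845.2 (GCF_/GCA_ prefix, nine digits, dot, version).
ACCESSION_RE = re.compile(r"^(GC[AF]_\d{9}\.\d+)")

# Assumption: these are the FASTA extensions present in the mirror.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna", ".fasta.gz", ".fa.gz", ".fna.gz")


def accession_from_filename(name):
    """Return the leading accession in a filename, or None if absent."""
    match = ACCESSION_RE.match(name)
    return match.group(1) if match else None


def build_accession_index(mirror_root):
    """Walk the mirror recursively and map accession -> first matching file."""
    index = {}
    for path in sorted(Path(mirror_root).rglob("*")):
        if not path.is_file() or not path.name.endswith(FASTA_SUFFIXES):
            continue
        accession = accession_from_filename(path.name)
        if accession is not None:
            # Keep the first file seen for an accession; later duplicates are ignored.
            index.setdefault(accession, path)
    return index


def write_mapping_file(index, out_path):
    """Write the index as a two-column TSV: accession, file path."""
    with open(out_path, "w") as handle:
        for accession, path in sorted(index.items()):
            handle.write(f"{accession}\t{path}\n")
```

The resulting TSV could serve as the mapping file from the first option, so the directory walk happens once and later runs only need a lookup. Files whose names do not start with an accession are skipped here; those would still need a user-supplied mapping.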