Open a4000 opened 1 year ago
All taxonomic classification subworkflows (see here) use ASV sequence tables (fasta) after the various filtering steps (see e.g. DADA2). In any case, a subworkflow such as the one here would be great. I propose to use LCA after blastn (or rather vsearch, as Daniel Lundin proposed) in a separate taxonomy assignment subworkflow. The output of that subworkflow would be the downstream-compatible taxonomic classification.
A subworkflow sounds like a good idea, thanks
Description of feature
I can add a module to Ampliseq that would run the LCA scripts from eDNAFlow. More information can be found here: https://github.com/mahsa-mousavi/eDNAFlow#lca-lowest-common-ancestor-script-for-assigning-taxonomy
The input to the module would be the output from blastn, so this module might not work if the user doesn't use blastn. The other input file for this module would be the DADA2_table.tsv file, or alternatively, the curated table produced by LULU if the user chose to use LULU.
The main output file for this module is a TSV file that contains the same information as the input ASV TSV file, plus the number of unique BLAST hits and the taxonomy levels assigned to each ASV (with a "dropped" value in each level where an ASV didn't meet certain thresholds). This output file may need to be modified to be more compatible with downstream steps in Ampliseq.
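
To make the idea concrete, here is a minimal sketch of an LCA assignment over BLAST hits. It is not the eDNAFlow script: the function name `lca_assign`, the rank list `RANKS`, the single `min_pident` threshold, and the input layout (one tuple per hit with a pre-resolved lineage) are all assumptions for illustration. Here "unique hits" is taken to mean the number of distinct lineages that pass the threshold, which may differ from how eDNAFlow counts them.

```python
from collections import defaultdict

# Hypothetical rank list; the real module may use different levels.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lca_assign(hits, min_pident=97.0):
    """Assign a lowest-common-ancestor taxonomy per ASV.

    hits: iterable of (asv_id, pident, lineage), where lineage holds one
          name per entry in RANKS (assumed to be resolved beforehand).
    Returns {asv_id: {"unique_hits": n, <rank>: name or "dropped"}}.
    ASVs with no hit at or above min_pident are not returned.
    """
    by_asv = defaultdict(set)
    for asv_id, pident, lineage in hits:
        if pident >= min_pident:
            by_asv[asv_id].add(tuple(lineage))

    result = {}
    for asv_id, lineages in by_asv.items():
        row = {"unique_hits": len(lineages)}
        agreed = True
        for i, rank in enumerate(RANKS):
            names = {lin[i] for lin in lineages}
            # Once the hits disagree at one rank, all lower ranks are dropped.
            agreed = agreed and len(names) == 1
            row[rank] = names.pop() if agreed else "dropped"
        result[asv_id] = row
    return result
```

Each returned row could then be appended to the matching line of the ASV table (from DADA2 or LULU) to produce the combined TSV described above.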