nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Add sintax support for all available databases #618

Open jtangrot opened 1 year ago

jtangrot commented 1 year ago

Description of feature

I suggest making it possible to run sintax instead of assignTaxonomy with all taxonomic databases currently supported by ampliseq. It should only be a matter of reformatting the headers in the fasta files, according to the description in the vsearch manual:

"The reference database must contain taxonomic information in the header of each sequence in the form of a string starting with ";tax=" and followed by a comma-separated list of up to eight taxonomic identifiers. Each taxonomic identifier must start with an indication of the rank by one of the letters d (for domain), k (kingdom), p (phylum), c (class), o (order), f (family), g (genus), or s (species). The letter is followed by a colon (:) and the name of that rank. Commas and semicolons are not allowed in the name of the rank."

Example: ">X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli".
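To illustrate the header rewrite, here is a minimal sketch of building a sintax-style header from a sequence ID and a lineage. The function name `sintax_header`, the `(rank, name)` input format, and the choice to replace spaces, commas, and semicolons with underscores are my assumptions, not existing ampliseq code; how each database's original header is parsed into such pairs would differ per database and is left out.

```python
# Rank letters accepted by sintax, per the vsearch manual text quoted above.
RANK_LETTERS = {
    "domain": "d", "kingdom": "k", "phylum": "p", "class": "c",
    "order": "o", "family": "f", "genus": "g", "species": "s",
}


def sintax_header(seq_id, lineage):
    """Build a FASTA header with a ';tax=' annotation (hypothetical helper).

    lineage is a list of (rank, name) pairs, e.g. [("domain", "Bacteria"), ...].
    Ranks with an empty name are skipped.
    """
    fields = []
    for rank, name in lineage:
        if not name:
            continue  # no assignment at this rank
        # Commas and semicolons are not allowed in names; replacing them
        # (and spaces) with underscores is an assumption of this sketch.
        clean = name.strip()
        for forbidden in (",", ";", " "):
            clean = clean.replace(forbidden, "_")
        fields.append(f"{RANK_LETTERS[rank]}:{clean}")
    return f">{seq_id};tax={','.join(fields)}"


example = sintax_header(
    "X80725_S000004313",
    [
        ("domain", "Bacteria"),
        ("phylum", "Proteobacteria"),
        ("class", "Gammaproteobacteria"),
        ("order", "Enterobacteriales"),
        ("family", "Enterobacteriaceae"),
        ("genus", "Escherichia/Shigella"),
        ("species", "Escherichia coli"),
    ],
)
print(example)
```

With the lineage above, this reproduces the example header from the vsearch manual.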

erikrikarddaniel commented 1 year ago

Why not?! :+1: