Open tobiasgf opened 10 months ago
Possibly essential reading on choices for reference database (ref-db) usage, construction, and evaluation:
Keck et al. 2023 – Navigating the seven challenges of taxonomic reference databases in metabarcoding analyses.
Mugnai et al. 2023 – Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies.
A list of tools that could be considered if GBIF decides to produce its own reference databases:
RESCRIPt – Reproducible sequence taxonomy reference database management
rCRUX – A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R. Notes: Apparently the best-performing algorithm at present. Based on in silico PCR followed by similarity searches. Part of the ANACAPA toolkit. Used by CALeDNA to build ref-dbs.
crabs – A software program to generate curated reference databases for metabarcoding sequencing data
METACURATOR – A hidden Markov model-based toolkit for extracting and curating sequences from taxonomically-informative genetic markers
ECOPCR Notes: Originally part of the ObiTools tool set. I am unsure about recent developments, but it had the problem of missing sequences that lack the primer region. Other approaches share this problem, but they follow up the in silico PCR with similarity searches.
DB4Q2 – A detailed workflow to develop QIIME2‑formatted reference databases for taxonomic analysis of DNA metabarcoding data. Notes: A workflow for QIIME2.
MARES – a replicable pipeline and curated reference database for marine eukaryote metabarcoding
refdb – Management of DNA reference libraries for barcoding and metabarcoding studies with the R package refdb Notes: maybe something that can be used to curate ref-dbs produced with any tool?
mkcoinr – COInr and mkCOInr: Building and customizing a nonredundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline "The mkcoinr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, to build the COInr database and to customize it according to the users’ needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for blast, vtam, qiime and rdp classifier."
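To illustrate the point in the rCRUX and ECOPCR notes: a minimal Python sketch of in silico PCR, showing why a reference sequence that lacks a primer region is missed and why a follow-up similarity search helps. This is a toy illustration, not how any of the listed tools is implemented. The function names, the primers, and the sequences are all made up for the example, and the "similarity search" is reduced to an exact substring check.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatches(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_site(seq, primer, max_mm, start=0):
    """First index >= start where primer matches with <= max_mm mismatches, else -1."""
    for i in range(start, len(seq) - len(primer) + 1):
        if mismatches(seq[i:i + len(primer)], primer) <= max_mm:
            return i
    return -1


def in_silico_pcr(seq, fwd, rev, max_mm=1):
    """Return the region between the two primer sites (primers excluded), or None."""
    f = find_site(seq, fwd, max_mm)
    if f == -1:
        return None  # forward primer site absent: sequence is skipped
    r = find_site(seq, revcomp(rev), max_mm, start=f + len(fwd))
    if r == -1:
        return None  # reverse primer site absent: sequence is skipped
    return seq[f + len(fwd):r]


# Hypothetical primers and reference sequences (illustration only).
FWD, REV = "ACGTAC", "CCTTGA"
full = "TT" + "ACGTAC" + "GGGCCCAAA" + "TCAAGG" + "AA"   # both primer sites present
partial = "GGGCCCAAA" + "TCAAGG" + "AA"                  # forward primer region missing

amplicon = in_silico_pcr(full, FWD, REV)
missed = in_silico_pcr(partial, FWD, REV)

# Toy stand-in for the follow-up similarity search: use the amplicon as a
# seed and look for it in sequences that in silico PCR skipped.
recovered = amplicon is not None and amplicon in partial
```

Here `full` yields an amplicon, `partial` is skipped by the PCR step because its forward primer region was never deposited, and the seed-based check recovers it. A real pipeline would use BLAST or a comparable aligner for that second step rather than exact matching.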