thomasstjerne / blast-ws

MIT License

Methods for constructing reference databases #12

Open tobiasgf opened 10 months ago

tobiasgf commented 10 months ago

A list of tools that could be considered if GBIF decides to produce its own reference databases.

RESCRIPt: Reproducible sequence taxonomy reference database management

rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R. Notes: Apparently the best-performing algorithm at present. Based on in silico PCR followed by similarity searches. Part of the ANACAPA tool kit. Used by CALeDNA to build ref-dbs.

crabs: A software program to generate curated reference databases for metabarcoding sequencing data

METACURATOR: A hidden Markov model-based toolkit for extracting and curating sequences from taxonomically-informative genetic markers

ECOPCR. Notes: Originally part of the ObiTools tool set. I am unsure about recent developments, but it had the problem of missing sequences that lack the primer region. Other approaches share this problem, but they follow the in silico PCR step with similarity searches to recover such sequences.

DB4Q2: A detailed workflow to develop QIIME2‑formatted reference databases for taxonomic analysis of DNA metabarcoding data. Notes: A workflow for QIIME2.

MARES: a replicable pipeline and curated reference database for marine eukaryote metabarcoding

refdb: Management of DNA reference libraries for barcoding and metabarcoding studies with the R package refdb. Notes: Maybe something that can be used to curate ref-dbs produced with any tool?

mkcoinr: COInr and mkCOInr: Building and customizing a nonredundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline. "The mkcoinr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, to build the COInr database and to customize it according to the users’ needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for blast, vtam, qiime and rdp classifier."
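
To make the in silico PCR step mentioned in the rCRUX and ECOPCR notes concrete, here is a minimal Python sketch. It is not how rCRUX or ECOPCR are implemented. It assumes exact primer matching (IUPAC ambiguity codes allowed, no mismatches), and the function names and toy sequences are made up for illustration. It also shows the limitation noted above: a reference sequence that was deposited without the primer region yields no amplicon, which is why some tools follow up with similarity searches.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA string (IUPAC codes supported)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_pcr(seq, forward, reverse, min_len=1, max_len=1000):
    """Return the region between the primers (primers excluded), or None.

    Hypothetical helper: exact matching only, first hit only.
    """
    seq = seq.upper()
    fwd = re.search(primer_regex(forward), seq)
    if fwd is None:
        return None  # forward primer site absent from this record
    # The reverse primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement downstream of the forward site.
    rev_pattern = re.compile(primer_regex(reverse_complement(reverse)))
    rev = rev_pattern.search(seq, fwd.end())
    if rev is None:
        return None  # reverse primer site absent from this record
    amplicon = seq[fwd.end():rev.start()]
    if not (min_len <= len(amplicon) <= max_len):
        return None
    return amplicon


# Toy record containing both primer sites around the marker region.
full_record = "GGACGTTTTGGGTTTCCAAGG"
# Toy record deposited with the primer regions already trimmed off.
trimmed_record = "TTTGGGTTT"

print(in_silico_pcr(full_record, "ACGT", "TTGG"))
print(in_silico_pcr(trimmed_record, "ACGT", "TTGG"))
```

The second call returns `None` even though the trimmed record contains the same marker region, so a pipeline relying on in silico PCR alone would drop it.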

tobiasgf commented 10 months ago

Possibly essential reading on choices in ref-db usage, construction, and evaluation:

Keck et al 2023 - Navigating the seven challenges of taxonomic reference databases in metabarcoding analyses.

Mugnai et al 2023 - Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies