meglecz / mkCOInr

Make a non-redundant, comprehensive COI database from NCBI and BOLD and include customizing options
MIT License

mkCOInr - COInr

mkCOInr is a series of Perl scripts for creating and customizing COInr, a large, comprehensive COI database built from NCBI-nucleotide and BOLD.

The COInr database is composed of two files

COInr is freely available and can be easily downloaded at DOI

Documentation

The mkCOInr documentation is hosted at ReadTheDocs.

COInr

The database can be used directly for similarity-based taxonomic assignment of metabarcoding data obtained with any COI marker (primer pair), from any geographical region or target group.

Alternatively, the database can be used as a starting point to create smaller, more specific custom databases.

mkCOInr

Sequences from COInr can be selected for:

This can save considerable time and effort, since one of the main challenges in creating a custom database is downloading the sequences in bulk and pooling them into a coherent taxonomic system.

Custom sequences can also be included, and their taxonomic lineages are handled correctly.

COInr, or the custom databases derived from it, can be formatted to different database formats (qiime, rdp, blast, vtam, full).
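To give a rough idea of what selecting sequences and reformatting them involves, here is a minimal Python sketch. It is not part of mkCOInr, which provides its own Perl scripts for these steps. The three-column tab-separated layout (`seqID`, `taxID`, `sequence`), the function name `tsv_to_fasta` and the example records are all assumptions made for illustration; check the mkCOInr documentation for the actual file layout and commands.

```python
import csv
import io


def tsv_to_fasta(tsv_text, keep_taxids=None):
    """Convert tab-separated rows (seqID, taxID, sequence) to FASTA text.

    The column names are hypothetical and chosen for this sketch only.
    If keep_taxids is given, only rows whose taxID is in that set are kept.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        if keep_taxids is not None and row["taxID"] not in keep_taxids:
            continue  # skip sequences outside the selected taxa
        header = f">{row['seqID']} taxID={row['taxID']}"
        records.append(f"{header}\n{row['sequence']}\n")
    return "".join(records)


# Made-up example data: three short sequences assigned to two taxa.
example = (
    "seqID\ttaxID\tsequence\n"
    "S1\t101\tACGT\n"
    "S2\t202\tGGCC\n"
    "S3\t101\tTTAA\n"
)

# Keep only the sequences assigned to the hypothetical taxID 101.
print(tsv_to_fasta(example, keep_taxids={"101"}), end="")
```

The same pattern (filter rows, then write them in the target layout) applies to other output formats; only the record template changes.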

Fig. 1. The full pipeline to create COInr and options to make a custom database.

Major features of the creation of COInr

Warning

Citation

Paper describing COInr and mkCOInr: Meglécz, E. (2023). COInr and mkCOInr: Building and customizing a non-redundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline. Molecular Ecology Resources, 21, 933-945

COInr database: Meglécz, E. (2022). COInr a comprehensive, non-redundant COI database from NCBI-nt and BOLD [Data set]. Zenodo

mkCOInr: github.com/meglecz/mkCOInr