iobis / PacMAN-pipeline

Bioinformatics pipeline for the analysis of amplicon sequencing data of eDNA samples from the PacMAN project
MIT License

High level RDP classification despite good BLAST hits #87

Open pieterprovoost opened 12 months ago

pieterprovoost commented 12 months ago

In some cases VSEARCH/BLAST return good and consistent hits (>97% identity), but RDP only classifies at a very high taxonomic level. This could be due to inconsistencies in the reference database taxonomy or to missing reference sequences.

For example (COI eDNA Expeditions sample S176):

3241    asv.3241    Eukaryota   Identification based on the RDP classifier at the confidence level 0.6: taxonomy Eukaryota;undef_Eukaryota;Cercozoa;Chlorarachniophyceae;undef_Chlorarachniophyceae;undef_undef_Chlorarachniophyceae;Chlorarachnion;Chlorarachnion_reptans, confidences 0.99;0.45;0.16;0.16;0.16;0.16;0.16;0.16. Confirmation with VSEARCH against the COI_ncbi_1_50000 database at 0.97 similarity: hits GQ896380, identities 97.1, taxonomy Bigelowiella_natans, consensus Eukaryota;Cercozoa;Chlorarachniophyceae;Bigelowiella;Bigelowiella_natans.
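
To make the record above easier to read, here is a minimal Python sketch of how the 0.6 confidence level appears to cut the RDP lineage. It assumes the lineage is truncated at the first rank whose confidence falls below the threshold, which matches the `Eukaryota` result shown but is not confirmed against the pipeline code. The helper name `confident_lineage` is hypothetical and not part of the pipeline; the taxonomy and confidence strings are copied from the record.

```python
# Hypothetical helper for illustration only; not part of the PacMAN pipeline.
# Assumption: a lineage is kept rank by rank until the first confidence
# value that falls below the threshold.
def confident_lineage(taxonomy: str, confidences: str, threshold: float = 0.6) -> list[str]:
    names = taxonomy.split(";")
    scores = [float(value) for value in confidences.split(";")]
    kept = []
    for name, score in zip(names, scores):
        if score < threshold:
            break  # every deeper rank is discarded as well
        kept.append(name)
    return kept


# Strings copied from the asv.3241 record above.
rdp_taxonomy = (
    "Eukaryota;undef_Eukaryota;Cercozoa;Chlorarachniophyceae;"
    "undef_Chlorarachniophyceae;undef_undef_Chlorarachniophyceae;"
    "Chlorarachnion;Chlorarachnion_reptans"
)
rdp_confidences = "0.99;0.45;0.16;0.16;0.16;0.16;0.16;0.16"

result = confident_lineage(rdp_taxonomy, rdp_confidences)
print(result)  # ['Eukaryota']
```

Under that assumption, the confidence already drops to 0.45 at the second rank (`undef_Eukaryota`), so nothing below `Eukaryota` survives, even though the VSEARCH hit at 97.1% identity points to Bigelowiella_natans.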

Input:

S1,/home/ubuntu/data/raw_sequences/eDNAexpeditions/batch1/concatenated/GC142030_TGCCGGTCAG-TTGTATCAGG_S176_R1.fastq.gz,forward
S1,/home/ubuntu/data/raw_sequences/eDNAexpeditions/batch1/concatenated/GC142030_TGCCGGTCAG-TTGTATCAGG_S176_R2.fastq.gz,reverse