qiime2 / docs

https://docs.qiime2.org
BSD 3-Clause "New" or "Revised" License

PM tutorial: Taxonomic classification results are the same before and after retraining the classifier #490

Open misialq opened 3 years ago

misialq commented 3 years ago

Bug Description

While working through the Parkinson’s Mouse Tutorial, we noticed that the species mentioned in the tutorial (B. ovatus) appears in equal counts in the taxonomic classification results from the original classifier and from the classifier retrained with information about typical stool sample composition. In other words, with the data provided in the tutorial, retraining the classifier does not improve the classification results for B. ovatus. It seems that the data originally used to train the first classifier has changed in the meantime, so both classifiers now give similar results. The tutorial question about the presence of B. ovatus in both results is therefore potentially outdated.

Steps to reproduce the behavior

  1. Open the taxonomy.qzv and bespoke_taxonomy.qzv visualizations from the PM tutorial
  2. Filter the taxon list for "ovatus"
  3. Compare results obtained in both
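The same comparison can be made outside the viewer. The following is a minimal sketch, assuming both taxonomy results have been exported to tab-separated text with `Feature ID`, `Taxon` and `Confidence` columns; the function name and the two miniature tables are hypothetical stand-ins, not data from the tutorial.

```python
import csv
import io


def count_taxon_matches(tsv_text, needle):
    """Count rows whose Taxon column contains `needle` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(1 for row in reader if needle.lower() in row["Taxon"].lower())


# Hypothetical miniature tables standing in for the two exported results.
uniform_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; g__Bacteroides; s__ovatus\t0.98\n"
    "f2\tk__Bacteria; g__Bacteroides; s__\t0.91\n"
)
bespoke_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; g__Bacteroides; s__ovatus\t0.99\n"
    "f2\tk__Bacteria; g__Bacteroides; s__\t0.95\n"
)

uniform_hits = count_taxon_matches(uniform_tsv, "ovatus")
bespoke_hits = count_taxon_matches(bespoke_tsv, "ovatus")
print(uniform_hits, bespoke_hits)
```

Equal counts from both tables would correspond to the behavior reported in this issue.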

Expected behavior

Not sure, but presumably the original taxonomy result should have fewer taxa identified as ovatus?

Actual behavior

Both results show the same number of ovatus taxa.

Screenshots

from taxonomy.qzv: [image: taxonomy]

from bespoke_taxonomy.qzv: [image: bespoke_taxonomy]

Comments

  1. This is under the assumption that retraining the classifier should improve identification results.

thermokarst commented 3 years ago

@BenKaehler, did you write that part of the PD Mice tutorial? If so, care to comment?

nbokulich commented 3 years ago

To clarify, what changed is that the new uniform (default) pre-trained classifier is using the RESCRIPt-processed greengenes database.

The bespoke classifier is trained using the old (raw) greengenes:

[image: image.png]

So two things need to happen:

  1. the bespoke classifier should be trained on the same data as the uniform classifier
  2. the question needs to be changed to use another taxon that is underclassified by the uniform classifier
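One way to look for such a taxon is to compare how deeply each feature is classified by the two classifiers. This is only a sketch, assuming Greengenes-style labels such as `g__Bacteroides` separated by semicolons, with a bare prefix like `s__` meaning the rank is unresolved; both function names and the example dictionaries are hypothetical.

```python
def classification_depth(taxon):
    """Count the ranks that carry a name; a bare prefix such as 's__' is skipped."""
    if taxon.strip() == "Unassigned":
        return 0
    depth = 0
    for rank in taxon.split(";"):
        rank = rank.strip()
        # Assumed label shape: '<prefix>__<name>'; the name may be empty.
        name = rank.split("__", 1)[1] if "__" in rank else rank
        if name:
            depth += 1
    return depth


def underclassified(uniform, bespoke):
    """Feature IDs that the bespoke result resolves deeper than the uniform one."""
    return sorted(
        fid
        for fid in uniform.keys() & bespoke.keys()
        if classification_depth(bespoke[fid]) > classification_depth(uniform[fid])
    )


# Hypothetical feature-to-taxon mappings, not tutorial data.
uniform = {
    "f1": "k__Bacteria; p__Bacteroidetes; g__Bacteroides; s__",
    "f2": "k__Bacteria; p__Firmicutes",
}
bespoke = {
    "f1": "k__Bacteria; p__Bacteroidetes; g__Bacteroides; s__ovatus",
    "f2": "k__Bacteria; p__Firmicutes",
}
print(underclassified(uniform, bespoke))
```

Features returned here are candidates only; whether a deeper label is also a correct one would still need to be checked by hand.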
