seandavi / MicroBioMap

https://seandavi.github.io/MicroBioMap/

Taxonomic levels Genus names at Family rank #13

Open microsud opened 10 months ago

microsud commented 10 months ago

Dear Authors, congratulations on this huge task, and thank you for creating this resource. While exploring the data at the family level, I noticed a taxonomy issue that was highlighted a couple of years ago in the dada2 SILVA database. I am not sure whether the source database used in this study comes from the repository linked here, but the same issue seems to be present. https://github.com/mikemc/dada2-reference-databases/issues/1

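# Install once if needed:
# BiocManager::install('seandavi/MicroBioMap')
library(MicroBioMap)
library(mia)
library(dplyr)

# Download the compendium and cache it locally
cpd <- getCompendium()
saveRDS(cpd, 'MicroBioMapDataTSE.rds')

# Keep prevalent features: relative-abundance detection threshold 0.0001,
# prevalence threshold 1% of samples
vcpd <- mia::subsetByPrevalentFeatures(cpd,
                                       detection = 0.0001,
                                       prevalence = 0.01,
                                       as_relative = TRUE)

# Count features per value in the family column
dplyr::count(as.data.frame(rowData(vcpd)), family)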

Examples of genus-level names appearing at the family level include Anaerococcus, Ezakiella, Finegoldia, Parvimonas, and Peptoniphilus.
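One rough way to surface such entries is to flag family values that lack the conventional "-aceae" suffix. The sketch below is only a heuristic run on a toy data frame standing in for as.data.frame(rowData(vcpd)); the helper name is_suspect_family is hypothetical, and the check would also flag legitimate placeholder labels (for example "uncultured" or clade codes), so hits need manual review.

```r
# Toy stand-in for as.data.frame(rowData(vcpd)); values are illustrative only.
taxa <- data.frame(
  family = c("Lachnospiraceae", "Anaerococcus", "Finegoldia",
             "Bacteroidaceae", "Peptoniphilus", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of MicroBioMap or mia):
# TRUE where a non-missing family name does not end in "aceae".
is_suspect_family <- function(x) {
  !is.na(x) & !grepl("aceae$", x)
}

# Unique suspect names, sorted alphabetically
suspect <- sort(unique(taxa$family[is_suspect_family(taxa$family)]))
print(suspect)
```

On the real data, the same helper could be applied to the family column returned by rowData() to get a list of candidates to inspect by hand.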

Best wishes, Sudarshan

rabdill commented 7 months ago

Hi Sudarshan, thank you for bringing this to our attention. We did use SILVA v138.1 in the original pipeline, but you're right that we hadn't accounted for the additional "bad taxa." We've made a new data release (v1.0.1) that fixes these by putting the existing names at the proper level. We filled in the gaps with "(unclassified)" rather than filling in additional taxonomic information that in some cases conflicted with other SILVA info. It's available on Zenodo and will be integrated into the next package release.
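
For readers who want to patch an older copy of the data locally, the kind of correction described above might look roughly like the following. This is a sketch under stated assumptions, not the code used for the v1.0.1 release: the helper fix_misplaced_genus is hypothetical, the "-aceae" suffix test is only a heuristic for spotting misplaced names, and the toy rows are illustrative.

```r
# Toy taxonomy table; rows are illustrative, not taken from the compendium.
taxa <- data.frame(
  order  = c("Lachnospirales", "Peptostreptococcales-Tissierellales"),
  family = c("Lachnospiraceae", "Anaerococcus"),
  genus  = c("Blautia", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: where the family slot holds a name without the
# "-aceae" suffix and the genus slot is empty, move the name down to genus
# and mark the family as "(unclassified)".
fix_misplaced_genus <- function(df, placeholder = "(unclassified)") {
  misplaced <- !is.na(df$family) & !grepl("aceae$", df$family) & is.na(df$genus)
  df$genus[misplaced]  <- df$family[misplaced]
  df$family[misplaced] <- placeholder
  df
}

fixed <- fix_misplaced_genus(taxa)
print(fixed)
```

Rows whose family already looks like a family name, or whose genus slot is already filled, are left untouched, which keeps the sketch from overwriting existing assignments.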