jbisanz / qiime2R

Import qiime2 artifacts to R
MIT License
156 stars 53 forks source link

Issues with Taxonomy Parsing #23

Closed cedwardson4 closed 4 years ago

cedwardson4 commented 4 years ago

I am having an issue with parsing taxonomy and moving taxonomy into phyloseq when manually creating a phyloseq object. This is possible an issue with read_qza(), because everything seems to work correctly with qza_to_phyloseq().

Using the QIIME2 Moving Pictures Tutorial taxonomy data (taxonomy.qza)

test_taxonomy<-read_qza("taxonomy.qza")
parse_taxonomy(test_taxonomy)

The following error occurs:

Error in parse_taxonomy(test_taxonomy) : Table does not match expected format (colnames(obj) are Feature.ID, Taxon, (Confidence OR Consensus))

Attempting to create a manual phyloseq object, with my data that worked with qza_to_phyloseq():

WildCombined_table<-read_qza("WildCombined_table.qza")
WildCombined_IQtree<-read_qza("WildCombined_IQ_rooted-tree.qza")
WildCombined_taxonomy<-read_qza("WildCombined_SILVA138_taxonomy.qza"):
WildCombined_metadata<-phyloseq::import_qiime_sample_data("../../WildCombinedMapping.txt")

WildCombined_ps_manual<-phyloseq(otu_table(WildCombined_table$data,taxa_are_rows=TRUE), phy_tree(WildCombined_IQtree$data),tax_table(WildCombined_taxonomy$data), sample_data(WildCombined_metadata))

The following error occurs:

Error in validObject(.Object) : invalid class “phyloseq” object: Component taxa/OTU names do not match. Taxa indices are critical to analysis. Try taxa_names() In addition: Warning message: In .local(object) : Coercing from data.frame class to character matrix prior to building taxonomyTable. This could introduce artifacts. Check your taxonomyTable, or coerce to matrix manually.

I can confirm this error does not occur if I exclude the tax_table from the phyloseq object creation.

Versions: QIIME version 2020.6 phyloseq_1.32.0 qiime2R_0.99.34

jbisanz commented 4 years ago

Hi Christian,

The input to parse_taxonomy needs to be test_taxonomy$data. I see the documentation for parse_taxonomy is misleading and I will fix this.

cedwardson4 commented 4 years ago

Thank you for the quick response - this fixes my first question, but I still can't get the taxonomy into a phyloseq object using read_qzv.

Example

WildCombined_taxonomy<-read_qza("WildCombined_SILVA138_taxonomy.qza"):

 head(WildCombined_taxonomy$data)
                        Feature.ID
1 00076a533f2eb03c0e8a4eba2372b958
2 0013aa6b5bbe1c247c7ba5bde700a7e2
3 0021bdba24727fb0707d2320e2bdadd9
4 003551aea4dd0328648ec94036a478d9
5 0036f49c34a351848d78623cc8548120
6 0040b3e7f7027ffc55c6030f0c029389
                                                                                                                                                                        Taxon
1                                                    d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__uncultured; s__uncultured_bacterium
2                                                                                            d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae
3 d__Bacteria; p__Firmicutes; c__Clostridia; o__Oscillospirales; f__[Eubacterium]_coprostanoligenes_group; g__[Eubacterium]_coprostanoligenes_group; s__uncultured_Clostridia
4                                                     d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Rikenellaceae; g__Alistipes; s__uncultured_bacterium
5                                       d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Lachnospiraceae_UCG-006; s__uncultured_bacterium
6                                                                                  d__Bacteria; p__Firmicutes; c__Bacilli; o__RF39; f__RF39; g__RF39; s__uncultured_bacterium
  Confidence
1  0.7399832
2  0.9962021
3  0.9815547
4  0.9929343
5  0.8294032
6  0.8038206

 head(phyloseq(tax_table((WildCombined_taxonomy$data)))
+ )
Taxonomy Table:     [6 taxa by 3 taxonomic ranks]:
    ta1                               
sp1 "00076a533f2eb03c0e8a4eba2372b958"
sp2 "0013aa6b5bbe1c247c7ba5bde700a7e2"
sp3 "0021bdba24727fb0707d2320e2bdadd9"
sp4 "003551aea4dd0328648ec94036a478d9"
sp5 "0036f49c34a351848d78623cc8548120"
sp6 "0040b3e7f7027ffc55c6030f0c029389"
    ta2                                                                                                                                                                          
sp1 "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__uncultured; s__uncultured_bacterium"                                                   
sp2 "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae"                                                                                           
sp3 "d__Bacteria; p__Firmicutes; c__Clostridia; o__Oscillospirales; f__[Eubacterium]_coprostanoligenes_group; g__[Eubacterium]_coprostanoligenes_group; s__uncultured_Clostridia"
sp4 "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Rikenellaceae; g__Alistipes; s__uncultured_bacterium"                                                    
sp5 "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Lachnospiraceae_UCG-006; s__uncultured_bacterium"                                      
sp6 "d__Bacteria; p__Firmicutes; c__Bacilli; o__RF39; f__RF39; g__RF39; s__uncultured_bacterium"                                                                                 
    ta3        
sp1 "0.7399832"
sp2 "0.9962021"
sp3 "0.9815547"
sp4 "0.9929343"
sp5 "0.8294032"
sp6 "0.8038206"
Warning message:
In .local(object) : Coercing from data.frame class to character matrix 
prior to building taxonomyTable. 
This could introduce artifacts. 
Check your taxonomyTable, or coerce to matrix manually.

Maybe this is a phyloseq issue?

jbisanz commented 4 years ago

Hi Christian,

Phyloseq is going to want the taxonomy parsed into the separate columns at the time of object construction. Something like this would work:

WildCombined_taxonomy<-read_qza("WildCombined_SILVA138_taxonomy.qza")$data %>% parse_taxonomy()
head(phyloseq(tax_table((WildCombined_taxonomy)))
cedwardson4 commented 4 years ago

Thanks for the help. I also just came across a helpful bit of code in the QIIME2 forum that works for manually creating a phyloseq object that may be useful for anyone reading this thread:

# Make a phyloseq object in R
metadata <- read_tsv("metadata.tsv", comment = "#q2:type") 

table <- read_qza("table.qza")
count_tab <- table$data %>% as.data.frame() 

taxonomy <- read_qza("taxonomy.qza")
tax_tab <- taxonomy$data %>% 
  as.data.frame() %>%
  separate(Taxon, sep = ";", c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")) %>% 
  column_to_rownames("Feature.ID") %>%
  select(-Confidence)

tree <- read_qza("rooted-tree.qza")

ps <- phyloseq(otu_table(as.matrix(count_tab), taxa_are_rows = T),
               phy_tree(tree$data), 
               tax_table(as.matrix(tax_tab)), 
               sample_data(metadata %>% column_to_rownames("sample-id"))