joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Taxa names coming up as random words #1762

Open backwards-charm opened 1 week ago

backwards-charm commented 1 week ago

My code and output are as follows:

# Create a taxonomy table
taxmat = matrix(sample(words, 32, replace = TRUE), nrow = nrow(otumat), ncol = 6)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")
taxmat

      Kingdom    Phylum     Class      Order      Family     Genus
OTU1  "white"    "weigh"    "white"    "weigh"    "white"    "weigh"
OTU2  "load"     "twenty"   "load"     "twenty"   "load"     "twenty"
OTU3  "print"    "cover"    "print"    "cover"    "print"    "cover"
OTU4  "why"      "support"  "why"      "support"  "why"      "support"
OTU5  "debate"   "luck"     "debate"   "luck"     "debate"   "luck"
OTU6  "left"     "room"     "left"     "room"     "left"     "room"
OTU7  "large"    "already"  "large"    "already"  "large"    "already"
OTU8  "hall"     "thirteen" "hall"     "thirteen" "hall"     "thirteen"
OTU9  "cause"    "mister"   "cause"    "mister"   "cause"    "mister"
OTU10 "must"     "guess"    "must"     "guess"    "must"     "guess"
OTU11 "function" "correct"  "function" "correct"  "function" "correct"
OTU12 "of"       "try"      "of"       "try"      "of"       "try"
OTU13 "realise"  "scotland" "realise"  "scotland" "realise"  "scotland"
OTU14 "quarter"  "never"    "quarter"  "never"    "quarter"  "never"
OTU15 "sunday"   "already"  "sunday"   "already"  "sunday"   "already"
OTU16 "hand"     "jesus"    "hand"     "jesus"    "hand"     "jesus"

Why am I getting random words in my taxonomy table (such as "print", "why", "debate", "quarter") instead of the actual names of the bacterial classes (such as "Gammaproteobacteria", "Bacilli", "Actinobacteria")? And how do I fix this?

benjjneb commented 1 week ago
taxmat = matrix(sample(words, 32, replace = TRUE), nrow = nrow(otumat), ncol = 6)

Because you are making a matrix of random words.
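
To illustrate with a minimal sketch (the word list below is a hypothetical stand-in for `words`, not the actual object): `sample()` draws at random from whatever vector it is given, and `matrix()` silently recycles the draws when there are fewer values than cells. In the original call, 32 draws fill 16 x 6 = 96 cells, which would explain why the Kingdom, Class and Family columns in the output are identical, as are Phylum, Order and Genus.

```r
set.seed(1)  # only so the draw is repeatable

# Hypothetical stand-in for `words`: any character vector behaves the same way
word_list <- c("white", "weigh", "load", "twenty", "print", "cover")

# 4 random draws recycled into a 2 x 6 matrix (12 cells)
demo <- matrix(sample(word_list, 4, replace = TRUE), nrow = 2, ncol = 6)
colnames(demo) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")
demo

# Columns 1, 3 and 5 repeat the same draws, as do columns 2, 4 and 6
identical(demo[, "Kingdom"], demo[, "Class"])
```

Nothing in this call looks at the real sequences, so no real taxon names can come out of it.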

backwards-charm commented 1 week ago

How do I change it to the taxa names I have from dada2? I am new to R.

benjjneb commented 1 week ago

Did you already make a taxonomy table in dada2? If so, just use that. If not, go to the assign taxonomy section of the dada2 tutorial and follow those instructions.

backwards-charm commented 1 week ago

I did, using this code

taxa <- assignTaxonomy(seqtab.nochim, "/Users/Desktop/silva_nr99_v138.1_train_set.fa.gz", multithread=TRUE)
taxa.print <- taxa      # Removing sequence rownames for display only
rownames(taxa.print) <- NULL
head(taxa.print)

Would I have to use the code from the green "Alternatives: IdTaxa" box instead? I don't see how the output from my original code fits in with the OTU table code to ultimately make the phyloseq object.

benjjneb commented 1 week ago

If you continue following the dada2 tutorial, it includes the "handoff to phyloseq" section with code for creating the phyloseq object from the sequence table and the taxonomy table. Have you looked at that code?
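
For reference, that handoff step builds the phyloseq object directly from the sequence table and the taxonomy matrix returned by `assignTaxonomy()`. Below is a minimal sketch of the idea, using tiny made-up matrices in place of the real `seqtab.nochim` and `taxa` (all sequences, counts and taxon assignments here are illustrative assumptions).

```r
library(phyloseq)

# Toy stand-in for seqtab.nochim: samples in rows, sequence variants in columns
seqs <- c("ACGTACGT", "TTGCAACG", "GGCATCGA")
seqtab.nochim <- matrix(c(10L, 0L, 5L,
                           3L, 8L, 0L),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("C1D0", "A1D15"), seqs))

# Toy stand-in for taxa: a character matrix with the sequences as row names,
# which is the shape assignTaxonomy() returns
taxa <- matrix(c("Bacteria", "Proteobacteria",   "Gammaproteobacteria",
                 "Bacteria", "Firmicutes",       "Bacilli",
                 "Bacteria", "Actinobacteriota", "Actinobacteria"),
               nrow = 3, byrow = TRUE,
               dimnames = list(seqs, c("Kingdom", "Phylum", "Class")))

# Taxa are columns in the sequence table, hence taxa_are_rows = FALSE
ps <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows = FALSE),
               tax_table(taxa))
ps
```

The tutorial additionally passes `sample_data(samdf)` to the same `phyloseq()` call, and that is where per-sample metadata enters.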

backwards-charm commented 1 week ago

Yes, I have, but something about it isn't working out for me. I was told I could use a metadata file for sample naming, but I couldn't figure out at which point in the process to add the metadata, so I went back and edited the file names to roughly match the tutorial.
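
On where metadata enters: in the dada2 tutorial's handoff step, the sample data frame (`samdf`) is passed to `sample_data()` inside the `phyloseq()` call, so a metadata file could be read into that data frame instead of parsing file names. The sketch below assumes a CSV whose first column holds the sample names. The file contents, column names and the `metadata.csv` file name are all hypothetical.

```r
# Hypothetical metadata; in practice this might be
# read.csv("metadata.csv", row.names = 1) with your own file
meta_csv <- "SampleID,Treatment,Replicate,Day
C1D0,Control,1,0
C2D0,Control,2,0
A1D15,Acids,1,15"

samdf <- read.csv(text = meta_csv, row.names = 1, stringsAsFactors = FALSE)

# Stand-in for rownames(seqtab.nochim)
samples.out <- c("C1D0", "C2D0", "A1D15")

# The metadata row names must match the sample names in the sequence table
stopifnot(setequal(rownames(samdf), samples.out))

# Put the metadata rows in the same order as the sequence table
samdf <- samdf[samples.out, , drop = FALSE]
samdf
```

With row names that match `rownames(seqtab.nochim)`, this `samdf` can take the place of the one built from file names.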

My file naming scheme is: a letter indicating treatment type (C for control, A for acids, etc.), a number indicating the replicate for that treatment type (1 or 2), the letter "D", and the day on which the sample was taken (0, 15, etc.). For example, one of my files is named C1D0.

After doing this, I was able to generate some plots where "Day" is the focus using the following code:

# Sample names are the row names of seqtab.nochim
samples.out <- rownames(seqtab.nochim)
# subject is the part of each sample name before the first "D" (e.g. "C1")
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
# treatment is the first letter of subject (e.g. "C")
treatment <- substr(subject,1,1)
# Reassign subject to the remainder after the first character (the replicate number)
subject <- substr(subject,2,999)
# day is the part of each sample name after the first "D", as an integer
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
# Build the data frame samdf with columns Subject, Treatment and Day
samdf <- data.frame(Subject=subject, Treatment=treatment, Day=day)
# Initialise When to "Day 0" for every sample
samdf$When <- "Day 0"
# Set When to "Day 37" for all rows where Day is greater than 36
samdf$When[samdf$Day>36] <- "Day 37"
# Use the sample names as the row names of samdf
rownames(samdf) <- samples.out
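
One caveat with splitting on "D": if any treatment letter is itself "D", the first piece returned by `strsplit()` is an empty string and the parse breaks. As a hedged alternative, the same three fields can be pulled out with one regular expression. The sample names and the `Replicate` column name below are assumptions for illustration.

```r
# Hypothetical sample names; "D2D30" assumes a treatment coded with the letter D
samples.out <- c("C1D0", "C2D15", "A1D0", "D2D30")

# One letter (treatment), digits (replicate), a literal "D", digits (day)
pattern <- "^([A-Za-z])([0-9]+)D([0-9]+)$"

# Fail early if any name does not follow the scheme
stopifnot(all(grepl(pattern, samples.out)))

samdf <- data.frame(
  Treatment = sub(pattern, "\\1", samples.out),
  Replicate = sub(pattern, "\\2", samples.out),
  Day       = as.integer(sub(pattern, "\\3", samples.out)),
  row.names = samples.out
)
samdf
```

Anchoring the pattern with `^` and `$` means a misnamed file is caught by the `stopifnot()` check rather than silently producing a wrong row.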

This works well for the Shannon/Simpson and Bray-Curtis plots, but the bar graphs turn out looking a bit odd.

Originally I used the code:

# Create Bar plot
top20 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:20]
ps.top20 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top20 <- prune_taxa(top20, ps.top20)
plot_bar(ps.top20, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")

and ended up with:

[Image: Screenshot 2024-06-26 at 4 06 37 PM]

I also edited it to:

# Create Bar plot
top1000 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:1000]
ps.top1000 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps.top1000 <- prune_taxa(top1000, ps.top1000)
plot_bar(ps.top1000, x="Day", fill="Class") + facet_wrap(~When, scales="free_x")

and the bar graphs look completely black rather than colorful:

[Image: Screenshot 2024-06-26 at 4 07 03 PM]
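
A likely cause, offered as an assumption since the image cannot be inspected: `plot_bar()` draws every taxon as its own segment with a black outline, so with 1000 taxa per bar the outlines can cover the fill colours. One common workaround is to merge taxa at the rank being plotted with `tax_glom()` before plotting. The sketch below uses a made-up phyloseq object (all counts and names are illustrative).

```r
library(phyloseq)
library(ggplot2)

set.seed(1)
# Toy data: 12 taxa belonging to 3 classes, 4 samples
otumat <- matrix(sample(1:100, 48, replace = TRUE), nrow = 12,
                 dimnames = list(paste0("ASV", 1:12),
                                 c("C1D0", "C2D0", "A1D0", "A2D0")))
taxmat <- matrix(c(rep("Bacteria", 12),
                   rep(c("Gammaproteobacteria", "Bacilli", "Actinobacteria"), each = 4)),
                 ncol = 2,
                 dimnames = list(rownames(otumat), c("Kingdom", "Class")))
ps <- phyloseq(otu_table(otumat, taxa_are_rows = TRUE), tax_table(taxmat))

# Convert to relative abundance, then merge all taxa that share a Class
ps.rel   <- transform_sample_counts(ps, function(x) x / sum(x))
ps.class <- tax_glom(ps.rel, taxrank = "Class")

# One segment per Class per sample instead of one per taxon
plot_bar(ps.class, fill = "Class")

# Alternative without merging: redraw the bars with outlines matching the fill
plot_bar(ps.rel, fill = "Class") +
  geom_bar(aes(color = Class, fill = Class), stat = "identity", position = "stack")
```

Merging changes the data being plotted (one row per Class), whereas the second plot only changes how the segments are drawn.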

I am also more interested in examining the differences between treatment types than between days. With 8 different treatment types, I am not sure how to edit the code to show differences by treatment, other than building the OTU and taxonomy tables with the original file names instead of renaming all my files to fit a naming scheme.
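
A possible approach, sketched under assumptions: if the sample data carries a `Treatment` column, `plot_bar()` can keep one bar per sample and the plot can be faceted by treatment instead of by `When`. The object below is a toy example with two treatments. The sample names, counts and taxonomy are made up.

```r
library(phyloseq)
library(ggplot2)

set.seed(1)
samples.out <- c("C1D0", "C2D0", "A1D0", "A2D0",
                 "C1D15", "C2D15", "A1D15", "A2D15")

# Toy counts and taxonomy (illustrative only)
otumat <- matrix(sample(1:100, 6 * length(samples.out), replace = TRUE), nrow = 6,
                 dimnames = list(paste0("ASV", 1:6), samples.out))
taxmat <- matrix(c(rep("Bacteria", 6),
                   rep(c("Gammaproteobacteria", "Bacilli", "Actinobacteria"), each = 2)),
                 ncol = 2,
                 dimnames = list(rownames(otumat), c("Kingdom", "Class")))

# Treatment is the first character; Day is whatever follows the last "D"
samdf <- data.frame(Treatment = substr(samples.out, 1, 1),
                    Day       = as.integer(sub(".*D", "", samples.out)),
                    row.names = samples.out)

ps <- phyloseq(otu_table(otumat, taxa_are_rows = TRUE),
               tax_table(taxmat),
               sample_data(samdf))

# Relative abundance, one bar per sample, one panel per treatment
ps.rel <- transform_sample_counts(ps, function(x) x / sum(x))
p <- plot_bar(ps.rel, x = "Sample", fill = "Class") +
  facet_wrap(~Treatment, scales = "free_x")
p
```

Keeping `x = "Sample"` avoids stacking several samples into one bar, which is what happens when multiple samples share the same x value.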

backwards-charm commented 1 week ago

With this method, the plots look nice, despite being full of random words.

Tutorial from https://joey711.github.io/phyloseq/import-data.html#import_biom

# Create an OTU table
otumat = matrix(sample(1:100, 100, replace = TRUE), nrow = 16, ncol = 16)
otumat
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
otumat
# Create a taxonomy table (sample() draws random words, so these are placeholder names, not real taxa)
taxmat = matrix(sample(words, 32, replace = TRUE), nrow = nrow(otumat), ncol = 6)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")
taxmat
class(otumat)
class(taxmat)
# Combine into a phyloseq object
library(phyloseq)
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
OTU
TAX
physeq = phyloseq(OTU, TAX)
physeq
plot_bar(physeq, fill = "Class")
[Image: Screenshot 2024-06-26 at 4 10 54 PM]