joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

importing sample information #1314

Open ARBramucci opened 4 years ago

ARBramucci commented 4 years ago

Hello, I am new to phyloseq. I managed to get my ASV table from dada2 into phyloseq format, but I sort of had to force it in. I had to name my rows by hand with ASV names, then name the rows of my taxonomy table with the same ASV names, and now I can't figure out how to get my samples into the same format. In my data frame `df`, `numbers` is a unique ASV number I added, `code` is the sample number, and `abund` is the abundance.

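```r
library(dplyr)
library(tidyr)

# keep the ASV id, sample code and abundance columns
df.subset <- df[c('numbers', 'code', 'abund')]

# long -> wide: one row per ASV, one column per sample
df.subset.spread <- df.subset %>% spread(key = 'code', value = 'abund')

rownames(df.subset.spread) <- paste0("asv", df.subset.spread$numbers)

ASV <- as.matrix(df.subset.spread)

# drop the first column ('numbers') so only sample abundances remain
ASV2 <- ASV[, -1]
```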
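For comparison, here is a minimal base-R sketch of the same long-to-wide step using `xtabs()` instead of `spread()`, so no tibble is involved and the row names stay put. It assumes a long data frame with the `numbers`, `code` and `abund` columns described above; the toy values are made up for illustration.

```r
# Hypothetical toy data in the same long format as `df` in the question:
# one row per (ASV, sample) pair.
df <- data.frame(
  numbers = c(1, 1, 2, 2),
  code    = c("S1", "S2", "S1", "S2"),
  abund   = c(10, 0, 3, 7)
)

# Cross-tabulate abundance by ASV (rows) and sample (columns);
# duplicated (ASV, sample) pairs would be summed.
tab <- xtabs(abund ~ numbers + code, data = df)

# Convert to a plain numeric matrix with rows named "asv1", "asv2", ...
ASV2 <- matrix(as.numeric(tab), nrow = nrow(tab),
               dimnames = list(paste0("asv", rownames(tab)), colnames(tab)))
ASV2
```

Unlike `spread()`, `xtabs()` fills ASV/sample combinations that are absent from the long table with 0 rather than `NA`, which is usually what an abundance table needs. With ASVs as rows, the result would go into `otu_table(ASV2, taxa_are_rows = TRUE)`.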

```r
taxonomy.vars <- c('numbers', 'tax.Kingdom', 'tax.Phylum', 'tax.Class',
                   'tax.Order', 'tax.Family', 'tax.Genus', 'Species')
df.taxonomy <- df[taxonomy.vars]

# make your list distinct: keep one row per unique ASV/taxonomy combination
df.taxonomy <- distinct(df.taxonomy)

rownames(df.taxonomy) <- paste0("asv", df.taxonomy$numbers)

Tax <- as.matrix(df.taxonomy)
# drop the 'numbers' column, keeping only the taxonomic ranks
Tax2 <- Tax[, -1]
```

```r
OTU <- otu_table(ASV2, taxa_are_rows = TRUE)
TAX <- tax_table(Tax2)
```
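As far as I know, `phyloseq()` matches the OTU and taxonomy tables by taxa names (here, the row names), so it can help to confirm they agree before building the object. A minimal base-R sketch; the toy matrices stand in for `ASV2` and `Tax2`, and `check_taxa_names()` is a hypothetical helper, not part of phyloseq.

```r
# Hypothetical stand-ins for ASV2 (abundances) and Tax2 (taxonomy).
ASV2 <- matrix(c(10, 3, 0, 7), nrow = 2,
               dimnames = list(c("asv1", "asv2"), c("S1", "S2")))
Tax2 <- matrix(c("Bacteria", "Bacteria", "Firmicutes", "Proteobacteria"),
               nrow = 2,
               dimnames = list(c("asv1", "asv2"), c("Kingdom", "Phylum")))

# Hypothetical helper: list taxa present in one table but not the other.
check_taxa_names <- function(otu, tax) {
  list(
    only_in_otu = setdiff(rownames(otu), rownames(tax)),
    only_in_tax = setdiff(rownames(tax), rownames(otu))
  )
}

res <- check_taxa_names(ASV2, Tax2)
res  # both elements empty when the names line up
```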

Everything above has worked, and I can make figures with the resulting phyloseq object, BUT I can't link up the sample information. I have 72 unique sample codes (`code`), and those codes can be split into four columns with information about the samples: "extraction", "sample", "volume", and "replicate".
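A minimal base-R sketch of splitting the sample codes into those four columns and building a sample table with one row per sample. The code format is not shown in the question, so the `_`-separated codes below are a hypothetical assumption. The key point is that the row names are the sample codes, which should match the column names of `ASV2` (since `taxa_are_rows = TRUE`).

```r
# Hypothetical sample codes; the real format of `code` is an assumption here.
codes <- c("E1_soil_50_R1", "E1_soil_50_R2", "E2_water_100_R1")

# Split each code on "_" into its four parts: one row per sample.
parts <- do.call(rbind, strsplit(codes, "_", fixed = TRUE))

samdf <- data.frame(
  extraction = parts[, 1],
  sample     = parts[, 2],
  volume     = as.numeric(parts[, 3]),
  replicate  = parts[, 4],
  row.names  = codes,  # sample names, to match colnames(ASV2)
  stringsAsFactors = FALSE
)
samdf
```

The result could then be wrapped as `sample_data(samdf)` and passed to `phyloseq()` together with `OTU` and `TAX`.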

```r
sample.vars <- c('numbers', "code", "extraction", "sample", "volume", "replicate")
LowInput.sample <- LowInput.context.FGID[sample.vars]
asv.silva.df.context <- LowInput.taxonomy %>% left_join(LowInput.sample, by = 'numbers')
```
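The join above is keyed on `numbers`, the ASV ID, so it attaches sample columns to taxa. Sample metadata for phyloseq instead needs one row per sample, keyed by the sample code. A minimal sketch, assuming the long table repeats the sample columns on every ASV row; the toy data is hypothetical.

```r
# Hypothetical long-format table: sample columns repeated for every ASV row.
long <- data.frame(
  numbers    = c(1, 2, 1, 2),
  code       = c("S1", "S1", "S2", "S2"),
  extraction = c("E1", "E1", "E2", "E2"),
  volume     = c(50, 50, 100, 100)
)

# Keep only the per-sample columns and drop the ASV-level duplicates.
samdf <- unique(long[c("code", "extraction", "volume")])

# One row per sample, named by sample code (to match the OTU table's columns).
rownames(samdf) <- samdf$code
samdf
```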

Sorry if this is a simple question or is covered elsewhere. I have looked for tutorials but couldn't figure out how to get my samples into that format...

sklucas commented 4 years ago

Hello, I am a longtime user of phyloseq and am grateful for its convenience and functionality over the years. I can confirm that this was also a problem for me, and I found a solution. My code had worked for years until yesterday. I assume a package update caused it, and I had a hunch it was a Tidyverse problem. I could make a phyloseq object, but the `phyloseq()` function would produce an error if I filled the `sample_data` slot with my usual table, which is imported using `read_csv()` (readr) and manipulated mostly with dplyr functions. When I load my data with `read.csv()`, as in the code below, the `phyloseq()` function works as expected.

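```r
library(phyloseq)

# base R read.csv() returns a plain data.frame, so the row names are kept
samdf <- read.csv("../path/to/sampleData.csv")
# the row names become the sample names; they must match the OTU table's
rownames(samdf) <- samdf$SAMPLE_ID

ps <- phyloseq(otu_table(seqtab, taxa_are_rows = FALSE),
               sample_data(samdf),
               tax_table(taxtab),
               phy_tree(fitGTR$tree))
```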
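Because `taxa_are_rows = FALSE` here, the sample names are the row names of `seqtab`, and they need to line up with `rownames(samdf)`. A minimal base-R sanity check to run before `phyloseq()`; the toy `seqtab` and `samdf` are hypothetical stand-ins.

```r
# Hypothetical stand-ins: samples as rows, sequences as columns.
seqtab <- matrix(c(5, 0, 2, 9), nrow = 2,
                 dimnames = list(c("A1", "A2"), c("ACGT", "TTGA")))
samdf <- data.frame(SAMPLE_ID = c("A1", "A2", "A3"), site = c("x", "y", "z"))
rownames(samdf) <- samdf$SAMPLE_ID

missing_meta <- setdiff(rownames(seqtab), rownames(samdf))  # counts but no metadata
unused_meta  <- setdiff(rownames(samdf), rownames(seqtab))  # metadata but no counts
missing_meta
unused_meta
```

Any names in either vector point to samples that will not line up between the two components.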

If you want to use Tidyverse functions such as `spread()` on the sample data, just make sure to convert the result to a plain `data.frame`, as below, before creating the phyloseq object:

```r
# convert to a plain data.frame; the result must be assigned back
samdf <- data.frame(samdf)
rownames(samdf) <- samdf$SAMPLE_ID
```
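A small sketch of that round trip, in case it helps to see the class change. It simulates a tibble (as `read_csv()` or dplyr would return) only when the tibble package is installed, and uses `as.data.frame()`, which for a tibble returns a plain `data.frame`. The toy data is hypothetical, and this only illustrates the conversion; it does not reproduce the original error.

```r
samdf <- data.frame(SAMPLE_ID = c("A1", "A2"), site = c("x", "y"))

# Simulate sample data coming from read_csv()/dplyr (only if tibble is installed).
if (requireNamespace("tibble", quietly = TRUE)) {
  samdf <- tibble::as_tibble(samdf)
}

# Back to a plain data.frame (drops the tbl_df/tbl classes) ...
samdf <- as.data.frame(samdf)
# ... and only then set the row names used as sample names.
rownames(samdf) <- samdf$SAMPLE_ID
class(samdf)
```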

Package versions currently in use:

- tidyverse 1.3.0
- phyloseq 1.28.0

Hope this helps! Sarah