joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
569 stars 187 forks source link

how to find ASVs present in different "sites" #1657

Open callmcgovern opened 1 year ago

callmcgovern commented 1 year ago

Hello, I am trying to look at the ASVs that are similar between different sites. I have sites in my sample_data and I have been able to make a venn diagram of overlapping vs non-overlapping ASVs in different sites. However, I would now like to look at a list of those ASVs which are shared by two particular samples. Does anyone know how I could do this? Thanks!

jeremysutherland commented 1 year ago
> ps.rare
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2864 taxa and 893 samples ]
sample_data() Sample Data:       [ 893 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 2864 taxa by 7 taxonomic ranks ]
> 
> ps.NY <- subset_samples(ps.rare, site == "NY")
> ps.NY<- prune_taxa(taxa_sums(ps.NY) > 0, ps.NY) # prune OTUs that are not present in at least one sample
> 
> ps.NJ <- subset_samples(ps.rare, site == "NJ")
> ps.NJ <- prune_taxa(taxa_sums(ps.NJ) > 0, ps.NJ) # prune OTUs that are not present in at least one sample
> 
> ps.NY
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 951 taxa and 349 samples ]
sample_data() Sample Data:       [ 349 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 951 taxa by 7 taxonomic ranks ]
> ps.NJ
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1141 taxa and 204 samples ]
sample_data() Sample Data:       [ 204 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 1141 taxa by 7 taxonomic ranks ]
> 
> length(taxa_names(ps.NY))
[1] 951
> length(taxa_names(ps.NJ))
[1] 1141
> length(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)))
[1] 317
>write.csv(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)), "shared_asvs_NY_NJ.csv")
callmcgovern commented 1 year ago

Hi, thank you for your help! How would you do this if you wanted to find ASVs that are found in all of your samples in the phyloseq object?

jeremysutherland commented 1 year ago

nrow(ps@otu_table) ps.prune <- prune_taxa(taxa_sums(ps) > 0, ps) # prune ASVs that are not present in at least one sample nrow(ps.prune@otu_table)

callmcgovern commented 1 year ago

Hi Jeremy, thanks for your quick reply. This does not seem to work. Here is the code and output on my end:

nrow(ps_dolphin@otu_table) [1] 15

ps_dolphin phyloseq-class experiment-level object otu_table() OTU Table: [ 1899 taxa and 15 samples ] sample_data() Sample Data: [ 15 samples by 4 sample variables ] tax_table() Taxonomy Table: [ 1899 taxa by 7 taxonomic ranks ] refseq() DNAStringSet: [ 1899 reference sequences ]

ps.prune <- prune_taxa(taxa_sums(ps_dolphin) > 0, ps_dolphin) nrow(ps.prune@otu_table) [1] 15

ps.prune phyloseq-class experiment-level object otu_table() OTU Table: [ 1899 taxa and 15 samples ] sample_data() Sample Data: [ 15 samples by 4 sample variables ] tax_table() Taxonomy Table: [ 1899 taxa by 7 taxonomic ranks ] refseq() DNAStringSet: [ 1899 reference sequences ]

Any ideas as to why it wouldnt be working?

kodelogue commented 1 year ago

Could be that you don't have any ASVs with all zero counts. (All ASVs have at least 1 count in at least one sample).

This should tell you if thats the case. (Assuming samples are rownames)

sum(colSums(ps_dolphin@otu_table)<1)

jeremysutherland commented 1 year ago

ope. Kodelogue is my other (fun) account. but that should do the trick.

mailasb commented 5 months ago

Hi, I'm new to Github. I found this answer that was very helpful for my project:

> ps.rare
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2864 taxa and 893 samples ]
sample_data() Sample Data:       [ 893 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 2864 taxa by 7 taxonomic ranks ]
> 
> ps.NY <- subset_samples(ps.rare, site == "NY")
> ps.NY<- prune_taxa(taxa_sums(ps.NY) > 0, ps.NY) # prune OTUs that are not present in at least one sample
> 
> ps.NJ <- subset_samples(ps.rare, site == "NJ")
> ps.NJ <- prune_taxa(taxa_sums(ps.NJ) > 0, ps.NJ) # prune OTUs that are not present in at least one sample
> 
> ps.NY
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 951 taxa and 349 samples ]
sample_data() Sample Data:       [ 349 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 951 taxa by 7 taxonomic ranks ]
> ps.NJ
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1141 taxa and 204 samples ]
sample_data() Sample Data:       [ 204 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 1141 taxa by 7 taxonomic ranks ]
> 
> length(taxa_names(ps.NY))
[1] 951
> length(taxa_names(ps.NJ))
[1] 1141
> length(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)))
[1] 317
>write.csv(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)), "shared_asvs_NY_NJ.csv")

I wanted to ask: How could I obtain that same list of shared ASVs between both groups but also associated with the taxonomic classification found in the tax_table? In other words, I want to obtain a table (like the one in the attached image) that shows each shared ASV (for example: ASV1, ASV5, ASV30, etc.) and what microorganism it represents. Thank you very much. Bacteria

jeremysutherland commented 5 months ago

I think* this should work:

asv_list <- intersect(taxa_names(ps.NY), taxa_names(ps.NJ))

taxa_sub <- ps.rare@tax_table[rownames(ps.rare@tax_table) %in% asv_list, ]

mailasb commented 4 months ago

It worked perfectly. Thank you very much!