joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
567 stars 187 forks source link

Run the Wilcoxon test for each of the taxa ranks #1745

Open mailasb opened 2 months ago

mailasb commented 2 months ago

Hi, I'm comparing a set of biological samples that are classified into two groups of interest called "Status" (1 and 0). I had the following phyloseq object:

mirl_phyloseq_final phyloseq-class experiment-level object otu_table() OTU Table: [ 8229 taxa and 42 samples ] sample_data() Sample Data: [ 42 samples by 6 sample variables ] tax_table() Taxonomy Table: [ 8229 taxa by 6 taxonomic ranks ]

Then I agglomerated it by family: mirl_phyloseq_final.family=tax_glom (mirl_phyloseq_final, taxrank="Family")

mirl_phyloseq_final.family phyloseq-class experiment-level object otu_table() OTU Table: [ 364 taxa and 42 samples ] sample_data() Sample Data: [ 42 samples by 6 sample variables ] tax_table() Taxonomy Table: [ 364 taxa by 6 taxonomic ranks ]

And then merged it with psmelt: melted.f=psmelt (mirl_phyloseq_final.family) So, I got a data.frame (melted.f) with the samples, the found ASVs, the abundance of each one, and the taxonomic classification agglomerated by family that looks like this: abundance

If I want to calculate the differential abundance between the two groups and run Wilcoxon for a family, I do the following: wilcox.test (Abundance~Status, data=melted.f[which(melted.f$Family=="Mycoplasmataceae"),]). And I obtain a p-value. My intention is to run the Wilcoxon test to calculate differential abundance between the two groups of samples, for all families (in a loop or something like) this and then store those p-values in a data frame with the name of each family.

Any idea how I can do it?