joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Extract OTU/ASVs from distance matrix #937

Open Biancabrown opened 6 years ago

Biancabrown commented 6 years ago

Hi,

Thanks for the program; it's been great so far. I want to find out which OTUs/ASVs are responsible for the patterns I see in my NMDS plots. For example, if I see that species A clusters differently from species B, which OTUs are responsible for that clustering? Is there a way for me to extract this information from the distance file?

spholmes commented 6 years ago

Bianca, this is not a documented method; I am giving you some hints from projects we have done (consider this more of a hack than a simple solution). This is a problem with NMDS compared with, for instance, correspondence analysis, which provides biplots directly. So, step one: redo the analysis with the CCA option in ordinate and look at the biplot (we did this in our marine mammals paper and it worked nicely): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4742810/

Now, if that doesn't help at all and you really, really want to use NMDS, what we usually do is take the most extreme sample points in the NMDS on each of the axes and look at how they contrast in terms of taxa/ASVs. There is a Cross Validated (Stack Exchange) question on this here: https://stats.stackexchange.com/questions/144593/interpreting-nmds-ordinations-that-show-both-samples-and-species/144602
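A minimal sketch of step one (the CCA biplot), assuming the phyloseq and vegan packages are installed. It uses phyloseq's bundled GlobalPatterns data and its SampleType variable as stand-ins for your own object and grouping; keeping only the 200 most abundant taxa is an arbitrary choice to keep the plot readable, not part of the method.

```r
library(phyloseq)

# Stand-in data: replace with your own phyloseq object.
data(GlobalPatterns)

# Arbitrary filter (an assumption): keep the 200 most abundant taxa.
top <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:200]
ps <- prune_taxa(top, GlobalPatterns)

# Without a formula, method = "CCA" gives an unconstrained
# correspondence analysis via vegan::cca.
ord <- ordinate(ps, method = "CCA")

# Biplot: samples and taxa drawn in the same ordination space.
p <- plot_ordination(ps, ord, type = "biplot", color = "SampleType")
print(p)

# Taxa (species) scores on the first two axes, ranked by their
# absolute position on axis 1.
sp <- vegan::scores(ord, display = "species", choices = 1:2)
head(sp[order(abs(sp[, 1]), decreasing = TRUE), ], 10)
```

Taxa with large scores on an axis sit toward the samples at the same end of that axis, so they are the first candidates to inspect.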

The idea in that thread is that you can make fake profiles with signature amounts of different taxa, construct these as supplementary points (fake samples), and project them using weighted versions of NMDS with weights equal to zero. They then appear on the plot and indicate where high values of those ASVs would go.

Again, your mileage may vary, as this is not a documented method. Hope this helps.
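The contrast between extreme NMDS samples described above could be sketched roughly as follows, again assuming phyloseq and vegan are installed and using GlobalPatterns as a stand-in. The Bray-Curtis distance, the top-200 filter, and the choice of three samples per end (`n_ext`, a hypothetical name) are all assumptions made for illustration.

```r
library(phyloseq)

# Stand-in data: replace with your own phyloseq object.
data(GlobalPatterns)
top <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:200]
ps <- prune_taxa(top, GlobalPatterns)

# Work on relative abundances so samples are comparable.
ps_rel <- transform_sample_counts(ps, function(x) x / sum(x))

set.seed(1)  # NMDS uses random starts
ord <- ordinate(ps_rel, method = "NMDS", distance = "bray")

# Sample (site) coordinates on the NMDS axes.
site <- vegan::scores(ord, display = "sites")

# The n_ext most extreme samples at each end of axis 1.
n_ext <- 3
by_axis1 <- order(site[, 1])
low  <- rownames(site)[head(by_axis1, n_ext)]
high <- rownames(site)[tail(by_axis1, n_ext)]

# Taxa-by-samples matrix of relative abundances.
otu <- as(otu_table(ps_rel), "matrix")
if (!taxa_are_rows(ps_rel)) otu <- t(otu)

# Mean abundance difference between the two ends, per taxon.
contrast <- rowMeans(otu[, high, drop = FALSE]) -
  rowMeans(otu[, low, drop = FALSE])
contrast <- sort(contrast, decreasing = TRUE)

head(contrast, 5)  # taxa enriched at the high end of axis 1
tail(contrast, 5)  # taxa enriched at the low end of axis 1
```

Repeating this with `site[, 2]` gives the same contrast for the second axis.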

Best of luck, Susan

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

spholmes commented 6 years ago

Bianca, rereading your comment more closely: in your case, if you just want to see what differentiates two well-defined clusters, label them, add that label as a new variable in the sample information, and then do a discriminant analysis on those labels; it will pull out which taxa are responsible. Use a supervised learning approach, as in the section of that name in https://f1000research.com/articles/5-1492/v2, and you can pull out the ASVs. Sorry I didn't read your note carefully enough the first time. Susan
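A minimal sketch of the label-then-classify idea, assuming the phyloseq and randomForest packages are installed. A random forest stands in here for the discriminant analysis; that swap, the `cluster` variable name, the human/other labels, and the top-200 filter are all assumptions for illustration, with GlobalPatterns as placeholder data.

```r
library(phyloseq)
library(randomForest)

# Stand-in data: replace with your own phyloseq object.
data(GlobalPatterns)
top <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:200]
ps <- prune_taxa(top, GlobalPatterns)

# Hypothetical cluster labels: in practice, assign the two groups
# you see in your ordination plot.
human <- get_variable(ps, "SampleType") %in%
  c("Feces", "Mock", "Skin", "Tongue")
sample_data(ps)$cluster <- factor(ifelse(human, "human", "other"))

# Samples-by-taxa matrix of log-transformed counts.
otu <- as(otu_table(ps), "matrix")
if (!taxa_are_rows(ps)) otu <- t(otu)
X <- log1p(t(otu))

# Labels aligned to the rows of X by sample name.
meta <- data.frame(sample_data(ps))
y <- meta[rownames(X), "cluster"]

set.seed(1)
rf <- randomForest(x = X, y = y, importance = TRUE)

# Rank taxa by mean decrease in accuracy (type = 1).
imp <- importance(rf, type = 1)
top_taxa <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:10]

# Look up the taxonomy of the most discriminating taxa.
tax_table(ps)[top_taxa, c("Phylum", "Genus")]
```

With few samples the importance ranking can be unstable, so it is worth checking the top taxa against their raw abundances in each group.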

Biancabrown commented 6 years ago

Hi Susan,

Thank you. The second response was what I was looking for.

-Bianca