joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

adjusting for confounders #486

Open jrsavage opened 9 years ago

jrsavage commented 9 years ago

Hi, I'm interested in using the microbiome to predict disease outcomes, and I need to adjust for confounders (e.g. race/ethnicity). I understand that phyloseq/DESeq2 fits the following model: microbiome = predictor1 + predictor2, in the line diagdds = phyloseq_to_deseq2(data, ~ predictor1 + predictor2)

Is there a way to fit the following model:

outcome=microbiome+predictor1+predictor2

Thank you
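For readers landing on this question: one simple way to read the model above, for a single taxon at a time, is an ordinary logistic regression with the taxon as one covariate and the confounders as the others. The sketch below uses simulated data and base R only; the column names (taxon, predictor1, predictor2, outcome) are hypothetical, and this is a generic regression, not a microbiome-specific tool. It does not address compositionality or testing many taxa at once.

```r
# Simulated stand-in data; every name and effect size here is hypothetical.
set.seed(42)
n <- 200
meta <- data.frame(
  predictor1 = factor(sample(c("a", "b"), n, replace = TRUE)),  # categorical confounder
  predictor2 = rnorm(n)                                          # continuous confounder
)

# One microbiome feature, e.g. a transformed abundance of a single taxon.
meta$taxon <- rnorm(n)

# Simulate a binary outcome that depends on the taxon and on predictor2.
linear_part <- -0.5 + 1.0 * meta$taxon + 0.8 * meta$predictor2
meta$outcome <- rbinom(n, size = 1, prob = plogis(linear_part))

# outcome = microbiome + predictor1 + predictor2, for this one taxon.
fit <- glm(outcome ~ taxon + predictor1 + predictor2,
           data = meta, family = binomial)

coef_table <- summary(fit)$coefficients
coef_table["taxon", ]      # confounder-adjusted log-odds estimate for the taxon
exp(coef(fit)["taxon"])    # the same estimate expressed as an odds ratio
```

The taxon coefficient is the association with the outcome after adjusting for predictor1 and predictor2, which is the sense of "adjusting for confounders" in the question.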

joey711 commented 9 years ago

Good question!

There are no off-the-shelf tools for this that are specific to microbiome data. It is something I am looking into. If you are already doing some work on this, I'd be happy to discuss it further off the issue tracker.

Of course, if you find an already-developed tool for this, I would love to have it posted here as well.

Cheers

joey

jrsavage commented 9 years ago

Hi Joey, thanks for your email. If you have any thoughts I would love to discuss. I am an allergist/epidemiologist and not much of a programmer, so getting this far has been an achievement. This is definitely not something I will find a solution to on my own, but a solution would be helpful to those of us interested in how the microbiome influences disease. Jessica


From: Paul J. McMurdie [notifications@github.com] Sent: Wednesday, June 10, 2015 5:13 PM To: joey711/phyloseq Cc: Savage, Jessica R.,M.D.,M.H.S. Subject: Re: [phyloseq] adjusting for confounders (#486)


michberr commented 9 years ago

Hi Jessica,

If you're looking to make a model that will predict disease outcome, you might consider a random forest or another decision tree algorithm. I have a little demo with phyloseq-formatted data posted here: http://rpubs.com/michberr/randomforestmicrobe. You could include both your microbiome data and your confounding variables as predictors, with the disease state as the outcome. Decision trees are hierarchical, so they inherently allow for some degree of interaction between your variables. The output will give you an estimate of the model's error rate for predicting your outcome variable. It will also tell you which variables were most important for constructing the model. This might give you some good candidate OTUs for further exploration.

The downside of these models is that they're something of a black box. Depending on your needs, they might not give you the level of interpretability that you want. However, the upside is that they're very easy to implement, and you can use any assortment of discrete or continuous variables as your predictors or outcome.
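As a rough illustration of the approach described above (not the linked demo), here is a minimal sketch using the randomForest package on simulated data. The OTU names, the age and sex confounders, and the effect planted in OTU1 are all hypothetical. With a real phyloseq object, a comparable samples-by-taxa matrix could be taken from otu_table() and transposed if taxa are rows.

```r
library(randomForest)

set.seed(1)
n_samples <- 60
n_otus <- 20

# Simulated OTU counts with samples in rows (stand-in for a real OTU table).
otu <- matrix(rpois(n_samples * n_otus, lambda = 20), nrow = n_samples,
              dimnames = list(paste0("S", seq_len(n_samples)),
                              paste0("OTU", seq_len(n_otus))))

# Outcome: disease state as a factor, so randomForest does classification.
disease <- factor(rep(c("case", "control"), each = n_samples / 2))

# Plant a signal: make OTU1 more abundant in cases.
otu[disease == "case", "OTU1"] <- otu[disease == "case", "OTU1"] + 30L

# Hypothetical confounders, entered as predictors alongside the OTUs.
confounders <- data.frame(
  age = rnorm(n_samples, mean = 40, sd = 10),
  sex = factor(sample(c("F", "M"), n_samples, replace = TRUE))
)
predictors <- data.frame(confounders, otu)

rf <- randomForest(x = predictors, y = disease, ntree = 500, importance = TRUE)

print(rf)  # reports the out-of-bag (OOB) estimate of the error rate

# Variable importance as mean decrease in accuracy, largest first.
imp <- importance(rf, type = 1)
head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE])
```

Note that putting confounders in as predictors lets the forest use them, but the importance scores are not adjusted effect estimates in the regression sense.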

I would be interested to hear other people's ideas on this topic.

joaosabino commented 9 years ago

Hello,

I am using the RWeka package to build decision trees. Here you can include the OTUs together with other predictors (e.g. medication, age). I have been using R for only a year and Weka for a week, so if I am saying something wrong please correct me.

Here is some code:

    library(phyloseq)
    library(RWeka)
    library(party)
    library(partykit)
    library(FSelector)

    # Extract the sample data as a plain data frame (avoids attach())
    metaphy <- data.frame(sample_data(phyloseq_object))

    # OTU table as a samples-by-taxa matrix, combined with the outcome column
    otu <- as(otu_table(phyloseq_object), "matrix")
    if (taxa_are_rows(phyloseq_object)) otu <- t(otu)
    otu_df <- data.frame(Diagnose = factor(metaphy$Diagnose), otu)

    # Make decision tree with Diagnose as outcome of interest
    DecisionTree <- J48(Diagnose ~ ., data = otu_df)
    DecisionTree
    summary(DecisionTree)
    if (require("partykit", quietly = TRUE)) plot(DecisionTree)

    # Make decision tree with the most significant genera
    DecisionTree8genera <- J48(Diagnose ~ Anaerostipes + genera2 + genera3 + … , data = otu_df)
    DecisionTree8genera
    summary(DecisionTree8genera)
    plot(DecisionTree8genera)

    # Adding Age
    otu_df$Age <- metaphy$Age
    DecisionTree8generaAge <- J48(Diagnose ~ Age + Anaerostipes + genera2 + genera3 + … , data = otu_df)
    DecisionTree8generaAge
    summary(DecisionTree8generaAge)
    plot(DecisionTree8generaAge)

If you want to make a decision tree based on the whole OTU table plus some predictors, you can build a data frame containing the OTU table and the predictors.

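    # All OTU table + predictors
    # (assumes samples are rows; wrap the table in t() if taxa_are_rows(phyloseq_object) is TRUE)
    predictor1 <- metaphy$Age
    predictor2 <- metaphy$Sex
    …

    # Include the outcome in the data frame so the formula can find it
    df1 <- data.frame(Diagnose = factor(metaphy$Diagnose), predictor1, predictor2, … , otu_table(phyloseq_object))
    head(df1)

    DecisionTree <- J48(Diagnose ~ ., data = df1)
    DecisionTree
    summary(DecisionTree)
    plot(DecisionTree)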
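When the OTU table and the predictors are assembled by hand like this, two things are worth checking before modelling: that samples are in rows, and that both tables list the samples in the same order. The base R sketch below illustrates this on a tiny made-up table; the sample names, OTU names, and the taxa_are_rows flag are hypothetical stand-ins for what a real dataset would provide.

```r
# Tiny made-up taxa-by-samples count matrix (columns are samples S1..S3).
otu <- matrix(c(5L, 0L, 12L, 3L, 7L, 1L), nrow = 2,
              dimnames = list(c("OTU_a", "OTU_b"), c("S1", "S2", "S3")))

# Made-up sample metadata, deliberately in a different sample order.
meta <- data.frame(
  Diagnose = factor(c("healthy", "disease", "healthy")),
  Age = c(31, 45, 52),
  row.names = c("S3", "S1", "S2")
)

# Put samples in rows if the table is stored taxa-by-samples.
taxa_are_rows <- TRUE  # hypothetical flag describing the table above
if (taxa_are_rows) otu <- t(otu)

# Both tables must describe the same samples; then reorder by sample name.
stopifnot(setequal(rownames(otu), rownames(meta)))
otu <- otu[rownames(meta), , drop = FALSE]

# One row per sample: outcome, predictors, then OTU counts.
df1 <- data.frame(meta, otu)
df1
```

Binding the tables without the reordering step would silently pair each sample's counts with another sample's metadata.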

I am interested in your opinion about this approach to the problem.

Grts, João