Closed kamoors closed 1 year ago
Additionally, if I want to do bootstrapping with my glm output, will the weights be a problem since not all samples from each subclass will be chosen?
This is primarily a statistical consulting question and not really about using MatchIt. I recommend you seek the services of a statistical consultant with experience in causal inference. I'll offer a few thoughts here, but I don't have the bandwidth to fully advise on your methodology, and to provide such advice would require a much longer conversation about your estimand, the meaning of your variables, the goals of the study, etc.
A few things that strike me as odd:
- Why aren't you balancing on nutrient in the matching?
- Why are you settling for okay balance instead of excellent balance?
- Why didn't you extract the matched dataset using match.data()?
- Why didn't you include the matching weights in the outcome model?
- If you are using sampling weights, why didn't you include those in the call to matchit()?
- Why are you using glm() and not lm() if you are fitting a linear model? If you are not fitting a linear model, why don't you have family specified?
- Why are you including subclass as a predictor instead of using it to adjust the standard errors?
- How are you computing the treatment effect from the model?
- If you are bootstrapping, follow the bootstrapping instructions in the vignette on estimating effects. You need to bootstrap subclass, not individuals. Why do you want to bootstrap?
My advice is to follow the advice that already exists. The vignettes provide very clear instructions on how to estimate treatment effects using best practices. Why do you want to deviate from these instructions? I recommend seeking a statistical consultant who can help you answer these questions if you don't know the answers to them. Matching is an advanced statistical method that requires expertise to do well, especially if you want to deviate from known best practices.
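For readers following along, here is a rough sketch of the kind of workflow the vignettes describe: match, check balance, extract the matched data with match.data(), fit a weighted outcome model, and use subclass to cluster the standard errors rather than as a predictor. It uses the lalonde data that ships with MatchIt, not the data from this issue, so the covariates and outcome are stand-ins. It also assumes the optmatch, sandwich, and lmtest packages are installed, and it reads off a single coefficient rather than following the vignette's full effect-estimation procedure.

```r
library(MatchIt)   # full matching also needs the optmatch package installed
library(sandwich)  # vcovCL() for cluster-robust variance
library(lmtest)    # coeftest()

data("lalonde", package = "MatchIt")

# 1. Full matching on a logistic-regression propensity score.
m.out <- matchit(treat ~ age + educ + married + re74 + re75,
                 data = lalonde, method = "full", distance = "glm")

# 2. Check balance before estimating anything.
summary(m.out)

# 3. Extract the matched data; this adds 'distance', 'weights', and 'subclass'.
md <- match.data(m.out)

# 4. Weighted outcome model. 'subclass' is not a predictor here.
fit <- lm(re78 ~ treat + age + educ + married + re74 + re75,
          data = md, weights = weights)

# 5. Cluster the standard errors on 'subclass'.
est <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
est
```

The row for treat in est is the quantity of interest in this simplified version; the other coefficients are not meant to be interpreted.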
Thank you for answering so quickly! I think the questions you ask offer food for thought. Let me try to answer them so that maybe I can distill a solution to my problem (I'll answer in the quote below).
Just a quick note on the research. From the nutritional information, I have hundreds of variables, and I am only interested in the interaction of the drug (0 or 1) with this specific nutrient.
I have many different data types, but the overall question is what this interaction does to the gut microbiome (and subsequently the host). We are employing constraint-based modelling, community modelling, and metabolomics to answer this question.
This is primarily a statistical consulting question and not really about using MatchIt. I recommend you seek the services of a statistical consultant with experience in causal inference. I'll offer a few thoughts here, but I don't have the bandwidth to fully advise on your methodology, and to provide such advice would require a much longer conversation about your estimand, the meaning of your variables, the goals of the study, etc.
A few things that strike me as odd:
- Why aren't you balancing on nutrient in the matching?
  I was considering doing that, but the idea behind matching is to equalize the "other" variables, right? If I include my nutrient of interest, wouldn't that nullify the differences?
- Why are you settling for okay balance instead of excellent balance?
- Why didn't you extract the matched dataset using match.data()?
  The question only contains a small section of my code. I followed the vignette and tested multiple settings of matchit(). So, yes, I do use the output of match.data().
- Why didn't you include the matching weights in the outcome model?
  I use the match.data() "weights" column for the weights parameter of the glm.
- If you are using sampling weights, why didn't you include those in the call to matchit()?
- Why are you using glm() and not lm() if you are fitting a linear model? If you are not fitting a linear model, why don't you have family specified?
  I have community modelling data where the residuals are distributed non-normally in various cases. This is just a carry-over from those analyses (since glm with 'gaussian' is just an lm).
- Why are you including subclass as a predictor instead of using it to adjust the standard errors?
  This was the thing that I was very unsure about. I realize that doing so might introduce collinearity; that's why I'm asking the questions here :)
- How are you computing the treatment effect from the model?
  The idea was to fit a model per metabolite / exchanged metabolite (from the metabolic modelling) to establish whether the use of the drug in combination with the nutrient has a specific effect on that metabolite / flux / bacterial species.
- If you are bootstrapping, follow the bootstrapping instructions in the vignette on estimating effects. You need to bootstrap subclass, not individuals. Why do you want to bootstrap?
  I found these instructions after posting. Thanks!
My advice is to follow the advice that already exists. The vignettes provide very clear instructions on how to estimate treatment effects using best practices. Why do you want to deviate from these instructions? I recommend seeking a statistical consultant who can help you answer these questions if you don't know the answers to them. Matching is an advanced statistical method that requires expertise to do well, especially if you want to deviate from known best practices.
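To make "bootstrap subclass, not individuals" concrete, here is a minimal base-R sketch of a cluster bootstrap. The data frame md below is simulated as a hypothetical stand-in for match.data() output (the column names drug and metabolite, the subclass sizes, and the constant weights are all assumptions for illustration), and the statistic is simply the drug coefficient. The vignette on estimating effects remains the reference for the full procedure.

```r
set.seed(1)

# Hypothetical stand-in for match.data() output: 40 subclasses of 3 units,
# one treated unit per subclass, with a 'weights' column (all 1 here).
md <- data.frame(
  subclass = factor(rep(1:40, each = 3)),
  drug     = rep(c(1, 0, 0), times = 40),
  weights  = 1
)
md$metabolite <- 0.5 * md$drug + rnorm(nrow(md))

# Row indices belonging to each subclass.
rows_by_subclass <- split(seq_len(nrow(md)), md$subclass)

boot_once <- function() {
  # Resample whole subclasses (not individuals) with replacement.
  drawn <- sample(names(rows_by_subclass), replace = TRUE)
  boot_md <- md[unlist(rows_by_subclass[drawn]), ]
  # Refit the weighted outcome model on the resampled subclasses.
  fit <- lm(metabolite ~ drug, data = boot_md, weights = weights)
  unname(coef(fit)["drug"])
}

boot_est <- replicate(500, boot_once())
quantile(boot_est, c(0.025, 0.975))  # percentile interval for the drug coefficient
```

Because whole subclasses are drawn, every subclass that appears in a bootstrap sample appears with all of its members.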
Some responses:
- Balancing on nutrient is meant to nullify the association between nutrient and drug, but that does not affect the relationships between nutrient and metabolite or between drug and metabolite, which are what you are studying. The estimating effects vignette provides instructions for moderation analysis, and balancing on the moderator is important. Ideally, you have balance within each level of the moderator (i.e., at each level of the moderator, treatment is independent of covariates). This usually involves including interactions between the moderator and the covariates.
- Including subclass as a predictor doesn't (just) introduce collinearity, but it fails to preserve the estimand and can limit the degrees of freedom for your estimates, which makes it harder to detect effects. Where did you see this method used? It is not a standard practice (though in some cases it can be equivalent to using a cluster-robust SE).
The general issue that I had with this matching procedure is that I am basically looking for an effect of a combined variable, where drug use is 0 or 1, but the nutrient is a continuous variable. Since matchit() looks at the treatment effect (which comes from 0 or 1), I wonder if I can even use this matching procedure for my purpose. Plus, this is all existing data that was created with a different purpose, so I have to make do with what I have...
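As a rough illustration of the moderation setup mentioned in the responses above (balance within each level of the moderator, with moderator-by-covariate interactions), here is a sketch using the lalonde data that ships with MatchIt. The binary variable married is used as a stand-in moderator purely for illustration. The nutrient in this thread is continuous, so exact matching on it would not carry over directly. The sketch assumes the optmatch, sandwich, and lmtest packages are installed.

```r
library(MatchIt)   # full matching also needs the optmatch package installed
library(sandwich)  # vcovCL() for cluster-robust variance
library(lmtest)    # coeftest()

data("lalonde", package = "MatchIt")

# Interact the moderator with the covariates in the propensity score model
# and match exactly on it, so matched sets never mix moderator levels.
m.mod <- matchit(treat ~ (age + educ + re74 + re75) * married,
                 data = lalonde, method = "full", exact = ~married)

md <- match.data(m.mod)

# Outcome model with a treatment-by-moderator interaction,
# weighted by the matching weights.
fit <- lm(re78 ~ treat * married, data = md, weights = weights)

# Standard errors clustered on subclass, not subclass as a predictor.
est <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
est
```

Balance should still be checked separately within each level of the moderator before interpreting the treat:married row.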
Regarding including subclass, my rationale was that there might be some specific effects that only occur within a given subclass. I thought that by including it I would diminish those effects, but clearly not.
Overall, the effects of the drug and of the nutrient separately have been studied before, but there is evidence of a specific interaction effect in certain cases. I want to know if we can also see this effect in the general population. So, we want to find observations that might be affected by the interaction.
It is possible to study the effect of two combined treatments, but you can't use matching for that. One approach is described in VanderWeele (2009). This is a very advanced problem and there has not been much work done on it. I would recommend collaborating with a methodologist who has expertise in causal inference to do this research.
Ok, thank you for all your help!
Dear sir/madam,
I am investigating the combined effect of a drug and one nutrient in a cohort of 1280 people. However, I only have 77 patients / samples using the drug. I wanted to match these patients based on BMI, age, Gender, and general food intake. I wrote it like this:
m.out1 <- matchit(drug ~ BMI_mb + GENDER + AGE + food, data = data, method = "full", distance = "glm")
The balance seems ok.
Can I then go into my metabolomics data and run a glm like this?
glm("metabolite ~ drug * nutrient + BMI_mb + GENDER + AGE + food + batch + subclass", data = test_df, weights = mets_samples$weights)
Or am I only ''allowed'' to use the matched data with the functions described in the vignette?