Closed sterrettJD closed 11 months ago
Thanks for this contribution @sterrettJD !
I have a few high-level comments first, and can take a more detailed review on the second pass:
anova
action in q2-longitudinal? Or at least make the API more consistent? (e.g., to use a proper formula structure instead of accepting a str
of comma-delimited predictors). My preference would be to make RM an option within the existing action, and this would also limit code replication. On the other hand I see that some options are unique to each method so a distinct action might be appropriate, in which case both anova
and anova_rm
should call the same base function, and only the relevant options are exposed in each action. What do you think?table
input can be removed... metadata-transformable objects (including feature tables!) can be merged in Q2 to extract target data, e.g., if running anova with an ASV as the dependent variable. You were probably following the LME API as a guideline here, but that API pre-dates the addition of a feature table to metadata transformer, and should also be updated at some point. You can just drop this.Let me know what you think and we can discuss next steps... Thanks!
- would it be possible to make RM an option in the existing anova action in q2-longitudinal? Or at least make the API more consistent? (e.g., to use a proper formula structure instead of accepting a str of comma-delimited predictors). My preference would be to make RM an option within the existing action, and this would also limit code replication. On the other hand I see that some options are unique to each method so a distinct action might be appropriate, in which case both anova and anova_rm should call the same base function, and only the relevant options are exposed in each action. What do you think?
Sure thing! I think it would be feasible to expose it within the existing action, so I'll start with that. Once I make some progress, I may check back in to get your thoughts on if it seems more practical to split to a separate function.
- similarly, the visualizer does not need a distinct function, since it is mostly code replication. The visualize_anova function should be adjusted instead to enable/disable different features depending on the input (e.g., if residuals data are present or not).
Makes sense!
- metadata-transformable objects (including feature tables!) can be merged in Q2 to extract target data, e.g., if running anova with an ASV as the dependent variable. You were probably following the LME API as a guideline here, but that API pre-dates the addition of a feature table to metadata transformer, and should also be updated at some point. You can just drop this.
Cool! You're right, I was following the LME API, but that makes a lot of sense. I had been wondering why passing alpha diversity vectors to Metadata for LME worked but passing them as the input table did not.
Thanks for the helpful response and being willing to work with me on this, @nbokulich ! To let you know my timeline, I'm pretty booked through June 15, maybe later, but I'll try to get working on this by July and will check in if I have questions. I'll also address all of the lint issues!
Thanks @sterrettJD ! Just a quick comment on the timeline — that sounds totally fine from my end. The next QIIME 2 release is in August, so having this PR finalized by late July would be a good target if you want this in the next release.
Hi @sterrettJD ! Any updates on this PR? Let us know if you need a hand with anything.
Hi @nbokulich sorry about the delay - got wrapped up in summer projects and classes. I'll try to get back on this over the next 2 weeks! If it looks like that isn't reasonable I'll let you know. Thanks for following up!
Hi @sterrettJD , that timeline sounds fine, thanks! just checking in
Hi @nbokulich just wanted to touch base, I'm working on this now and will be doing more this week. Sorry about the delays!
Hi @nbokulich and @thermokarst , I'm working on implementing repeated measures as part of the base anova implementation. To keep the the API consistent (and allow users to provide a formula), I've implement a linear mixed effects model with random effects for the participant ID, which is passed to statsmodels' anova_lm
(like the non-rm version, which is an OLS model passed to anova_lm
).
However, anova_lm
can't accept the mixed effects model results class, so I was wondering what your thoughts are regarding implementing this via statsmodels' AnovaRM
function? The downside to AnovaRM
is that it doesn't accept a formula, and users couldn't specify interactions.
I've looked into extracting model information to perform an ANOVA without statsmodels, but I worry about the robustness of this approach. Do you have any thoughts on what the best route forward might be? Would it be better to try to extract the model information to perform an ANOVA without statsmodels, or would using the existing AnovaRM
function be better? Or would there be another option that I'm not considering? Thanks!
Hi @sterrettJD , Thanks for the update! My thoughts:
AnovaRM
is the correct way to do this in statsmodels.patsy
could perhaps be used here to parse a formula input, and pass it to AnovaRM
as a list of factors.AnovaRM
actually tests all interactions when multiple factors are passed in the list. This is not ideal either, but this also appears to be the case with the equivalent R function anova_test
for repeated models. I suggest we start here, and any enhancements could be easier to layer on later.I would avoid doing this without statsmodels — I agree, it does not sound like a robust approach.
Hi all, I've updated the anova function as discussed previously.
anova()
now calls statsmodels AnovaRM if repeated_measures == True
However, I don't yet have unit tests for a 2x2 repeated measures Anova. The function requires fully balanced groups, and the dataset I've used so far for unit tests doesn't have balanced groups for any 2x2 design, so I'll have to find/make one.
Additionally, I haven't implemented any post-hocs yet. Do you think that a paired samples t-test would be generalizable and appropriate? I wanted to check before implementing.
If you have any feedback on the implementation so far, I can update to address that. I'll also address the lint issues.
Hey @sterrettJD, thanks for all of your hard work on this! We are currently doing some PR triage and review - are you still in the process of going through the review comments from @nbokulich?
Hey @lizgehret, I incorporated some but have unfortunately gotten bogged down prepping for my comps this semester. I'm sure I'd be able to get to it some point after November, but I don't think I'll be able to work on it before then... If anybody else wanted to work on it before then to get it ready for the next release, that's totally okay with me. Thanks for checking in!
@nbokulich, do you think this would be good to merge? It seems basically fine. Re: typemap, we could use it here, but we can throw that on after the fact.
hey @ebolyen sure, that sounds like a good compromise. I have given this some testing this morning and this looks ready to merge.
@sterrettJD thanks for the hard work on this! Very pleased to have this in q2-longitudinal and I look forward to using it myself! If you find some time to further refine this action per our discussion above, please open a new PR.
What's included and why?
This includes the addition of a repeated measures ANOVA to q2-longitudinal! I'm contributing this per @nbokulich's suggestion on this QIIME2 forum post.
A summary of changes:
A note: this is my first contribution to the project, and walking through the developer docs (multiple times) has been a great learning experience. That being said, I'm sure there must be conventions that I've missed, and I'm more than happy to address those if you come across them! I haven't yet included tests yet, but I was hoping to get feedback on what I have in the meantime.
Thanks! :smile: