Closed ianartmor closed 4 years ago
Hi @ianartmor!
Copying from my email for others' reference.
In terms of outliers, I wouldn't worry too much about them. Corncob is designed specifically to handle highly dispersed count data. It doesn't treat outliers in any special way, nor, frankly, do I think it should. Depending on your sample size, I don't think any single point will have enough leverage to substantially change the results, so I'm glad your exploration confirmed that. In general, I recommend against removing outliers unless you have reason to believe they are invalid. Most people who remove them are just p-hacking :). Ultimately the choice is yours, though! My main point is that corncob can handle them fine.
5 per group is fine! 0 is not. However, you don't have interaction effects, so you only need to worry about the marginal groupings (each year and each region), not the year/region combinations. One thing to be aware of is that your estimates might have high standard errors and low power, so you might want to look into that. Mathematically, as long as you have some observations in each group (and, in your case, that means each marginal group, not each combination), it should be just fine.
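As a rough illustration of the marginal-versus-combination point, here is a minimal base R sketch. The metadata is made up (a hypothetical `meta` data frame with `year` and `region` columns, 28 samples, 0-5 per combination) and is only meant to mirror the shape of the design described in the question, not the real data.

```r
# Hypothetical design: 2 years x 5 regions, 28 samples, 0-5 samples per cell.
# These counts are invented for illustration only.
cell_counts <- matrix(c(5, 3, 4, 0, 3,
                        2, 4, 0, 5, 2),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(year = c("Y1", "Y2"),
                                      region = paste0("R", 1:5)))

# Expand the cell counts into one row per sample.
meta <- as.data.frame(as.table(cell_counts))
meta <- meta[rep(seq_len(nrow(meta)), meta$Freq), c("year", "region")]

# Marginal counts: these are what matter for a model without interactions.
year_n   <- table(meta$year)
region_n <- table(meta$region)
print(year_n)
print(region_n)

# Combination counts: empty cells here only matter if you add a
# year:region interaction term.
cross_n <- table(meta$year, meta$region)
print(cross_n)

# Every marginal group should have at least one observation.
stopifnot(all(year_n > 0), all(region_n > 0))
```

In this made-up design two year/region cells are empty, yet every year and every region still has observations, which is the situation described above.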
corncob will automatically filter out models that are mathematically impossible to fit, as described in point two, so you don't need to worry about filtering (see the filter_discriminant parameter in differentialTest). In terms of sanity checks, I highly recommend plotting a few individual model fits. Check some highly significant ones and some random ones: fit a single model using bbdml, or just extract them from differentialTest with full_output = TRUE, and type plot(YOUR_OUTPUT). Visual investigation is the best sanity check in my opinion.
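For reference, a minimal sketch of the single-model check might look like the following. Everything about the data is an assumption: the counts are simulated, and the names `sim`, `W` (taxon count), `M` (sequencing depth), and `x` (a stand-in covariate) are hypothetical. The fit uses bbdml's data-frame interface with a `cbind(W, M - W)` response, and is skipped if corncob is not installed.

```r
set.seed(42)

# Simulated beta-binomial counts for one taxon across 28 samples.
n   <- 28
x   <- rnorm(n)                               # made-up covariate
M   <- sample(5000:20000, n, replace = TRUE)  # made-up sequencing depths
mu  <- plogis(-2 + 0.8 * x)                   # true relative abundance
phi <- 0.1                                    # true overdispersion

# Beta-binomial draw: sample-level proportions from a beta, then counts.
p   <- rbeta(n, mu * (1 - phi) / phi, (1 - mu) * (1 - phi) / phi)
W   <- rbinom(n, size = M, prob = p)
sim <- data.frame(W = W, M = M, x = x)

if (requireNamespace("corncob", quietly = TRUE)) {
  # Abundance depends on x; dispersion is modelled with an intercept only.
  fit <- corncob::bbdml(formula = cbind(W, M - W) ~ x,
                        phi.formula = ~ 1,
                        data = sim)
  print(summary(fit))

  # Observed relative abundances with model-based intervals.
  # B = 50 keeps the bootstrap quick for a demonstration.
  print(plot(fit, B = 50))
}
```

If the observed points sit far outside the plotted intervals for many samples, that is a hint the model for that taxon deserves a closer look.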
Cheers, Bryan
Hi there,
Thanks for all the work put into the package; it's awesome! It's great that it now supports contrasts and I'm looking forward to coefficient extraction as my current solution (a map pipeline) is pretty ugly.
My questions are mostly due to my stats ignorance. I think this is a similar topic to #72; I have a couple questions about best practices.
1) Do you have any recommendations as to what to do with outliers / how robust corncob is to them? For example, I have a taxon whose "relative abundance" is about 6% in one sample but >90% in other samples. Removing outlier samples changed results slightly, but not drastically, in terms of which taxa came up as differentially abundant.
2) I have an unbalanced dataset with 28 samples and I want to see how mean abundances covary with a chemical characteristic while controlling for annual and regional effects (2 years, 5 regions) on mean abundance and dispersion. There are 0-5 observations for each year/region combination. Is this something I can actually test or should I simplify my models? Alternatively, is there some diagnostic I can look at to see if I should simplify my models? Finally, what test/bootstrap would you recommend for a dataset like this (I can provide more details if necessary)? Here are my current differentialTest formulae:
3) Do you have recommendations for sanity checks / assumptions to test / filtering before running corncob? Or is it pretty much plug-and-play?
Again, thanks so much and apologies if the answers to these questions should be obvious.
Best, Ian