Hi emmansh,
Indeed, you can still use `infer` if you're starting with a contingency table, but there are some caveats. Let me weigh in on your two limitations.
I think that depends on how you go about untabling. My inclination would be to try `pivot_longer()` and then `uncount()`. I'm pretty sure that, together, they are a reliable inverse of `table()`.
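As a minimal sketch of that untabling route, assuming the table is stored "wide" in a data frame (one row per group, one count column per outcome; the counts below are made up for illustration), `pivot_longer()` gives one row per cell and `uncount()` expands each cell into that many rows:

```r
library(tidyr)

# Hypothetical contingency table stored "wide": one row per group,
# one column of counts per outcome (50 observations in total)
wide <- data.frame(
  group = c("A", "B"),
  low   = c(12L, 7L),
  mid   = c(5L, 14L),
  high  = c(9L, 3L)
)

# Step 1: one row per cell, with the cell count in column `n`
cells <- pivot_longer(wide, -group, names_to = "outcome", values_to = "n")

# Step 2: repeat each cell's row `n` times, giving one row per observation
obs <- uncount(cells, n)

# Round trip: tabulating the observations recovers the original counts
back <- table(group = obs$group, outcome = obs$outcome)
back
```

If you instead have an actual R `table` object, `as.data.frame()` already returns the long format with the counts in a `Freq` column, so `uncount(as.data.frame(tab), Freq)` should do the same job.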
Oof, I think here you're running into a fundamental limitation of the way `infer` works right now. It's built to be data frame in, data frame out. That means you'll need to process the table into a data frame before sending it through an `infer` pipeline. It also means that the output of `generate()` can be a very large data frame: it has `reps` times as many rows as the original data frame. There are benefits to this approach (it lets you inspect the data frames generated under the null), but there are costs in terms of performance. At one point we discussed adding an option that would run the simulation through an efficient iterative process, bypassing the big data frame, but we haven't done that yet (to my knowledge).
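To make the size cost concrete, here is a rough sketch using a made-up 2 × 3 table with 50 observations (the data and the seed are illustrative assumptions). With `reps = 1000`, the `generate()` output should have 50 × 1000 rows, even though the null distribution you end up with is only 1000 statistics:

```r
library(tidyr)
library(infer)

# Hypothetical wide contingency table: 2 groups x 3 outcomes, 50 observations
wide <- data.frame(
  group = c("A", "B"),
  low   = c(12L, 7L),
  mid   = c(5L, 14L),
  high  = c(9L, 3L)
)

# Untable into one row per observation, with factor columns for infer
obs <- uncount(pivot_longer(wide, -group, names_to = "outcome", values_to = "n"), n)
obs$group <- factor(obs$group)
obs$outcome <- factor(obs$outcome)

reps <- 1000
set.seed(1)

# Permutation null for independence: generate() stacks `reps` permuted
# copies of the data, each labelled by a `replicate` column
null_draws <- obs |>
  specify(outcome ~ group) |>
  hypothesize(null = "independence") |>
  generate(reps = reps, type = "permute")

# Reduce to one chi-square statistic per replicate
null_dist <- calculate(null_draws, stat = "Chisq")

nrow(null_draws)  # 50 * 1000 = 50000 rows held in memory
nrow(null_dist)   # 1000
```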
This might be a place where `chisq.test()` makes more sense. It accepts tabular input and by default uses the asymptotic chi-square distribution of the test statistic, which should be a very good approximation if your expected cell counts are large.
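A minimal sketch of that route, using a made-up 2 × 3 table: `chisq.test()` takes the matrix of counts directly. Its `simulate.p.value` and `B` arguments can also give a Monte Carlo p-value by sampling random tables with the observed row and column totals, which is one way to get a simulation-based test without building a per-observation data frame (this goes slightly beyond the comment above, so treat it as an option to consider rather than a recommendation):

```r
# Hypothetical 2 x 3 contingency table (rows = group, columns = outcome)
tab <- matrix(
  c(12, 5, 9,
    7, 14, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), outcome = c("low", "mid", "high"))
)

# Default: p-value from the asymptotic chi-square distribution,
# with df = (rows - 1) * (cols - 1) = 2
asym <- chisq.test(tab)

# Alternative: Monte Carlo p-value from B random tables that share the
# observed margins; no per-observation data frame is built
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

asym$statistic
c(asymptotic = asym$p.value, simulated = sim$p.value)
```

The simulated p-value will vary a little with the seed and with `B`.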
Thanks for the issue, @emmansh! My responses would be the same as Andrew's. Will go ahead and close this issue, though feel free to holler if you feel your questions are unanswered and we can reopen.🙂
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
This has been cross-posted to the Posit Community forum. Unless I'm missing something fundamental in the way {infer} works, I'd like to suggest a feature for supporting contingency tables when using {infer}, much like `chisq.test()`, which accepts a contingency table as input.