Closed Ge0rges closed 1 month ago
In this setting, I assume that the first matrix is your X and the second is your Y. What test are you running that produces this error? When I run the test of the strong null with this data, or the test of the weak null with j = 1, I don't get an error.
Sorry, I should have mentioned that this is running the test of the strong null. It's odd that it doesn't reproduce. Let me see if I can isolate the bug better.
Hi @svteichman, the following matrices seem to trigger it pretty consistently on my end for the strong case:
X <- matrix(c(0, 1, 1, 0, 1, 0, 0, 1), nrow=4, ncol=2)
Y <- matrix(c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0), nrow=4, ncol=3)
The error may have to do with numerical stability, so it may not be reproducible on your machine. In that case we can ignore it unless I find a better way to reproduce it, as it doesn't really affect many samples in my dataset.
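For anyone who wants to skip failing samples in the meantime, here is a minimal sketch of one way to do it in base R. The names run_safely and toy_test and the sample data are made up for illustration; toy_test is only a stand-in for whatever test you are actually calling.

```r
# Run a test function on one sample; return NULL (with a message) if it errors.
# `test_fn` is a placeholder for whatever test is being run.
run_safely <- function(test_fn, ...) {
  tryCatch(
    test_fn(...),
    error = function(e) {
      message("Skipping sample: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in test that fails on an all-zero input, only to exercise the wrapper.
toy_test <- function(y) {
  if (all(y == 0)) stop("numerical failure")
  sum(y)
}

samples <- list(a = c(1, 0, 2), b = c(0, 0, 0), c = c(3, 1, 0))
results <- lapply(samples, function(y) run_safely(toy_test, y))

# Names of the samples that were skipped.
failed <- names(results)[vapply(results, is.null, logical(1))]
print(failed)
```

Keeping the names of the skipped samples makes it easy to report exactly which inputs failed.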
This does produce an error for me, and I agree there is numerical instability. This is common with separated data, which is why we added the penalty option to multinom_test(). However, running multinom_test() with your example with penalty = TRUE leads to a different error, due to something internally that happens in cases when an associated information matrix is singular (which happens with small samples and sparsity). I'll work on a fix for this, but I'm not sure when all of the numerical issues will be cleared up, especially for these cases that are so unstable.
Thanks @svteichman. I'm not too worried about future issues, as I've run my entire dataset using the test and this is the only issue that came up. When is it suitable to run with penalty=True, and does it change the hypothesis tested or the interpretation of results?
I actually just made a PR that should address the issues when penalty = TRUE (I'll merge this once checks finish, in which case you can install the most recent version of the software). We suggest using penalty = TRUE for small or sparse samples, where you'll likely encounter data separation, which leads to infinite values of the likelihood and infinite MLE values. It does not change the hypothesis tested or interpretation of the results.
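To illustrate what data separation does to an unpenalized fit, here is a small sketch using base R's glm() on toy data. This is a stand-in for illustration only: it is a binary logistic regression, not multinom_test(), the data are made up rather than taken from this issue, and it does not show the penalty itself.

```r
# Toy, perfectly separated data (hypothetical, not from this issue):
# y is 0 whenever x is 0 and 1 whenever x is 1.
x <- c(0, 0, 0, 1, 1, 1)
y <- c(0, 0, 0, 1, 1, 1)

# Unpenalized logistic regression; glm() warns that fitted probabilities
# are numerically 0 or 1, so the warnings are suppressed here.
fit <- suppressWarnings(glm(y ~ x, family = binomial()))

# The slope estimate is very large because the likelihood keeps increasing
# as the coefficient grows, so no finite maximizer exists.
print(coef(fit))
```

The reported coefficient is just where the optimizer stopped, which is why results on separated data can differ between machines.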
I just merged this, so if you use the penalty you shouldn't run into as many of these issues.
Thanks for the quick fix!
On some very sparse inputs as below I get the following error:
R[write to console]: Error in if (rel_diff < tol) { : missing value where TRUE/FALSE needed
This isn't a big issue as I can ignore this sample and others like it are sparse in my dataset, but I thought I would report it.
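For context, this error message is what R gives when the condition of an if statement evaluates to NA. A minimal sketch of how that can happen in a relative-difference convergence check is below; the values and the formula for rel_diff are assumptions made up for illustration, not the package's actual internals.

```r
# Hypothetical values: a relative-difference convergence check where both
# the numerator and denominator are 0, so the result is NaN.
old_val <- 0
new_val <- 0
tol <- 1e-6
rel_diff <- abs(new_val - old_val) / abs(old_val)  # 0 / 0 gives NaN

# An unguarded `if (rel_diff < tol)` stops with
# "missing value where TRUE/FALSE needed" because NaN < tol is NA.
msg <- tryCatch(
  {
    if (rel_diff < tol) "converged" else "not converged"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A guarded version treats a non-finite difference as "not converged".
converged <- isTRUE(is.finite(rel_diff) && rel_diff < tol)
print(converged)
```

Whether a non-finite difference should count as non-convergence or as a hard failure is a design choice; the guard above just avoids the uninformative error.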