Open bellasmith121995 opened 2 years ago
The package vignette, browseVignettes(package="DirichletMultinomial"), has for instance
fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))
dmn(count, 2, verbose = TRUE)
with output
dmn, k=2
Soft kmeans
iteration 10 change 0.000472
iteration 20 change 0.000024
iteration 30 change 0.000001
Expectation Maximization setup
Expectation Maximization
iteration 10 change 0.068541
iteration 20 change 0.000015
Hessian
class: DMN
k: 2
samples x taxa: 278 x 130
Laplace: 38872.71 BIC: 39588.93 AIC: 39115.53
Is this what you have? If so, it is some unique property of your data. It might be the large number of samples (so perhaps subset or collapse), or collinearity of the count matrix, or...? Maybe some basic diagnostics would help, e.g., hist(log10(count)), especially in comparison to the sample data set, or subsampling rows / columns to a smaller matrix to investigate... I'm not really sure what to suggest without your count matrix. If you'd like to share that with me I could take a further look...
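For anyone wanting to try those diagnostics, here is a minimal base-R sketch. The count matrix below is simulated purely as a stand-in (it is not anyone's real data), and the subsample sizes are arbitrary assumptions; substitute your own samples x taxa matrix.

```r
# Hypothetical stand-in for a real samples x taxa count matrix.
set.seed(1)
count <- matrix(rnbinom(300 * 20, size = 0.5, mu = 20), nrow = 300,
                dimnames = list(paste0("s", 1:300), paste0("t", 1:20)))

# Distribution of counts on a log10 scale. Zeros become -Inf under log10,
# so only the non-zero entries are binned here.
h <- hist(log10(count[count > 0]), plot = FALSE)  # drop plot = FALSE to draw it
print(h$counts)

# Fraction of zero entries, as a quick sparsity check.
zero_frac <- mean(count == 0)
print(zero_frac)

# Random subsample of rows (samples) and columns (taxa) to a smaller matrix.
sub_rows <- sample(nrow(count), 100)
sub_cols <- sample(ncol(count), 10)
small <- count[sub_rows, sub_cols, drop = FALSE]
print(dim(small))
```

Running the same lines on the Twins matrix from the vignette gives a baseline to compare against.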
Hello Martin,
I am using this package to assign enterotypes based on a gene count table (samples x taxa=1982 x 148) and ran into the same issue.
Following your suggestion, I randomly subsampled 300 rows from the data frame, with missing values and constant columns removed (as was done for the full data set). The subset of gene counts follows an extremely right-skewed distribution, similar to that of the example data you provided.
My output matches your example output, except that no Laplace, AIC, or BIC values were printed at the end for k = 2, 3, 4, etc.
Then, calling lplc <- base::sapply(fit, DirichletMultinomial::laplace) (and likewise for AIC and BIC) returned NaN for all of them.
I would appreciate any advice on this!
P.S. I am a bit confused because this used to work well for another dataset I have (dim: ~800 x 200, passing all the checks suggested above).
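A small sketch of how the three measures could be tabulated per k, so it is easy to see which fits return non-finite values. It reuses the Twins data and the dmn(), laplace(), AIC() and BIC() calls from the vignette; fitting only k = 1:2 and the variable names scores and bad are choices made for this illustration. The block is skipped if the package is not installed.

```r
if (requireNamespace("DirichletMultinomial", quietly = TRUE)) {
  library(DirichletMultinomial)

  # Example data shipped with the package, as in the vignette.
  fl <- system.file(package = "DirichletMultinomial", "extdata", "Twins.csv")
  count <- t(as.matrix(read.csv(fl, row.names = 1)))

  # Fit a small range of k to keep the run short (an arbitrary choice).
  ks <- 1:2
  fit <- lapply(ks, dmn, count = count, verbose = FALSE)

  # One row per k, one column per model-fit measure.
  scores <- cbind(Laplace = sapply(fit, laplace),
                  AIC     = sapply(fit, AIC),
                  BIC     = sapply(fit, BIC))
  rownames(scores) <- paste0("k=", ks)
  print(scores)

  # TRUE marks an NaN, NA or infinite entry.
  bad <- !is.finite(scores)
  print(bad)
}
```

Swapping in your own count matrix and range of k should show whether every fit is affected or only some values of k.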
Can you provide (a sample of) your data that reproduces the problem?
Hello, I am following the DMM tutorial here: https://microbiome.github.io/tutorials/DMM.html
I keep getting
I made sure there are no columns or rows that contain only zeros, and no matter what I get this result. What could be causing this? I have also tried adding a small pseudocount and got the same results.
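For reference, the zero-row / zero-column check described above could be written roughly as follows in base R. clean_counts() is a hypothetical helper, not part of DirichletMultinomial or the tutorial, and the toy matrix is made up for illustration.

```r
# Hypothetical helper: validate a samples x taxa count matrix and drop
# samples and taxa whose counts are all zero.
clean_counts <- function(count) {
  stopifnot(is.matrix(count), is.numeric(count))
  if (anyNA(count)) stop("count matrix contains missing values")
  if (any(count < 0)) stop("count matrix contains negative values")
  keep_rows <- rowSums(count) > 0  # samples with at least one count
  keep_cols <- colSums(count) > 0  # taxa observed at least once
  count[keep_rows, keep_cols, drop = FALSE]
}

# Toy matrix with one empty sample (s2) and one empty taxon (t2).
toy <- matrix(c(5, 0, 2,
                0, 0, 0,
                1, 0, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("s1", "s2", "s3"), c("t1", "t2", "t3")))
cleaned <- clean_counts(toy)
print(cleaned)

# Whether all entries are whole numbers; a fractional pseudocount makes this FALSE.
print(all(cleaned == round(cleaned)))
```

Printing dim() before and after the call confirms whether anything was actually dropped.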