mtmorgan / DirichletMultinomial


Keep getting NAs #6

Open bellasmith121995 opened 2 years ago

bellasmith121995 commented 2 years ago

Hello, I am following the DMM tutorial here: https://microbiome.github.io/tutorials/DMM.html

I keep getting

[[1]]
class: DMN 
k: 1 
samples x taxa: 7225 x 61 
Laplace: NaN BIC: NaN AIC: NaN 

[[2]]
class: DMN 
k: 2 
samples x taxa: 7225 x 61 
Laplace: NaN BIC: NaN AIC: NaN 

[[3]]
class: DMN 
k: 3 
samples x taxa: 7225 x 61 
Laplace: NaN BIC: NaN AIC: NaN 

[[4]]
class: DMN 
k: 4 
samples x taxa: 7225 x 61 
Laplace: NaN BIC: NaN AIC: NaN 

[[5]]
class: DMN 
k: 5 
samples x taxa: 7225 x 61 
Laplace: NaN BIC: NaN AIC: NaN 

[[6]]
class: DMN 
k: 6 
samples x taxa: 7225 x 61 
Laplace: NaN BIC: NaN AIC: NaN 

I made sure there are no columns or rows that contain only zeros, but I get this result no matter what. What could be causing this? I have also tried adding a small pseudocount and got the same results.

mtmorgan commented 2 years ago

The package vignette (browseVignettes(package="DirichletMultinomial")) has, for instance,

fl <- system.file(package="DirichletMultinomial", "extdata", "Twins.csv")
count <- t(as.matrix(read.csv(fl, row.names=1)))
dmn(count, 2, verbose = TRUE)

with output

dmn, k=2
  Soft kmeans
    iteration 10 change 0.000472
    iteration 20 change 0.000024
    iteration 30 change 0.000001
  Expectation Maximization setup
  Expectation Maximization
    iteration 10 change 0.068541
    iteration 20 change 0.000015
  Hessian
class: DMN
k: 2
samples x taxa: 278 x 130
Laplace: 38872.71 BIC: 39588.93 AIC: 39115.53

Is this what you have? If so it is some unique property of your data. It might be the large number of samples so perhaps subset or collapse, or collinearity of the count matrix, or...? Maybe some basic diagnostics would help, e.g., hist(log10(count)) especially in comparison to the sample data set, or subsampling rows / columns to a smaller matrix to investigate... I'm not really sure what to suggest without your count matrix. If you'd like to share that with me I could take a further look...
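The diagnostics suggested above could be sketched roughly as follows. This is only an illustration in base R: diagnose_counts() and subsample_counts() are hypothetical helpers, not part of DirichletMultinomial, and the simulated matrix stands in for a real samples x taxa count matrix. The check for non-integer values is included on the assumption that it is worth ruling out (for example after adding a pseudocount); it is not a confirmed cause of the NaN results.

```r
# Hypothetical helper (not part of DirichletMultinomial): summarise properties
# of a samples x taxa count matrix that may be worth checking before dmn().
diagnose_counts <- function(count) {
  stopifnot(is.matrix(count), is.numeric(count))
  finite <- is.finite(count)
  list(
    dim             = dim(count),
    n_non_finite    = sum(!finite),                    # NA, NaN, Inf
    n_negative      = sum(count[finite] < 0),
    n_non_integer   = sum(count[finite] != round(count[finite])),
    n_zero_rows     = sum(rowSums(count, na.rm = TRUE) == 0),
    n_zero_cols     = sum(colSums(count, na.rm = TRUE) == 0),
    n_constant_cols = sum(apply(count, 2, function(x) length(unique(x)) == 1)),
    max_count       = max(count[finite])
  )
}

# Hypothetical helper: random subset of rows (samples) and columns (taxa).
subsample_counts <- function(count, n_rows, n_cols = ncol(count), seed = 1) {
  set.seed(seed)
  rows <- sample(nrow(count), min(n_rows, nrow(count)))
  cols <- sample(ncol(count), min(n_cols, ncol(count)))
  count[rows, cols, drop = FALSE]
}

# Simulated stand-in for a real count matrix (500 samples x 20 taxa).
set.seed(42)
count <- matrix(rnbinom(500 * 20, size = 0.5, mu = 50), nrow = 500)

small <- subsample_counts(count, n_rows = 100, n_cols = 10)
str(diagnose_counts(small))

# Distribution of the non-zero counts on the log10 scale.
hist(log10(small[small > 0]), main = "log10(non-zero counts)")
```

Note that dropping columns can leave some rows with only zeros, so it is probably worth re-running the checks on each subset rather than only on the full matrix.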

Mattie-J commented 7 months ago

Hello Martin,

I am using this package to assign enterotypes based on a gene count table (samples x taxa=1982 x 148) and ran into the same issue.

Following your suggestion, I randomly subsampled 300 rows from the given df, with missing values and constant columns removed (as was done for the total population). The subset of gene counts follows an extremely right-skewed distribution, similar to that of the example data you provided.

My output seems to be aligned with your example output, except that nothing was generated for Laplace, AIC and BIC at the end of k=2, 3, 4, etc.

[Screenshot 2024-02-13 at 16 06 17]

Calling lplc <- base::sapply(fit, DirichletMultinomial::laplace) (and the same for aic and bic) then returned NaN for all of them.

I would be grateful for any advice on this!

P.S. I am a bit confused because it used to work well for another dataset I have (dim: ~800x200, passing all the checks suggested above).

mtmorgan commented 7 months ago

Can you provide a sample of your data that reproduces the problem?
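One way to prepare such a sample might look like the sketch below. It assumes the data are held as a samples x taxa matrix; the simulated matrix, the subset size of 50, and the file name count_subset.csv are placeholders. The file is written with taxa in rows and samples in columns so that it can be read back with the same read.csv() call used for Twins.csv earlier in this thread.

```r
# Simulated stand-in for the real samples x taxa count matrix.
set.seed(1)
count <- matrix(rpois(300 * 15, lambda = 20), nrow = 300,
                dimnames = list(paste0("sample", 1:300), paste0("taxon", 1:15)))

# Draw a fixed random subset of samples so the example is reproducible.
set.seed(123)
keep <- sample(nrow(count), 50)
sub <- count[keep, , drop = FALSE]

# Write taxa in rows and samples in columns, matching how Twins.csv is read.
out <- file.path(tempdir(), "count_subset.csv")  # placeholder file name
write.csv(t(sub), out)

# Round trip with the same read.csv() pattern as in the vignette example.
back <- t(as.matrix(read.csv(out, row.names = 1)))
stopifnot(all(back == sub))

# Before sharing, confirm that the subset still shows the problem, e.g.
# (requires the DirichletMultinomial package, so it is left commented out):
# DirichletMultinomial::dmn(back, 2, verbose = TRUE)
```

If the subset no longer reproduces the NaN values, trying a different seed or a larger subset may help. Including the output of sessionInfo() alongside the file is usually useful as well.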