stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
https://netcomi.de
GNU General Public License v3.0

Warning messages "'nearPD()' did not converge in 100 iterations" in netConstruct #22

Closed jliu2019 closed 3 years ago

jliu2019 commented 3 years ago

Hi,

Thanks for developing this terrific package!

I got some warning messages in the netConstruct step. Here is my code:

net_spring <- netConstruct(genus, 
                           group = groups_vec,
                           filtTax = "highestFreq",
                           filtTaxPar = list(highestFreq = 100),
                           filtSamp = "totalReads",
                           filtSampPar = list(totalReads = 1000),
                           zeroMethod = "none", normMethod = "none",
                           measure = "spring",
                           measurePar = list(nlambda = 100, 
                                             rep.num = 100),
                           verbose = 3,
                           seed = 20190101)

These are the warning messages:

Warning messages:
1: In mixedCCA::estimateR(qdat, type = "trunc", method = Rmethod, tol = Rtol, :
  There are variables in the data that have only zeros or only the same values.
2: In Matrix::nearPD(R, corr = TRUE) :
  'nearPD()' did not converge in 100 iterations
3: 101 jobs had warning: "'nearPD()' did not converge in 100 iterations"
  101 jobs had warning: "There are variables in the data that have only zeros or only the same values."
4: In mixedCCA::estimateR(qdat, type = "trunc", method = Rmethod, tol = Rtol, :
  There are variables in the data that have only zeros or only the same values.
5: In Matrix::nearPD(R, corr = TRUE) :
  'nearPD()' did not converge in 100 iterations
6: 101 jobs had warning: "'nearPD()' did not converge in 100 iterations"
  101 jobs had warning: "There are variables in the data that have only zeros or only the same values."

I was wondering what "'nearPD()' did not converge in 100 iterations" means. Can I still proceed to the next step?

Thanks a lot for your help!

muellsen commented 3 years ago

Under the hood, the semi-parametric correlation estimator in mixedCCA tries to estimate a positive definite correlation matrix. In your case, it seems that some of your data are too sparse: when the data are subsampled for stability selection, some of the subsamples may contain taxa with only zeros. Would it be possible to filter your taxa more aggressively, or to exclude samples that contain very few taxa?
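To illustrate what the positive definite step is about, here is a minimal sketch that applies Matrix::nearPD() with corr = TRUE (the call named in the warning) to a small matrix. The 3 x 3 matrix is made up for illustration and is not taken from the data in this issue. The returned `converged` flag is the value that, when FALSE, leads to the "did not converge" warning.

```r
library(Matrix)

# Hypothetical 3 x 3 "correlation" matrix: symmetric with unit diagonal,
# but not positive definite because the pairwise values are inconsistent.
R <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

# The smallest eigenvalue is negative, so R is not a valid correlation matrix.
min_eig_before <- min(eigen(R, symmetric = TRUE)$values)

# Iteratively search for the nearest valid correlation matrix.
fit <- nearPD(R, corr = TRUE)

fit$converged   # FALSE would mean the iteration limit was reached first
fit$iterations  # number of iterations actually used

# Smallest eigenvalue of the adjusted matrix.
min_eig_after <- min(eigen(as.matrix(fit$mat), symmetric = TRUE)$values)
```

If `converged` is FALSE, the adjusted matrix is only an approximation, which is one more reason to address the sparsity rather than ignore the warning.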

stefpeschel commented 3 years ago

Hey,

thanks @muellsen for your comment!

Before closing this issue, I would like to add a few words. It is indeed a sparsity issue, as @muellsen already wrote. In an earlier mixedCCA version, a sparse count matrix led to an error if at least one of the subsamples (which are taken for stability selection) contained taxa with an overall sum of zero (see issue #6). This happened if the data contained very rare taxa, i.e., taxa observed in only a few samples.
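The effect of rare taxa on subsampling can be sketched with toy data. Everything in this example is assumed for illustration: the counts are simulated, and the 80% subsample ratio and the 100 repetitions are placeholder values, not necessarily what the stability selection uses internally.

```r
set.seed(1)

n_samples <- 50

# Hypothetical toy counts: one common taxon and one rare taxon that is
# observed in a single sample only.
counts <- cbind(
  common = rpois(n_samples, lambda = 20),
  rare   = c(5, rep(0, n_samples - 1))
)

n_rep    <- 100                      # assumed number of subsamples
sub_size <- floor(0.8 * n_samples)   # assumed subsample size

# For each subsample, check whether any taxon has an overall sum of zero.
has_zero_taxon <- replicate(n_rep, {
  idx <- sample(n_samples, sub_size)
  any(colSums(counts[idx, , drop = FALSE]) == 0)
})

# Number of subsamples in which the rare taxon vanished completely.
sum(has_zero_taxon)
```

Whenever the single sample carrying the rare taxon is left out, that taxon is constant (all zeros) in the subsample, which is the situation the "only zeros or only the same values" warning describes.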

The current mixedCCA version produces a warning instead of an error if a taxon with an overall sum of zero is observed in one of the subsamples, and it sets the corresponding correlation estimate to zero. Nevertheless, rather than ignoring the warning, you should avoid it by adapting the filters so that rare taxa are filtered out.
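As a rough sketch of such a filter, rare taxa can be removed by prevalence before the network is constructed. The toy count matrix and the 25% threshold are assumptions for illustration only. Within netConstruct, a comparable filter should be available via filtTax = "numbSamp" with filtTaxPar = list(numbSamp = ...), but please check ?netConstruct for the exact options in your installed version.

```r
set.seed(42)

# Hypothetical toy count matrix: 40 samples x 6 taxa with decreasing prevalence.
prev_target <- c(0.9, 0.7, 0.5, 0.2, 0.05, 0.02)
counts <- sapply(prev_target, function(p) rbinom(40, 1, p) * rpois(40, 10))
colnames(counts) <- paste0("taxon", seq_along(prev_target))

# Prevalence: fraction of samples in which each taxon has a non-zero count.
prevalence <- colMeans(counts > 0)

# Assumed threshold: keep taxa observed in at least 25% of the samples.
min_prev <- 0.25
keep     <- prevalence >= min_prev
filtered <- counts[, keep, drop = FALSE]

round(prevalence, 2)  # inspect which taxa are rare
colnames(filtered)    # taxa that survive the filter
```

A suitable threshold depends on the data set; the stricter the filter, the less likely a subsample is to contain a taxon with only zeros.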

Best, Stefanie