stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

Error when using correlation matrices as input #73

Closed TimFaro closed 1 year ago

TimFaro commented 1 year ago

Hello Stefanie,

I'm currently trying to run netConstruct() using two correlation matrices as input, like this:

netconstruct <- netConstruct(data = as.matrix(data1),
                             data2 = as.matrix(data2),
                             dataType = "correlation",
                             verbose = 3,
                             seed = 42,
                             cores = 64)

This gives an error:

Error in condition_handling(dataType = dataType, assoType = assoType, : Sample size necessary for Student's t-test.

I found an old forum post in which you addressed this (https://bytemeta.vip/repo/stefpeschel/NetCoMi/issues/12) and it seems to be an issue with the sparsMethod argument. But all the available methods give different errors:

sparsMethod = "none"

Error in assoMat1[edgelist1[i, 1], edgelist1[i, 2]] : subscript out of bounds

sparsMethod = "t-test"

Error in condition_handling(dataType = dataType, assoType = assoType, : Sample size necessary for Student's t-test.

sparsMethod = "bootstrap"

Error in condition_handling(dataType = dataType, assoType = assoType, : Count matrix needed for bootstrapping.

sparsMethod = "threshold"

Sparsify associations via 'threshold' ... Done.
Sparsify associations in group 2 ... Done.
Error in assoMat1[edgelist1[i, 1], edgelist1[i, 2]] : subscript out of bounds

sparsMethod = "softThreshold"

Error in assoMat1[edgelist1[i, 1], edgelist1[i, 2]] : subscript out of bounds
In addition: Warning message:
In sparsify(assoMat = assoMat1, countMat = counts1, sampleSize = sampleSize[1], : No power with R^2 above 0.8. Power set to 1.

sparsMethod = "knn"

Sparsify associations via 'knn' ... Done.
Error in if (all(x.tmp <= 1)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(0.5 * (1 - xvec)) : NaNs produced

Do you have any idea how this could be fixed? The input matrices are simple 100 x 100 correlation matrices.

Thanks!

Best

Tim

stefpeschel commented 1 year ago

Hi Tim,

As the error says, the sample size is needed for Student's t-test ;)

Please ensure that you want to use Student's t-test for sparsification. If yes, pass the sample size(s) to the sampleSize argument. This is needed because you're using an association matrix as input and not the data table itself.

You can also set the sparsification argument to "none" if you don't want to sparsify at all.

Best, Stefanie
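For readers landing here with the same error, the advice above can be sketched roughly as follows. The data, the sample sizes n1 and n2, and the helper make_data() are hypothetical stand-ins, and passing one sample size per matrix as a length-two vector is an assumption (the sampleSize[1] in the warning quoted earlier suggests it, but check ?netConstruct). The NetCoMi call only runs if the package is installed.

```r
# Minimal sketch: hypothetical toy data standing in for the real correlation matrices.
set.seed(42)
n1 <- 60  # hypothetical number of samples behind the first correlation matrix
n2 <- 50  # hypothetical number of samples behind the second one

# Hypothetical helper: n samples of p correlated variables with column names
make_data <- function(n, p = 10) {
  latent <- rnorm(n)
  x <- sapply(seq_len(p), function(j) latent + rnorm(n))
  colnames(x) <- paste0("taxon", seq_len(p))
  x
}

# cor() carries the column names over to both dimensions of the result
cor1 <- cor(make_data(n1))
cor2 <- cor(make_data(n2))

if (requireNamespace("NetCoMi", quietly = TRUE)) {
  # Assumption: sampleSize takes one value per input matrix
  net <- NetCoMi::netConstruct(data = cor1,
                               data2 = cor2,
                               dataType = "correlation",
                               sparsMethod = "t-test",
                               sampleSize = c(n1, n2),
                               verbose = 3,
                               seed = 42)
}
```

The sample sizes have to be supplied explicitly here because they cannot be recovered from a correlation matrix alone.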

TimFaro commented 1 year ago

Hi Stefanie,

thanks for the quick reply! I assumed as much, but I was confused because the documentation of netConstruct() on rdrr.io says that threshold is the default sparsification method, not t-test.

When I try to use no sparsification with sparsMethod = "none", I get an error message:

Error in assoMat1[edgelist1[i, 1], edgelist1[i, 2]] : subscript out of bounds

Any idea how to fix it? Thanks!

Best

Tim

stefpeschel commented 1 year ago

Hi Tim,

I just checked the documentation. You're right, the description states that threshold is the default. I will change this. Sorry for the confusion! You can always look at the "Usage" section of the help page (?netConstruct) to see which default values are actually set.

Regarding your "subscript out of bounds" error: I assume your correlation matrices don't have row/column names, which causes this error. This will be fixed with the next update. Until then, please ensure that your matrices have dimnames.

Let me know if you have further questions or issues.

Best, Stefanie
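A minimal base-R sketch of the dimnames workaround described above. The helper add_taxa_names() and the "taxon" prefix are hypothetical; any unique names should do, as long as rows and columns carry the same ones.

```r
# Hypothetical stand-in for a correlation matrix without row/column names
set.seed(42)
cor_mat <- cor(matrix(rnorm(200), nrow = 20))  # 10 x 10 matrix, dimnames are NULL

# Hypothetical helper: give a square matrix identical row and column names
add_taxa_names <- function(mat, prefix = "taxon") {
  stopifnot(is.matrix(mat), nrow(mat) == ncol(mat))
  taxa <- paste0(prefix, seq_len(ncol(mat)))
  dimnames(mat) <- list(taxa, taxa)
  mat
}

named_mat <- add_taxa_names(cor_mat)

# Quick check before handing the matrix to netConstruct()
is.null(dimnames(cor_mat))    # TRUE: the original has no names
is.null(dimnames(named_mat))  # FALSE: names are now set
```

If the matrices came from real data, reusing the original taxon names instead of generated ones keeps the resulting network interpretable.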

TimFaro commented 1 year ago

Hi Stefanie,

thanks, I indeed did not have the column/row names set. Setting them fixed the issue for none, threshold and softThreshold! Method knn still produces the error:

Error in if (all(x.tmp <= 1)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(0.5 * (1 - xvec)) : NaNs produced

But I don't plan on using it, so this is fine for me, thanks!

Best

Tim

stefpeschel commented 1 year ago

Great that it worked!

The knn sparsification method is only available for dissimilarity networks, not for correlation networks. On the develop branch, the function already throws a meaningful error if one tries to combine a correlation matrix with knn sparsification, so this will be fixed with the next release.

Best, Stefanie
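As a side note on the warning quoted above: the expression sqrt(0.5 * (1 - xvec)) returns NaN for any value above 1. The base-R sketch below only illustrates that transformation on hypothetical toy data; it is not taken from the NetCoMi source. Whether the resulting matrix can be passed to netConstruct() as a dissimilarity input is an assumption to check against ?netConstruct.

```r
# Hypothetical toy correlation matrix with dimnames
set.seed(42)
x <- matrix(rnorm(200), nrow = 20,
            dimnames = list(NULL, paste0("taxon", 1:10)))
cor_mat <- cor(x)

# Values above 1 make the square root undefined (NaN, with a warning)
is.nan(suppressWarnings(sqrt(0.5 * (1 - 1.0000001))))  # TRUE

# Clamp to [-1, 1] to guard against small numerical overshoot, then transform
cor_mat[cor_mat > 1] <- 1
cor_mat[cor_mat < -1] <- -1
diss_mat <- sqrt(0.5 * (1 - cor_mat))  # 0 for r = 1, 1 for r = -1
```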