stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

Trouble with netCompare() Permutation test #29

Closed David-Madrigal closed 2 years ago

David-Madrigal commented 2 years ago

Hi!

I've been having trouble running netCompare() with permutation tests. With a low number of permutations (30-40), everything runs smoothly, but I run into problems with nPerm = 100L or higher.

I've been running the following:

> comp_season <- netCompare(props_domos,permTest = T, nPerm = 100L, cores = 4L,verbose = T, logFile = "log.txt")
Calculate network properties ... Done.

I left this process running overnight, and after that message there was no further progress. No log file was generated either.

In contrast, when running the same command with a lower number of permutations, a message and a permutation progress bar are displayed:

> comp_season <- netCompare(props_domos,permTest = T, nPerm = 10L, cores = 4L,verbose = T, logFile = "log.txt")
Calculate network properties ... Done.
Execute permutation tests ... 
starting worker for localhost:11328 
starting worker for localhost:11328 
starting worker for localhost:11328 
starting worker for localhost:11328 
  |                                                                              |   0%
Type: EXEC 

I've been running this on both Windows 10 and macOS Big Sur (11.6), and I see the same issue on both. NetCoMi version 1.0.2, R version 4.0.2, RStudio version 1.4.1717.

Any idea what could be happening?

Thank you so much! David

stefpeschel commented 2 years ago

Hey David,

Hmm this behavior is strange because the message "Execute permutation tests" is printed even before the SNOW cluster is started. Between the message "Calculate network properties ... Done" and starting the parallel workers is only one major step, where the matrix with permuted group labels is generated.

What are the dimensions of your count matrices? And what happens if you increase the number of permutations stepwise from 50 to 100? Does the time until you see the message "Execute permutation tests" also increase linearly?

It's hard to reproduce your issue without having your data. But what you could do to locate the cause of your problem is to debug the netCompare function by running:

debugonce(netCompare)
comp_season <- netCompare(....)

With debugonce(), you enter debugging mode the next time the function is called. This lets you go through netCompare() step by step and, hopefully, see where the execution gets stuck.

Please let me know if you have any questions. You can also write me an email to have a closer look at this issue.

Best, Stefanie

David-Madrigal commented 2 years ago

Thank you! I've just emailed you giving more details about my issue.

Thanks again, David

stefpeschel commented 2 years ago

Hey David,

Thank you for sending me further information via email. Now I know where your issue comes from: your data set contains only three samples in each group.

First of all I have to say that this sample size is far too small to get reliable association estimates. If you want to dive deeper into this topic, I would highly recommend reading this paper: https://academic.oup.com/nargab/article/2/4/lqaa100/6040969?login=true#219595119 The focus is on normalization methods, but the sample size problem is also discussed in great detail. In figure 2, for instance, you'll see that correlation estimates are highly dependent on sample size, if no shrinkage is performed. Unfortunately, I have not managed to implement shrinkage into NetCoMi yet, but it's planned for a future version.

Now back to your issue. Regardless of whether the estimated associations are meaningful or not, you won't be able to perform a permutation test with such a small sample size. With three samples in each group, there are only choose(6, 3) = 20 possible permutations of the group labels, but you need at least 1000 permutations to get reliable test results.
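
As a rough check of these counts, here is a minimal base-R sketch. It is only an illustration, not NetCoMi code, and `n_assignments` is a made-up helper name. It assumes two groups of equal size and counts every distinct way of choosing which samples go into the first group.

```r
# Hypothetical helper (not part of NetCoMi): number of distinct ways to
# split 2 * n_per_group samples into two groups of n_per_group each.
n_assignments <- function(n_per_group) {
  choose(2 * n_per_group, n_per_group)
}

n_assignments(3)   # 20
n_assignments(10)  # 184756
```

So a request for 100 unique permutations cannot be satisfied with three samples per group, whereas ten samples per group leave plenty of room.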

The reason why netCompare() gets stuck in your case is that the function tries to build a matrix of permuted group labels without any duplicates. At the moment, the function keeps drawing samples until all group assignments are unique. With only 20 possible assignments, 100 unique ones can never be found, so the loop never terminates. This is usually no problem because even with 10 samples in each group (which is still not much), there are already more than 180,000 possible permutations. Nevertheless, I'll have to implement a safeguard so that the function does not run forever in such cases. You could still do the network comparison without permutation tests. Then you'll at least get the group differences of the local and global network measures, and the Jaccard index and the Rand index are also calculated, but you won't get any p-values.
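
To illustrate the kind of safeguard meant here, the following is a minimal sketch in base R. It is not NetCoMi's actual implementation. `draw_unique_perms` and its arguments are hypothetical names, and the sketch assumes that group labels are drawn by simple rejection sampling. The idea is to compare the requested number of permutations with the number that exist before starting the loop.

```r
# Hypothetical helper, not part of NetCoMi: draw n_perm unique permutations
# of two-group labels by rejection sampling, failing early if that many
# unique permutations do not exist.
draw_unique_perms <- function(n1, n2, n_perm) {
  n_possible <- choose(n1 + n2, n1)
  if (n_perm > n_possible) {
    stop("Requested ", n_perm, " unique permutations, but only ",
         n_possible, " exist for group sizes ", n1, " and ", n2, ".")
  }

  labels <- rep(c(1L, 2L), times = c(n1, n2))
  perms <- matrix(NA_integer_, nrow = n_perm, ncol = n1 + n2)
  seen <- character(0)  # string keys of the label vectors accepted so far
  i <- 0L

  while (i < n_perm) {
    cand <- sample(labels)            # random reassignment of group labels
    key <- paste(cand, collapse = "")
    if (!(key %in% seen)) {           # keep only assignments not seen before
      i <- i + 1L
      seen <- c(seen, key)
      perms[i, ] <- cand
    }
  }
  perms
}

# Three samples per group: 20 unique permutations can be drawn ...
perms <- draw_unique_perms(3, 3, 20)
# ... but asking for 100 stops with an error instead of looping forever.
res <- try(draw_unique_perms(3, 3, 100), silent = TRUE)
```

Rejection sampling still slows down when n_perm is close to the number of possible permutations. In that regime, enumerating all assignments (for example with `combn()`) would be a more direct alternative.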

I'm sorry that we had trouble finding the source of your issue just because netCompare() doesn't handle small sample sizes correctly. I'll fix that as soon as possible.

Best, Stefanie

David-Madrigal commented 2 years ago

Now I get what was going on.

Thank you so much for your time to help me with this issue!

Best, David