stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

freeze on permutation test (differential network analysis) #81

Closed katerinka-arh closed 1 year ago

katerinka-arh commented 1 year ago

Hi Stefanie,

I have a problem running the diffnet() command with the permutation test option. The analysis starts, all MPI workers load and start, and then it seems to freeze forever without exiting or throwing an error. I have let it run for up to 10 days under Slurm; the Slurm output file was last updated on day 3 of the run.

The last messages in the slurm output file:

100%Type: DONE Type: DONE Type: DONE

The same freeze happens if I run it in an interactive R session without Slurm; it is as if it were stuck in an endless loop.

Can you please help me solve this problem?

Best regards, Ekaterina

stefpeschel commented 1 year ago

Hey Ekaterina,

The execution time highly depends on the methods used for network construction, and on the size of your network.

How many nodes do your network(s) have?

Please test your code with a small number of permutations, say 5 or 10, first and see if that works before you start the actual analysis. Does it run with only a few permutations?

Best, Stefanie

katerinka-arh commented 1 year ago

Dear Stefanie,

thank you for such a prompt response!

I have 175 nodes (taxa) in both networks, and I use 20 cores with 20 GB per core. I have tried 100, 10, and 3 permutations, with no difference. My last attempt, with 100 permutations, started on 30.01; slurm.out last changed on 31.01, and I stopped the job today, 10.02.

The permutation test seems to have finished: according to slurm.out, the progress bar is at 100%, and yet the script (or the command, when run without Slurm) is still running.

Best, Ekaterina

stefpeschel commented 1 year ago

Hmm if it doesn't even terminate with only 3 permutations, something must be going wrong.

Could you please provide me with your code from network construction to diffnet(), so I can see whether the issue occurs with other data as well? It would also be interesting to know which association measure you're using.

Thanks and best regards

katerinka-arh commented 1 year ago

Hi Stefanie,

Here is the code for network construction. I have a dataset with taxon abundances for samples from either diseased (pos) or healthy (neg) patients.

Net <- netConstruct(data=pos, data2=neg, filtTax=c("numbSamp"), filtTaxPar=list(numbSamp=150), datatype="counts", measure="spring", measurePar=list(Rmethod="original", subsample.ratio=0.75, rep.num=20), dissFunc="unsigned", verbose=3, seed=1234, cores=20L)

DiffNet <- diffnet(Net, diffMethod="permute", adjust="ldfr", cores=20L, nPerm=10)

Best regards, Ekaterina

stefpeschel commented 1 year ago

Hey Ekaterina, Thanks for the code.

I assume that constructing a single network with your code just takes too much time. I would highly suggest setting the SPRING argument Rmethod to "approx" because the approx method is much faster than the original one. Here's the corresponding paper describing the fast approach: https://www.tandfonline.com/doi/abs/10.1080/10618600.2021.1882468

You could also try to further reduce the number of taxa because the execution time depends on the network size.
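To illustrate what reducing the number of taxa means in practice, here is a minimal base R sketch on simulated counts (not the data from this thread). The helper `keep_most_frequent()` is hypothetical and only mimics the idea of a frequency-based filter; it is not part of NetCoMi.

```r
# Toy count matrix: 20 samples (rows) x 8 taxa (columns), simulated data.
set.seed(1)
lambdas <- c(1, 5, 20, 50, 2, 10, 100, 3)  # mean abundance per taxon
counts <- matrix(rpois(20 * 8, lambda = rep(lambdas, each = 20)),
                 nrow = 20,
                 dimnames = list(paste0("S", 1:20), paste0("taxon", 1:8)))

# Hypothetical helper: keep the n_keep taxa with the highest total count.
keep_most_frequent <- function(x, n_keep) {
  totals <- colSums(x)
  n_keep <- min(n_keep, ncol(x))
  keep <- names(sort(totals, decreasing = TRUE))[seq_len(n_keep)]
  # Subset by name so the original column order is preserved.
  x[, colnames(x) %in% keep, drop = FALSE]
}

counts_small <- keep_most_frequent(counts, n_keep = 4)
dim(counts_small)  # 20 samples, 4 taxa
```

In netConstruct(), this kind of filtering is controlled through the filtTax and filtTaxPar arguments (numbSamp is the filter used in this thread); the package documentation lists the other available filters.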

Here's a script that works for me:

# Load the packages (SpiecEasi provides the example data)
library(NetCoMi)
library(SpiecEasi)

# Load data sets from American Gut Project (from SpiecEasi package)
data("amgut2.filt.phy")

# Split data into two groups: with and without seasonal allergies
amgut_season_yes <- phyloseq::subset_samples(amgut2.filt.phy, 
                                             SEASONAL_ALLERGIES == "yes")
amgut_season_no <- phyloseq::subset_samples(amgut2.filt.phy, 
                                            SEASONAL_ALLERGIES == "no")

amgut_season_yes
amgut_season_no

Net <- netConstruct(data=amgut_season_yes, 
                    data2=amgut_season_no, 
                    filtTax=c("numbSamp"), 
                    filtTaxPar=list(numbSamp=15),
                    dataType = "counts",
                    measure="spring", 
                    measurePar=list(Rmethod="approx",
                                    subsample.ratio=0.75,
                                    rep.num=20), 
                    dissFunc="unsigned", 
                    verbose=3, 
                    seed=1234, 
                    cores=5L)

system.time(
  DiffNet<-diffnet(Net, 
                   diffMethod= "permute", 
                   adjust= "lfdr", 
                   cores=5L, 
                   nPerm=10)
)

Note that I reduced numbSamp to 15; with the original threshold, no taxa remained because this data set is quite small. I also used only 5 CPU cores because I ran it on my PC.

I got this output:

Checking input arguments ... 
Done.
Execute permutation tests ... 
  |===============================================================================================================| 100%
Adjust for multiple testing using 'lfdr' ... 
Execute fdrtool() ...
Step 1... determine cutoff point
Step 2... estimate parameters of null distribution and eta0
Step 3... compute p-values and estimate empirical PDF/CDF
Step 4... compute q-values and local fdr

Done.
No significant differential associations detected after multiple testing adjustment.
   user  system elapsed 
   0.55    0.65  999.22 

Does the code run with your data?
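For planning a full run, the timing above can be turned into a rough projection. The sketch below assumes that run time grows linearly with nPerm and ignores any fixed overhead; `projected_hours()` is a hypothetical helper, and the numbers apply only to this small example on 5 cores, not to a 175-taxon network.

```r
# Pilot timing taken from the output above: 10 permutations, 999.22 s elapsed.
pilot_elapsed_sec <- 999.22
pilot_n_perm <- 10

# Assumption: run time scales linearly with the number of permutations.
sec_per_perm <- pilot_elapsed_sec / pilot_n_perm

# Hypothetical helper: projected wall-clock hours for a given nPerm.
projected_hours <- function(n_perm) sec_per_perm * n_perm / 3600

# Projection for 100 and 1000 permutations: about 2.8 and 27.8 hours.
round(projected_hours(c(100, 1000)), 1)
```

Running the same kind of pilot with your own data and a small nPerm gives a per-permutation time that can be scaled the same way before submitting a long job.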