zdk123 / SpiecEasi

Sparse InversE Covariance estimation for Ecological Association and Statistical Inference
GNU General Public License v3.0
188 stars 65 forks source link

SPIEC-EASI: unable to complete 'Selecting model with pulsar using stars...' step #129

Open sabdeolive opened 4 years ago

sabdeolive commented 4 years ago

Hi, I am trying to run SPIEC-EASI with a phyloseq object containing the metagenomic info of prediabetic individuals. I have followed this tutorial: http://psbweb05.psb.ugent.be/conet/microbialnetworks/spieceasi.php

I have managed to perform all steps prior to application of the spiec.easi function. Whenever I try to run the corresponding line of code, my R cannot get past the 'Selecting model with pulsar using stars...' step. (I ran it for 2 hours and still no progress was made beyond this point) Is there something I have done that is preventing the analysis from running? Is there something I can do to make the analysis faster? (I have already tried to specify the number of cores using ncores but get the following error: 'Error in mclapply(X, FUN, mc.cores = mc.cores, ...) : 'mc.cores' > 1 is not supported on Windows'

Please find my code below.

Thank you in advance for your help.

Kind regards, Sabrina.

T2D.fil phyloseq-class experiment-level object otu_table() OTU Table: [ 981 taxa and 402 samples ] sample_data() Sample Data: [ 402 samples by 22 sample variables ] tax_table() Taxonomy Table: [ 981 taxa by 7 taxonomic ranks ]

otus <- T2D.fil@otu_table@.Data taxa <- T2D.fil@tax_table@.Data

filterobj <- filterTaxonMatrix(otus,minocc=20,keepSum = TRUE, return.filtered.indices = TRUE) otus.f <- filterobj$mat taxa.f <- taxa[setdiff(1:nrow(taxa),filterobj$filtered.indices),]

dummyTaxonomy=c("kdummy","p","c","o","f","g","s__") taxa.f=rbind(taxa.f,dummyTaxonomy)

rownames(taxa.f)[nrow(taxa.f)]="0" rownames(otus.f)[nrow(otus.f)]="0"

updatedotus=otu_table(otus.f, taxa_are_rows = TRUE) updatedtaxa=tax_table(taxa.f) phyloseqobj.f=phyloseq(updatedotus, updatedtaxa)

ps_otu1 <- phyloseqobj.f@otu_table@.Data + 1 phyloseqobj.f@otu_table@.Data <- ps_otu1

spiec.out=spiec.easi(phyloseqobj.f, method="mb", pulsar.params=list(rep.num=20)) Applying data transformations... Selecting model with pulsar using stars...

zdk123 commented 4 years ago

@sabdeolive why did you close this issue? With that data size you are definitely going to want parallelization. I do have an option that should work for windows via the batchtools package. Let me know if you want a walk through.

sabdeolive commented 4 years ago

@zdk123 I closed the issue because I finally recieved an output from spiec.easi. It took 4 hours but I thought that maybe this length of time was normal as I could find no mention of a time period online but did read that the analysis takes a while. Ah okay that makes sense. Yes, that would be great! Thank you so much.

RitaBogorad commented 4 years ago

Hello,

I have a similar issue with the function. I am following the tutorial from the book "Microbiome Analysis" (https://www.springer.com/gp/book/9781493987269). I get no errors during the function execution but the process freezes at the "Selecting model with pulsar using stars" step. I am running the code from RStudio on a cluster and shortly after the execution, the process is not visible anymore through the htop command. I have tried to run this function directly from R and not RStudio, the result is the same. I am using as an input an otu table of 763 OTUs and 13 samples.

SpiecEasi.matrix <- spiec.easi(otu_short, method = "glasso", lambda.min.ratio = 1e-2, nlambda = 20, icov.select.params= list(rep.num =50))

Applying data transformations... Selecting model with pulsar using stars...

Thank you very much for your help!

Rita

zdk123 commented 4 years ago

@RitaBogorad thanks for the report. You can run in multicore mode (single machine) by supplying ncores to icov.select.params. This is documented on the README.

If you're on a cluster you could also try batch mode.

RitaBogorad commented 4 years ago

Thank you very much for a fast reply!

I have tried executing the command both with my otu_table and the default amgut1.filt, and the result is the same - the function freezes at the submission step. Am I doing something wrong?

bargs <- list(rep.num=50, seed=10010, conffile="parallel") SpiecEasi.matrix <- spiec.easi(amgut1.filt,

  • method = "glasso", lambda.min.ratio=1e-3, nlambda=30,
  • sel.criterion='stars', pulsar.select='batch', pulsar.params=bargs) Applying data transformations... Selecting model with batch.pulsar using stars... Sourcing configuration file '/home/mbogorad/R/x86_64-pc-linux-gnu-library/3.6/pulsar/config/batchtools.conf.parallel.R' ... Created registry in '/tmp/RtmpcDKq54/registry3a27268d5689' using cluster functions 'Multicore' Adding 51 jobs ... Submitting 51 jobs in 51 chunks using cluster functions 'Multicore' ...
zdk123 commented 4 years ago

Can you filter OTUs and try again? I'm not sure where the internals may be locked up and there doesn't seem to be any error message.

RitaBogorad commented 4 years ago

So using as an input a phyloseq object works fine but not a table from an otu_table(phyloseq) function. For me, this solution is the most optimal. Thank you for your help!

zdk123 commented 4 years ago

That's very odd.. Can you try running the internal function SpiecEasi:::.phy2mat on the phyloseq and otu_table? I would expect both to spit out the same underlying data matrix. Thanks.

fabiodepa commented 4 years ago

Hi, I have a similar problem in running spiec.easi function on my own data. I manually did a phyloseq object from qiime data passing otu table, taxonomy and sample metadata. When I run the spiec.easi command on this phyloseq object it last forever. I then tried to run the command just with a martix data with the otu table and it goes through in few minutes. Just to check I run the function you indicated in the last comment SpiecEasi:::.phy2mat and the output looks like it transposes the otu table because it outputs the samples in rows and the features in columns. I guess this could be the reason why it get stuck with the phyloseq object.

This is the code I run in first place, when it get stuck.

# read and format the feature table data
d16s <- read.table("input_feature_table.tsv", sep="\t", header=TRUE)
rownames(d16s) <- d16s$feature
d16s[,1] <- NULL
d16s.m <- data.matrix(d16s)
# must be a phyloseq object
d16s.m.fp = otu_table(d16s.m, taxa_are_rows = TRUE)

# read and format the feature taxonomy data
# Read taxonomy info in
taxonomy <- read.table("sub_taxonomy.tsv", row.names = 1, sep = "\t")
taxonomy <- as.matrix(taxonomy)
# Make compatible for phyloseq
taxonomy_final = tax_table(taxonomy)

# read metadata
meta_data <- read.csv("samples_metadata.txt", header = T, row.names = 1, sep="\t")
meta_final <- sample_data(meta_data)

# phyloseq object
phyl.obj.16s <- merge_phyloseq(d16s.m.fp, taxonomy_final, meta_final)

# co-occurences

se.gl.16s <- spiec.easi(phyl.obj.16s, method='glasso', lambda.min.ratio=1e-2, nlambda=20, pulsar.params=list(rep.num=50, ncores=3)
se.mb.16s <- spiec.easi(phyl.obj.16s, method='mb', lambda.min.ratio=1e-2, nlambda=20, pulsar.params=list(rep.num=50, ncores=3))

And it get stuck at the first spiec.easi instance.

Just to add information that may be helpful I have a little less than 1500 features in this dataset across 90 samples.

Thank you in advance for any help. Fabio

zdk123 commented 4 years ago

thanks for pointing that out @fabiodepa

Treating the OTUs/features as the columns is standard in statistics / R packages, but I should add that to the troubleshooting guide on the README.

For phyloseq objects, that row/col margin is stored in the object itself, so we can check and transpose as needed. I'm not surprised that p=1500 takes a while to complete, you might need more cores for parallelization and ~20gigs of memory.

andrewd789 commented 3 years ago

Hi, I am having similar difficulties. I am using SpiecEasi v 1.1.0. I have a dataset with 36 samples and about 3000 taxa (across four different taxonomic kingdoms). I have been running spiec-easi as follows (where otutab_list is a list of four tables of taxon abundances) on a cluster (I'm not using phyloseq):

pargs <- list(rep.num = 10, seed = 12345, ncores = 4)
spiec.easi(otutab_list, method = "mb", nlambda = 5, lambda.min.ratio = 0.0001, 
sel.criterion ='bstars', pulsar.select = TRUE, pulsar.params = pargs)

This ran to completion, taking only about 3 minutes. I then made nlamba = 7, which also completed in about 3 minutes. But I then made nlambda = 10 (with no other changes), which didn't complete in one hour (it hit a time limit). The completed results had an empty network, indicating the lambda.min.ratio needed to be lowered. So I tried nlambda = 5, lambda.min.ratio = 0.00005, but this didn't complete in 6 hours. I don't know if the process hung or crashed (there was no indication of that), or would have completed if allowed to run for longer.

I'm surprised that changing the nlambda and lambda.min.ratio values slightly causes such drastic changes in the computation time. Is that expected? Is there any way to predict how long it might take to run?

Thanks!

zdk123 commented 3 years ago

@andrewd789 That's somewhat surprising, but I also wouldn't expect runtime to be linear w.r.t lambda.min.ratio.

How long does the run take when pulsar.select=FALSE? This should give you an estimate for how long stars should take for each [subsampled] replicate.

Then I would also try it with rep.num=2. In "bstars" mode, 2 jobs are run on the full lambda path, then the lambda path is adjusted for the rest of the rep.num-2 jobs. Since the smallest values of lambda are the most computationally expensive, the hope is that that overall runtime will still be less even with the added serialization. However, for some datasets we've seen, the improvement isn't there.

Finally you can expect a high degree of statistical noise when p>>n, so it's possible the network becomes very dense at some lambda value between E-4 and E-5: you can simply try values between these two numbers to see if there's a faster-running value that still results in an optimal solution. To guide you, you can take a look at the output se$select$stars$summary, since the stars selected lambda index is going to be at .05 (by default) you can see how close you're getting to a stable network.

Failing that, I would recommend filtering the OTU tables a bit more.

andrewd789 commented 3 years ago

@zdk123 thanks for the suggestions. I was unable to run it with pulsar.select=FALSE and/or rep.num=2 (perhaps I misunderstood; do other parameters also have to be altered?). However, I found that filtering the data to a greater extent helped, with the process completing in a reasonable time frame (~15 min).

I also observed that splitting the data into two separate portions for analysis (18 samples each) became problematic when lambda.min.ratio below 0.00007, with very similar errors to https://github.com/zdk123/SpiecEasi/issues/73 and https://github.com/zdk123/SpiecEasi/issues/111. I'm unsure, but could this be related to https://github.com/HMJiangGatech/huge/issues/8 ? For example, I found that repeating that error-reproducing code with lambda = 0.001 works, but with lambda = 0.0001 (i.e. huge::huge(X, method='mb', lambda=c(0.0001))) produces a memory error.

sessionInfo():
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /scale_wlg_persistent/filesets/opt_nesi/CS400_centos7_bdw/imkl/2020.0.166-gimpi-2020a/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SpiecEasi_1.1.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       lattice_0.20-41  codetools_0.2-16 glmnet_4.0-2
 [5] foreach_1.5.1    MASS_7.3-51.6    grid_4.0.1       magrittr_2.0.1
 [9] stats4_4.0.1     huge_1.3.4.1     pulsar_0.3.7     Matrix_1.2-18
[13] splines_4.0.1    iterators_1.0.13 igraph_1.2.6     parallel_4.0.1
[17] survival_3.1-12  compiler_4.0.1   pkgconfig_2.0.3  shape_1.4.5
[21] VGAM_1.1-4
alfrilling commented 1 year ago

Hi. I am having the same issue with a sample set. My phyloseq object is just 3 samples and 1035 taxa. However it gets stucked at first step. the phy2mat option and otu_table reveal the same structure so at this moment I don't know how to proceed.