renozao / NMF

NMF: A Flexible R package for Nonnegative Matrix Factorization

R segmentation error in NMF #98

Closed kiranvpaul closed 7 years ago

kiranvpaul commented 7 years ago

Hi, I am using the NMF package on matrices with ~500 samples (columns) and 10,000+ events (rows). I have 7 groups of matrix files on which I have run NMF. For 6 of the files I get a result, but the file with the most events (64,000) fails with an "R segmentation error". I checked with our cluster team, who said it is not a memory issue because I am using 10 cores for 10 runs. I have removed all NA and negative values.

Please find my code below:

library(NMF)
setwd("/ysm-gpfs/home/kvp6/scratch60/Alex")
prime <- as.matrix(read.table("Combined_AF_PSI_pos_filt_matrix.txt", header = TRUE, sep = "\t", row.names = 1))
ri.r <- nmf(prime, 2:6, nrun = 10, seed = 123456)

I am using R 3.3.2; I have also tried this in R 3.4.1.

My input file looks like this (a sample with 2 events and 31 samples):

Events | SRR1567002 | SRR1567003 | SRR1567004 | SRR1567009 | SRR1567010 | SRR1567011 | SRR1567012 | SRR1567013 | SRR1567014 | SRR1567015 | SRR1567016 | SRR1567017 | SRR1567018 | SRR1567019 | SRR1567020 | SRR1567021 | SRR1567022 | SRR1567023 | SRR1567024 | SRR1567025 | SRR1567026 | SRR1567027 | SRR1567028 | SRR1567029 | SRR1567030 | SRR1567031 | SRR1567032 | SRR1567033 | SRR1567034 | SRR1567035 | SRR1567036
ENSG00000000003.14;AF:chrX:100635746-100636191:100636689:100635746-100636793:100637104:- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.999977412 | 0 | 0 | 0 | 0 | 0.99997337 | 0 | 0 | 0 | 0.999994643 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
ENSG00000000457.13;AF:chr1:169888890-169893788:169893952:169888890-169894007:169894267:- | 0 | 0.99982518 | 0.999999915 | 1 | 1 | 1 | 1 | 1 | 0.999997432 | 1 | 0 | 0 | 0.999999791 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0.994912957 | 1 | 1 | 1 | 0.999997163 | 1 | 0 | 0 | 1 | 1 | 1

It is a tab-separated file; I am using a pipe symbol here only to make the columns visible.

Here is the error I am getting:

Loading required package: methods
Loading required package: pkgmaker
Loading required package: registry

Attaching package: ‘pkgmaker’

The following object is masked from ‘package:base’:

    isNamespaceLoaded

Loading required package: rngtools
Loading required package: cluster
NMF - BioConductor layer [NO: missing Biobase] | Shared memory capabilities [OK] | Cores 19/20
To enable the Bioconductor layer, try: install.extras(' NMF ') [with Bioconductor repository enabled]

caught segfault address 0x2b56564a34f0, cause 'memory not mapped'

Traceback:
1: cor(t(x))
2: .local(object, ...)
3: predict(x, what = what, dmatrix = TRUE)
4: predict(x, what = what, dmatrix = TRUE)
5: silhouette.NMF(object, what = "features")
6: silhouette(object, what = "features")
7: .local(object, ...)
8: summary(fit(object), ...)
9: summary(fit(object), ...)
10: summary(best.fit, ...)
11: summary(best.fit, ...)
12: summary(res, target = V)
13: summary(res, target = V)
14: doTryCatch(return(expr), name, parentenv, handler)
15: tryCatchOne(expr, names, parentenv, handlers[[1L]])
16: tryCatchList(expr, classes, parentenv, handlers)
17: tryCatch({ res <- nmf(x, r, method, nrun = nrun, model = model, ...) if (!isNMFfit(res, recursive = FALSE)) return(res) c.matrices[[as.character(r)]] <<- consensus(res) fit[[as.character(r)]] <<- res if (verbose) cat("+ measures ... ") measures <- summary(res, target = V) if (verbose) cat("OK\n") measures}, error = function(e) { mess <- if (is.null(e$call)) e$message else paste(e$message, " [in call to '", e$call[1], "']", sep = "") mess <- paste("[r=", r, "] -> ", mess, sep = "") if (stop) { if (verbose) cat("\n") stop(mess, call. = FALSE) } if (verbose) message("ERROR") return(mess)})
18: FUN(X[[i]], ...)
19: lapply(X = X, FUN = FUN, ...)
20: sapply(range, function(r, ...) { k.rank <<- k.rank + 1L if (verbose) cat("Compute NMF rank=", r, " ... ") orng <- RNGseed() if (k.rank < length(range)) on.exit(RNGseed(orng), add = TRUE) res <- tryCatch({ res <- nmf(x, r, method, nrun = nrun, model = model, ...) if (!isNMFfit(res, recursive = FALSE)) return(res) c.matrices[[as.character(r)]] <<- consensus(res) fit[[as.character(r)]] <<- res if (verbose) cat("+ measures ... ") measures <- summary(res, target = V) if (verbose) cat("OK\n") measures }, error = function(e) { mess <- if (is.null(e$call)) e$message else paste(e$message, " [in call to '", e$call[1], "']", sep = "") mess <- paste("[r=", r, "] -> ", mess, sep = "") if (stop) { if (verbose) cat("\n") stop(mess, call. = FALSE) } if (verbose) message("ERROR") return(mess) }) res}, ..., simplify = FALSE)
21: nmfEstimateRank(x, range = rank, method = method, nrun = nrun, seed = seed, rng = rng, model = model, .pbackend = .pbackend, .callback = .callback, verbose = verbose, .options = .options, ...)
22: .local(x, rank, method, ...)
23: nmf(x, rank, method = strategy, ...)
24: nmf(x, rank, method = strategy, ...)
25: nmf(x, rank, method, seed = seed, model = model, ...)
26: nmf(x, rank, method, seed = seed, model = model, ...)
27: .local(x, rank, method, ...)
28: nmf(x, rank, NULL, ...)
29: nmf(x, rank, NULL, ...)
30: nmf(prime, 2:6, nrun = 10, seed = 123456)
31: nmf(prime, 2:6, nrun = 10, seed = 123456)
An irrecoverable exception occurred. R is aborting now ...
/var/spool/slurmd/job3946163/slurm_script: line 17: 26843 Segmentation fault Rscript $run2

Does NMF have a maximum number of rows it can handle, and could that be what is causing this error? Kindly help.

renozao commented 7 years ago

This error occurs when computing the feature silhouette on large matrices (the silhouette(object, what = "features") call in the traceback). The development version on GitHub contains a fix for this:

devtools::install_github('renozao/NMF@devel')
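
For a rough sense of why this step is costly on large inputs: frames 1 to 6 of the traceback show the feature silhouette reaching cor(t(x)), which returns a dense matrix with one row and one column per feature. The base-R sketch below only estimates the size of that matrix. The helper cor_matrix_cost is hypothetical (it is not part of NMF), the 10,000-row figure is illustrative, and whether this size is the direct cause of the segfault is an assumption rather than something confirmed in this thread.

```r
# Hypothetical helper (not part of the NMF package): estimate the size of the
# dense n x n matrix that cor(t(x)) returns for a matrix x with n rows.
cor_matrix_cost <- function(n_rows, bytes_per_double = 8) {
  # Work in doubles so that n^2 does not overflow R's 32-bit integers.
  n_elements <- as.numeric(n_rows)^2
  list(
    elements = n_elements,
    gib = n_elements * bytes_per_double / 1024^3,
    # TRUE when the result has more elements than a 32-bit integer can index.
    exceeds_int_index = n_elements > .Machine$integer.max
  )
}

small <- cor_matrix_cost(10000)  # illustrative size for the files that worked
large <- cor_matrix_cost(64000)  # size of the failing file in the report

cat(sprintf("10000 rows: %.1f GiB, beyond 32-bit indexing: %s\n",
            small$gib, small$exceeds_int_index))
cat(sprintf("64000 rows: %.1f GiB, beyond 32-bit indexing: %s\n",
            large$gib, large$exceeds_int_index))
```

Under these assumptions, 64,000 rows correspond to roughly 30.5 GiB and more than 2^31 - 1 elements, against roughly 0.7 GiB for 10,000 rows, which would be consistent with only the largest file failing.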