nolanlab / Rclusterpp

Memory efficient clustering in R for large datasets

occasional segfault with Rclusterpp.hclust #6

Closed. pkimes closed this issue 6 years ago

pkimes commented 6 years ago

I'm occasionally running into segfaults (roughly 1 in 10 runs) when calling Rclusterpp.hclust. Any idea why this might be occurring?

For reference, I've uploaded a simple (simulated) data set that triggers the segfault as a gist. With this data set the segfault only appears with method = "ward", but with other data I've also observed segfaults with other linkages (e.g. method = "complete").
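
Because a segfault kills the R session, one way to estimate how often an intermittent crash occurs is to run each attempt in a fresh child process and count the non-zero exit statuses. The following is a minimal sketch under that assumption; `run_isolated` and `count_failures` are hypothetical helper names, not part of Rclusterpp, and the file name in the usage comment is a placeholder.

```r
# Hypothetical helper (not part of Rclusterpp): evaluate R code in a fresh
# Rscript process and return that process's exit status (0 means success).
run_isolated <- function(expr_text) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- suppressWarnings(
    system2(rscript, c("-e", shQuote(expr_text)),
            stdout = FALSE, stderr = FALSE)
  )
  as.integer(status)
}

# Hypothetical helper: repeat the call n times and count non-zero exits.
# Any non-zero status (crash or ordinary R error) is counted as a failure.
count_failures <- function(expr_text, n = 10) {
  statuses <- vapply(seq_len(n), function(i) run_isolated(expr_text),
                     integer(1))
  sum(statuses != 0L)
}

# Example (assumes the gist data were saved locally as "testdata.txt"):
# count_failures(
#   'library(Rclusterpp); Rclusterpp.hclust(read.table("testdata.txt"))',
#   n = 20
# )
```

This does not distinguish a segfault from an ordinary R error, so it is only a rough way to check whether a change (such as reinstalling packages) makes the failures go away.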


Printing out the sessionInfo.

 library(Rclusterpp)
 # Loading required package: Rcpp
 # Loading required package: RcppEigen

 sessionInfo()
 # R version 3.4.2 (2017-09-28)
 # Platform: x86_64-apple-darwin15.6.0 (64-bit)
 # Running under: OS X El Capitan 10.11.6
 # 
 # Matrix products: default
 # BLAS/LAPACK: /usr/local/Cellar/openblas/0.2.20/lib/libopenblasp-r0.2.20.dylib
 # 
 # locale:
 # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
 # 
 # attached base packages:
 # [1] stats     graphics  grDevices utils     datasets  methods   base
 # 
 # other attached packages:
 # [1] Rclusterpp_0.2.3    RcppEigen_0.3.3.3.1 Rcpp_0.12.16
 # 
 # loaded via a namespace (and not attached):
 # [1] compiler_3.4.2  Matrix_1.2-12   tools_3.4.2     grid_3.4.2
 # [5] lattice_0.20-35

The data can be loaded from the gist.

 dat <- read.table(paste0("https://gist.githubusercontent.com/pkimes/",
                          "9d84b603a3f856b100c33e67c7c477fd/raw/",
                          "079ab207f980edafbb893562c6f4ad9e9929338c/",
                          "testdata.txt"))

The standard stats::hclust appears fine.

 hclust(dist(dat), method = "ward.D2")
 # 
 # Call:
 # hclust(d = dist(dat), method = "ward.D2")
 # 
 # Cluster method   : ward.D2
 # Distance         : euclidean
 # Number of objects: 48

Unfortunately, I run into a segfault with Rclusterpp.hclust.

 Rclusterpp.hclust(dat)
 # 
 #  *** caught segfault ***
 # address 0x5, cause 'memory not mapped'
 # 
 # Traceback:
 #  1: .Call("hclust_from_data", data = x, link = as.integer(method),     dist = as.integer(distance), p = as.numeric(p), DUP = FALSE,     NAOK = FALSE, PACKAGE = "Rclusterpp")
 #  2: Rclusterpp.hclust(dat)
 # 
 # Possible actions:
 # 1: abort (with core dump, if enabled)
 # 2: normal R exit
 # 3: exit R without saving workspace
 # 4: exit R saving workspace

SamGG commented 6 years ago

Hi, could you post the sessionInfo() output? On Windows 7 with R 3.4.3, I get no error.

> dat <- read.table(paste0("https://gist.githubusercontent.com/pkimes/",
+                          "9d84b603a3f856b100c33e67c7c477fd/raw/",
+                          "079ab207f980edafbb893562c6f4ad9e9929338c/",
+                          "testdata.txt"))
> install.packages("Rclusterpp")
Installing package into ‘C:/Users/sampgg/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/Rclusterpp_0.2.3.zip'
Content type 'application/zip' length 948053 bytes (925 KB)
downloaded 925 KB

package ‘Rclusterpp’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\sampgg\AppData\Local\Temp\RtmpaC1GoO\downloaded_packages
> View(dat)
> library(Rclusterpp)
Loading required package: Rcpp
Loading required package: RcppEigen
Warning message:
package ‘Rclusterpp’ was built under R version 3.4.4 
> Rclusterpp.hclust(dat)

Call:
Rclusterpp.hclust(x = dat)

Cluster method   : ward 
Distance         : euclidean 
Number of objects: 48 

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rclusterpp_0.2.3    RcppEigen_0.3.3.4.0 Rcpp_0.12.15        nortest_1.0-4       kSamples_1.2-7     
[6] SuppDists_1.1-9.4  

loaded via a namespace (and not attached):
[1] compiler_3.4.3  Matrix_1.2-12   tools_3.4.3     yaml_2.1.17     grid_3.4.3      lattice_0.20-35
> 

pkimes commented 6 years ago

@SamGG thanks for the quick reply! I included the sessionInfo at the top of my example after loading Rclusterpp. (Unfortunately, because I get a segfault, I can't print the sessionInfo after calling Rclusterpp.hclust.) I'm running R-3.4.2.

Here it is again:

sessionInfo()
# R version 3.4.2 (2017-09-28)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: OS X El Capitan 10.11.6
# 
# Matrix products: default
# BLAS/LAPACK: /usr/local/Cellar/openblas/0.2.20/lib/libopenblasp-r0.2.20.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base
# 
# other attached packages:
# [1] Rclusterpp_0.2.3    RcppEigen_0.3.3.4.0 Rcpp_0.12.16
# 
# loaded via a namespace (and not attached):
# [1] compiler_3.4.2  Matrix_1.2-12   grid_3.4.2      lattice_0.20-35

I tried updating RcppEigen to 0.3.3.4.0 and am still getting the segfault. If you're not seeing the error, I guess this might be something with how my Rcpp/RcppEigen is set up?

SamGG commented 6 years ago

Hi Patrick, this might be specific to the Mac environment. Unfortunately, I have no experience with it, but the developers who are used to it should be able to help. In the meantime, you could try the latest R, 3.4.4. Best.

rbruggner commented 6 years ago

Hi Patrick,

In the past I've run into segfaults when Rcpp / RcppEigen / Rclusterpp were compiled with different versions of gcc or with different compilers. The easiest remedy was usually to reinstall those packages from source rather than use the CRAN binaries.

R> library("devtools")
R> install.packages(c("Rcpp","RcppEigen"),type="source")
R> install_github("nolanlab/Rclusterpp")

LMK if that helps?
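
Before reinstalling, it can be useful to see how the installed copies were built. The sketch below reads the `Built` field from each package's DESCRIPTION via `utils::packageDescription`; `built_info` is a hypothetical helper name. Note the assumption: `Built` records the R version, platform, and build date, not the compiler, so a mismatch here is only a hint that the packages came from different builds.

```r
# Hypothetical helper: return the "Built" field for each package, or NA
# when a package is not installed.
built_info <- function(pkgs) {
  vapply(pkgs, function(p) {
    b <- suppressWarnings(utils::packageDescription(p, fields = "Built"))
    if (is.null(b) || is.na(b)) NA_character_ else as.character(b)
  }, character(1))
}

# Example: compare the three packages discussed in this thread.
# built_info(c("Rcpp", "RcppEigen", "Rclusterpp"))
```

If the R versions or build dates differ noticeably between the three packages, reinstalling all of them from source in one go, as suggested above, puts them on the same toolchain.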

pkimes commented 6 years ago

Hi @rbruggner, thanks! I'll give this a try and re-open the issue if I still run into problems.