oloBion / Retip

Retip - Retention Time prediction for metabolomics

R 4.0.2 crashes while computing chemical descriptors #3

Open ricoderks opened 3 years ago

ricoderks commented 3 years ago

Hi,

I wanted to try the Retip package, but when I run your example, R crashes as soon as it starts computing the chemical descriptors.

In RStudio I get this output:

[1] "Computing Chemical Descriptors 1 of 970 ... Please wait"
[80786:80786:20200826,091229.019964:ERROR process_memory_range.cc:86] read out of range
[80786:80786:20200826,091229.020011:ERROR elf_image_reader.cc:558] missing nul-terminator
[80786:80786:20200826,091229.020086:ERROR elf_dynamic_array_reader.h:61] tag not found
.
.
.
.
[80786:80786:20200826,091229.024746:ERROR elf_dynamic_array_reader.h:61] tag not found
[80786:80787:20200826,091229.053386:ERROR directory_reader_posix.cc:42] opendir: No such file or directory (2)

It outputs the line [80786:80786:20200826,091229.024746:ERROR elf_dynamic_array_reader.h:61] tag not found many times over. I also tried without parallel computing, but I get the same issue. Can you help me with this?

My session info:

 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       Ubuntu 20.04.1 LTS          
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Amsterdam            
 date     2020-08-26   

I have openjdk-11 installed. I also tried with openjdk-8, but got the same issue.

Cheers, Rico

ricoderks commented 3 years ago

Hi,

Small update. It fails on this line in getCD(): descs_x_loop <- rcdk::eval.desc(mols_x1, descNames)

Cheers, Rico

PaoloBnn commented 3 years ago

Hi, it seems to be a compatibility problem between the latest versions of Ubuntu and RStudio.

See this:

https://community.rstudio.com/t/rstudio-is-crashing-after-update-on-ubuntu-20-04-focal/66682

They are working on a fix for an upcoming RStudio update. Hope it will be released soon!

Thanks! Paolo

PaoloBnn commented 3 years ago

A possible solution from the link above:

“Just thought I'd add a finding on this: I'm running Lubuntu 20.04 and have freshly installed R (4.0.2) and R Studio (1.3.59), display=1920x1080.

I ran into the same problems as others have described. The problem may be associated with the nouveau driver for Nvidia cards.

Switching to the NVidia proprietary driver (under additional drivers as suggested by this post on askubuntu ) fixed the issue and RStudio runs just fine now.”

Maybe this helps.

ricoderks commented 3 years ago

Hi Paolo,

I'm using RStudio 1.3.1056.

Just to be sure, I ran the same code in plain R (without RStudio), and it stops with this error:

[1] "Computing Chemical Descriptors 1 of 970 ... Please wait"
Error: segfault from C stack overflow

If I then try to run another command, I get:

 *** caught segfault ***
address (nil), cause 'unknown'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

Cheers, Rico
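
Since a crash like this can leave the session unusable, one way to narrow it down is to evaluate each descriptor in a throwaway child R process and look at the exit status. The following is only a minimal sketch using base R: runs_cleanly() is a hypothetical helper name, and the commented usage assumes rcdk is installed and uses "CCO" as an arbitrary test molecule.

```r
# Hypothetical helper: evaluate a string of R code in a fresh Rscript process
# and report whether that process exited normally (exit status 0).
runs_cleanly <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- suppressWarnings(
    system2(rscript, c("--vanilla", "-e", shQuote(code)),
            stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

# Assumed usage (needs rcdk in the child process; "CCO" is an arbitrary molecule):
# descNames <- rcdk::get.desc.names(type = "all")
# ok <- vapply(descNames, function(d) {
#   runs_cleanly(sprintf(
#     "m <- rcdk::parse.smiles('CCO'); invisible(rcdk::eval.desc(m, '%s'))", d))
# }, logical(1))
# names(ok)[!ok]  # descriptors whose child process did not exit cleanly
```

A clean exit only means the child process did not die or stop with an error. It does not prove the descriptor returned sensible values, so the results themselves are still worth inspecting.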

ricoderks commented 3 years ago

> A possible solution from the link above:
>
> “Just thought I'd add a finding on this: I'm running Lubuntu 20.04 and have freshly installed R (4.0.2) and R Studio (1.3.59), display=1920x1080.
>
> I ran into the same problems as others have described. The problem may be associated with the nouveau driver for Nvidia cards.
>
> Switching to the NVidia proprietary driver (under additional drivers as suggested by this post on askubuntu ) fixed the issue and RStudio runs just fine now.”
>
> Maybe this helps.

I'll look into this. Thanks.

Cheers, Rico

ricoderks commented 3 years ago

> > A possible solution from the link above: “Just thought I'd add a finding on this: I'm running Lubuntu 20.04 and have freshly installed R (4.0.2) and R Studio (1.3.59), display=1920x1080. I ran into the same problems as others have described. The problem may be associated with the nouveau driver for Nvidia cards. Switching to the NVidia proprietary driver (under additional drivers as suggested by this post on askubuntu ) fixed the issue and RStudio runs just fine now.” Maybe this helps.
>
> I'll look into this. Thanks.
>
> Cheers, Rico

I have an ATI Radeon Pro WX 2100 card.

Cheers, Rico

ricoderks commented 3 years ago

I've been playing around in getCD(), and if I leave out the descriptor "org.openscience.cdk.qsar.descriptors.molecular.LongestAliphaticChainDescriptor" (index 20), it runs fine.

The modified function looks like this:

function (x) {
  print(paste0("Converting SMILES..."))
  for (i in 1:nrow(x)) {
    smi <- rcdk::parse.smiles(as.character(unlist(x[i, "SMILES"])))[[1]]
    smi1 <- rcdk::generate.2d.coordinates(smi)
    smi1 <- rcdk::get.smiles(smi, rcdk::smiles.flavors(c("CxSmiles")))
    x$SMILES[i] <- smi1
    print(paste0(i, " of ", nrow(x)))
  }
  descNames <- rcdk::get.desc.names(type = "all")
  descNames1 <- c("org.openscience.cdk.qsar.descriptors.molecular.BCUTDescriptor")
  print(paste0("Checking for compound errors..."))
  mols_x <- rcdk::parse.smiles(as.character(unlist(x[1, "SMILES"])))
  descs1_x <- rcdk::eval.desc(mols_x, descNames1)
  for (i in 2:nrow(x)) {
    mols1 <- rcdk::parse.smiles(as.character(unlist(x[i, "SMILES"])))
    descs1_x[i, ] <- rcdk::eval.desc(mols1, descNames1)
    print(paste0(i, " of ", nrow(x)))
  }
  x_na <- data.frame(descs1_x, x)
  x_na_rem <- x_na[stats::complete.cases(x_na), ]
  x_na_rem <- x_na_rem[, -c(1:6)]
  print(paste0("Computing Chemical Descriptors 1 of ", nrow(x_na_rem), 
               " ... Please wait"))
  mols_x1 <- rcdk::parse.smiles(as.character(unlist(x_na_rem[1, "SMILES"])))[[1]]
  rcdk::convert.implicit.to.explicit(mols_x1)
  # leave out "org.openscience.cdk.qsar.descriptors.molecular.LongestAliphaticChainDescriptor"
  descs_x_loop <- rcdk::eval.desc(mols_x1, descNames[-20])
  for (i in 2:nrow(x_na_rem)) {
    mols <- rcdk::parse.smiles(as.character(unlist(x_na_rem[i, "SMILES"])))[[1]]
    rcdk::convert.implicit.to.explicit(mols)
    # leave out "org.openscience.cdk.qsar.descriptors.molecular.LongestAliphaticChainDescriptor"
    descs_x_loop[i, ] <- rcdk::eval.desc(mols, descNames[-20])
    print(paste0(i, " of ", nrow(x_na_rem)))
  }
  datadesc <- data.frame(x_na_rem, descs_x_loop)
  return(datadesc)
}

Cheers, Rico
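
Dropping the descriptor by position (descNames[-20]) depends on the order that rcdk::get.desc.names() happens to return, which may differ between rcdk/CDK versions. A possibly more robust variant is to exclude it by name. This is only a sketch: drop_descriptors() is a hypothetical helper, and example_names is a stand-in vector so the snippet runs without rcdk.

```r
# Hypothetical helper: remove unwanted descriptor class names from a vector,
# warning if a name to be removed is not present at all.
drop_descriptors <- function(all_names, blocked) {
  not_found <- setdiff(blocked, all_names)
  if (length(not_found) > 0) {
    warning("Not found in descriptor list: ", paste(not_found, collapse = ", "))
  }
  all_names[!all_names %in% blocked]
}

blocked <- "org.openscience.cdk.qsar.descriptors.molecular.LongestAliphaticChainDescriptor"

# Stand-in for rcdk::get.desc.names(type = "all") (assumption for illustration).
example_names <- c(
  "org.openscience.cdk.qsar.descriptors.molecular.BCUTDescriptor",
  blocked,
  "org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"
)

kept <- drop_descriptors(example_names, blocked)
print(kept)
```

In the modified getCD() above, this would replace both uses of descNames[-20] with a single filtered vector computed once before the loop.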

ricoderks commented 3 years ago

I looked deeper into rcdk::eval.desc(), and it goes wrong here:

descvals <- lapply(molecules, function(a, check) {
        val <- tryCatch({
          .jcall(desc, "Lorg/openscience/cdk/qsar/DescriptorValue;", 
                 "calculate", a, check = check)
        })
      }, check = FALSE)

and if you have only 1 descriptor:

descvals <- lapply(molecules, function(a, b) {
      val <- tryCatch({
        .jcall(b, "Lorg/openscience/cdk/qsar/DescriptorValue;", 
               "calculate", a)
      }, warning = function(e) return(NA), error = function(e) return(NA))
    }, b = desc)

Hope this is helpful. I don't know how to take this further, as I have never worked with rJava.

cheers, Rico
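
One difference between the two excerpts may be worth noting: in the first, tryCatch() is called without any handlers, so an R-level error would simply propagate, while in the second, warnings and errors are turned into NA. The base R sketch below only illustrates that difference with a made-up error. Whether it matters for this crash is unclear, since a native segfault is not necessarily delivered as an ordinary R condition.

```r
# tryCatch() with no handlers: the error propagates to the caller unchanged.
no_handlers <- function() {
  tryCatch({
    stop("descriptor failed")
  })
}

# tryCatch() with handlers: warnings and errors are replaced by NA.
with_handlers <- function() {
  tryCatch({
    stop("descriptor failed")
  }, warning = function(e) NA, error = function(e) NA)
}

# try() is used here only to observe that the first variant still fails.
propagated <- inherits(try(no_handlers(), silent = TRUE), "try-error")
replaced <- with_handlers()
```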

mapio commented 1 year ago

> Hi, it seems to be a compatibility problem between the latest versions of Ubuntu and RStudio.
>
> See this:
>
> https://community.rstudio.com/t/rstudio-is-crashing-after-update-on-ubuntu-20-04-focal/66682
>
> They are working on a fix for an upcoming RStudio update. Hope it will be released soon!

Mmm, I get the exact same error in R on the command line, so I bet RStudio is not part of the problem.

If I run getCD(HILIC) immediately after loading the library, I get

Error: segfault from C stack overflow
Warning messages:
1: In FUN(X[[i]], ...) : Molecule must have 3D coordinates
2: In FUN(X[[i]], ...) : The AtomType null could not be found
3: In FUN(X[[i]], ...) : Molecule must have 3D coordinates

on R version 4.2.3 (2023-03-15) running in Docker with an image generated with

ARG R_VERSION=latest
FROM rocker/verse:${R_VERSION}

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        libomp-dev \
        default-jdk && \
    git clone \
        --recursive \
        --branch stable \
        --depth 1 https://github.com/Microsoft/LightGBM && \
    cd ./LightGBM && \
    sh build-cran-package.sh --no-build-vignettes && \
    R CMD INSTALL ./lightgbm_*.tar.gz && \
    cd .. && \
    rm -rf ./LightGBM

RUN R -e 'install.packages(c("devtools", "reticulate"))'
RUN R -e 'library(reticulate); install_miniconda()'
RUN R -e 'devtools::install_github("Paolobnn/Retiplib")'
RUN R -e 'devtools::install_github("Paolobnn/Retip")'
RUN R -e 'library(keras); install_keras()'

The first RUN is taken from https://github.com/microsoft/LightGBM/blob/master/docker/dockerfile-r, which is the official Microsoft Dockerfile for LightGBM.

The problem also persists with an older version: I attempted to build with docker build --build-arg R_VERSION=3.6.3-ubuntu18.04, and it crashes in exactly the same way.