nanxstats / Rcpi

💊 Molecular informatics toolkit with integration of bioinformatics and cheminformatics tools for drug discovery
https://nanx.me/Rcpi/
Artistic License 2.0

Some SMILES strings crash the entire R session #15

Open lz100 opened 2 years ago

lz100 commented 2 years ago

Some SMILES strings break extractDrugLongestAliphaticChain:

library(rcdk)
library(Rcpi)
library(magrittr)
"[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%
  parse.smiles() %>%
  .[[1]] %>%
  extractDrugLongestAliphaticChain()

#> Error: segfault from C stack overflow

If you skip extractDrugLongestAliphaticChain and instead run another Rcpi descriptor function on the same molecule, such as extractDrugXLogP, the entire R session crashes:

"[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%
  parse.smiles() %>%
  .[[1]] %>%
  extractDrugXLogP()

 *** caught segfault ***
address 0x311000006, cause 'memory not mapped'

Traceback:
 1: .jcheck()
 2: .jcall(dval, "Lorg/openscience/cdk/qsar/result/IDescriptorResult;",     "getValue")
 3: FUN(X[[i]], ...)
 4: lapply(descvals, .get.desc.values, nexpected = length(dnames))
 5: eval.desc(molecules, "org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor",     verbose = !silent)
 6: extractDrugXLogP(.)
 7: "[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%     parse.smiles() %>% .[[1]] %>% extractDrugXLogP()

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

The first issue may be a CDK (Java) problem, but can we do something in the second case to prevent the R session from crashing?

nanxstats commented 2 years ago

I would recommend checking your JDK and rJava configurations. Normally, the problematic SMILES string returns a proper NA instead of crashing:

library(magrittr)

"CCCC" %>%
  rcdk::parse.smiles() %>%
  .[[1]] %>%
  Rcpi::extractDrugLongestAliphaticChain()

#> nAtomLAC
#> 1        4

x <- "[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]"

x %>%
  rcdk::parse.smiles() %>%
  .[[1]] %>%
  Rcpi::extractDrugLongestAliphaticChain()

#> nAtomLAC
#> 1       NA

If fixing the configuration is really not possible, wrapping the calls with callr could be a workaround: it avoids crashing the main process and allows exception handling. See an example at https://nanx.me/blog/post/disposable-computing-with-callr/.
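A minimal sketch of the callr workaround, assuming callr, rcdk, and Rcpi are installed. The helper name safe_descriptor and its arguments are hypothetical and not part of Rcpi. The descriptor runs in a throwaway R process, so a segfault there surfaces as an ordinary R error in the main session, which the helper turns into NA:

```r
library(callr)

# Hypothetical helper (not part of Rcpi): compute one descriptor for one
# SMILES string in a disposable R subprocess. If the subprocess errors,
# crashes, or exceeds the timeout (in seconds), return NA instead of
# taking down the main session.
safe_descriptor <- function(smiles, descriptor = "extractDrugXLogP", timeout = 120) {
  tryCatch(
    callr::r(
      function(smiles, descriptor) {
        # Everything in here runs in the child process, including the JVM.
        mol <- rcdk::parse.smiles(smiles)[[1]]
        fun <- getExportedValue("Rcpi", descriptor)
        fun(mol)
      },
      args = list(smiles = smiles, descriptor = descriptor),
      timeout = timeout
    ),
    error = function(e) {
      message("Descriptor failed in subprocess: ", conditionMessage(e))
      NA
    }
  )
}

# Usage: safe_descriptor("CCCC", "extractDrugLongestAliphaticChain")
```

The molecule is parsed inside the subprocess because rJava object references cannot be passed between R processes. Starting a new process and JVM for every call is slow, so for large batches it may be worth processing a chunk of SMILES strings per subprocess.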

lz100 commented 2 years ago

Thanks for the recommendation. I suspect this is OS-related: my attempts on both CentOS 7 and Ubuntu 20 crashed. Are you using a non-Linux system?

nanxstats commented 2 years ago

Just tested on macOS with Oracle JDK installed via Homebrew Cask, and on Windows 10 with Amazon Corretto JDK installed via Chocolatey. Both work fine out of the box.
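To compare setups across machines, something like the following could be run on each system. It is only a sketch, assuming rJava is installed and can start a JVM. It reports which Java runtime rJava is bound to, along with the C stack limits of the R process, which may be relevant to the "C stack overflow" error above:

```r
library(rJava)

# Start the JVM (returns immediately if it is already running).
.jinit()

# Query standard Java system properties through rJava.
java_info <- c(
  version = .jcall("java/lang/System", "S", "getProperty", "java.version"),
  vendor  = .jcall("java/lang/System", "S", "getProperty", "java.vendor"),
  home    = .jcall("java/lang/System", "S", "getProperty", "java.home")
)
print(java_info)

# C stack size and current usage for this R process.
print(Cstack_info())
```

Differences in the reported vendor, version, or stack size between a crashing Linux machine and a working macOS or Windows machine would be a reasonable starting point for narrowing down the cause.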