nanxstats / Rcpi

💊 Molecular informatics toolkit with integration of bioinformatics and cheminformatics tools for drug discovery
https://nanx.me/Rcpi/
Artistic License 2.0

Issue with the convMolFormat function #9

Closed: Boris-Droz closed this issue 4 years ago

Boris-Droz commented 4 years ago

Hello, I have used the convMolFormat function many times with great success. Thank you again for this useful package. Right now, I am using it on a batch of inputs (10,000 mol files) produced by a commercial in-silico prediction tool from Bruker. I want to generate a SMILES table so that I can match the SMILES against other data. However, at some point (after 520 iterations), I get the error "Too many open files". I tried the common advice given in some forums, which is to call closeAllConnections(), but that does not seem to be where the problem comes from. I checked with showConnections(all=TRUE), and only connections 0, 1 and 2, which are the standard ones, are open.

I would really appreciate any ideas on how to debug this.

Below is some dummy code that shows the problem, if needed:

Thank you very much

Boris

library("Rcpi")

## get file paths (fdir[i] is set by an outer directory loop, not shown)
fns <- list.files(fdir[i], pattern = "\\.mol$", full.names = TRUE)

for (j in seq_along(fns)) { # mol loop
  # convert the mol file (or other drawing file) to SMILES
  convMolFormat(infile = fns[j], outfile = "temp.smi",
                from = "mol", to = "smiles")
  # read the SMILES text back
  t.smile <- readMolFromSmi(smifile = "temp.smi", type = "text")
  ## then I put t.smile in a data frame to save it later
}
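A likely reason showConnections() looks clean here: it lists only R-level connections, while "Too many open files" comes from the operating system's per-process limit on file descriptors (the one reported by `ulimit -n` in a shell), which also counts descriptors opened by compiled code that R never wraps in a connection. The minimal sketch below compares the two counts. It assumes a Linux system, where /proc/self/fd has one entry per open descriptor; `count_open_fds()` is a hypothetical helper, not part of Rcpi, and it returns NA where /proc is unavailable (e.g. macOS, Windows).

```r
# Hypothetical helper: count OS-level file descriptors held by this R process.
# Assumption: Linux, where /proc/self/fd lists one entry per open descriptor.
count_open_fds <- function() {
  fd_dir <- "/proc/self/fd"
  if (!dir.exists(fd_dir)) return(NA_integer_)  # no /proc on this platform
  length(list.files(fd_dir))
}

# R-level connections, i.e. what showConnections() reports (0, 1, 2 at minimum)
n_r_connections <- nrow(showConnections(all = TRUE))
# OS-level descriptors, including any opened by compiled code
n_fds <- count_open_fds()

cat("R connections:", n_r_connections, " OS file descriptors:", n_fds, "\n")
```

Calling `count_open_fds()` every few dozen iterations of the mol loop would show whether the descriptor count climbs steadily while the connection count stays at 3; if it does, the leak sits below R's connection layer, and closeAllConnections() cannot release it.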
nanxstats commented 4 years ago

@Boris-Droz Instead of digging into the connection life cycle, here is a generic, creative solution for this kind of problem: use callr to run the conversion in chunks, each in a separate R process:

library("Rcpi")
library("callr")

# reproduce the issue: 2,000 copies of an example SDF file shipped with Rcpi
dir.create("test")
for (i in 1:2000) file.copy(system.file("compseq/DB00530.sdf", package = "Rcpi"), paste0("test/", i, ".sdf"))

fns <- list.files("test/", pattern = "\\.sdf$", full.names = TRUE)

# convert files fns[idx] to SMILES inside a fresh R process started by callr::r()
convert <- function (fns, idx) {
  callr::r(function (fns, idx) {
    smiles <- c()
    for (i in idx) {
      Rcpi::convMolFormat(infile = fns[i], outfile = "temp.smi", from = "sdf", to = "smiles")
      smiles <- c(smiles, Rcpi::readMolFromSmi(smifile = "temp.smi", type = "text")[1])
    }
    smiles
  }, args = list(fns, idx))
}

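k <- length(fns)
# split the file indices into chunks of at most 400 files each
chunks <- split(seq_len(k), ceiling(seq_len(k) / 400))
smi <- rep(NA_character_, k)
# convert each chunk in its own R process, then fill in the results
for (i in seq_along(chunks)) smi[chunks[[i]]] <- convert(fns, chunks[[i]])
smi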
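The chunking step is what keeps each worker process under the limit: every chunk of at most 400 files runs in its own R session, and when that session exits the operating system reclaims all of its descriptors. The base-R sketch below isolates the `split()`/`ceiling()` idiom from the code above so the partition is easy to inspect; `chunk_indices()` is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper: partition indices 1..k into consecutive chunks of at
# most `size` elements, using the same split()/ceiling() idiom as above.
chunk_indices <- function(k, size) {
  stopifnot(size >= 1)
  idx <- seq_len(k)
  # ceiling(idx / size) labels indices 1..size as chunk 1, the next size as 2, ...
  unname(split(idx, ceiling(idx / size)))
}

chunks <- chunk_indices(2000, 400)
length(chunks)         # 5
lengths(chunks)        # 400 400 400 400 400
chunk_indices(10, 4)   # list(1:4, 5:8, 9:10): the last chunk may be shorter
```

The chunk size is the tuning knob: it has to stay below the number of files a single process handled before failing (about 520 in the report above), while larger chunks mean fewer process start-ups.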
Boris-Droz commented 4 years ago

Nice, thank you very much for this prompt answer. Problem resolved.