Open stanstrup opened 7 years ago
If `compound_tbl_sdf` was internal to `createCompDb` (so you'd always call `createCompDb` directly), you could append to the SQLite file instead, which would avoid the memory requirements. This is what I did in my approach for pubchem.
Note: `createCompDb` already supports generating a `CompDb` from multiple input files. The man page also tells you that you can provide the name(s) of the file(s). I will make this clearer in the help page.
So far I used `lapply` to process multiple files - I'll switch to `bplapply`.
OK, I have extended the documentation a little. I've also tried to enable parallel processing, but that's not possible because SQLite/RSQLite does not support concurrent write operations. I've also tried: https://stackoverflow.com/questions/36831302/parallel-query-of-sqlite-database-in-r and https://www.r-bloggers.com/synchronization-for-r-with-the-flock-package/ but that didn't help either. So, presently it's not possible.
Ah yes, I tried the exact same things. That's why I ended up creating a separate SQLite file for each SDF and then constructing the final SQLite database after the parallel runs.
For example for pubchem.
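The per-file approach described above can be sketched roughly as follows. This is only an illustration of the pattern, written in Python with the standard-library `sqlite3` module rather than in R, and it does not reflect CompoundDb's actual code or schema. The `compound` table, its two columns, and the helper names `write_part`, `merge_parts` and `build` are all hypothetical. Each worker writes to its own SQLite file, so no two workers ever write to the same database, and a single process then appends the parts to the final file one at a time.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_part(records, part_path):
    """Write the records parsed from one input file into its own SQLite file."""
    con = sqlite3.connect(part_path)
    try:
        # Hypothetical two-column schema, for illustration only.
        con.execute("CREATE TABLE compound (compound_id TEXT, name TEXT)")
        con.executemany("INSERT INTO compound VALUES (?, ?)", records)
        con.commit()
    finally:
        con.close()
    return part_path


def merge_parts(part_paths, final_path):
    """Append each per-file database to the final one, sequentially."""
    con = sqlite3.connect(final_path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS compound (compound_id TEXT, name TEXT)"
        )
        for path in part_paths:
            con.execute("ATTACH DATABASE ? AS part", (path,))
            con.execute(
                "INSERT INTO compound SELECT compound_id, name FROM part.compound"
            )
            con.commit()  # the transaction must be closed before detaching
            con.execute("DETACH DATABASE part")
    finally:
        con.close()


def build(chunks, workdir):
    """Write one part per chunk concurrently, then merge in a single writer."""
    part_paths = [
        os.path.join(workdir, f"part_{i}.sqlite") for i in range(len(chunks))
    ]
    # Threads stand in for the parallel workers here; each one only ever
    # touches its own file.
    with ThreadPoolExecutor() as pool:
        list(pool.map(write_part, chunks, part_paths))
    final_path = os.path.join(workdir, "final.sqlite")
    merge_parts(part_paths, final_path)
    return final_path
```

Only one part is attached at a time during the merge, so memory use should stay roughly independent of the number of input files.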
Multithreading with `pbapply` would be nice. See also https://github.com/EuracBiomedicalResearch/CompoundDb/issues/1#issuecomment-340341955