How to build mmpdb with large data set

rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.


How to build mmpdb with large data set #35

Closed iwatobipen closed 2 years ago

iwatobipen commented 2 years ago

Dear developer,

I would like to ask how to build an mmpdb database from a large data set. I tried to build one from ChEMBL 28 data. First, I split over 1 million SMILES from the ChEMBL database into chunk files. Then I generated fragment files from the chunks and merged them into one file. Finally, I ran the mmpdb index command on the merged fragment file, but the process was killed due to lack of memory. Is there any way to build an mmpdb database from a large set of fragments? My environment has 32 GB of RAM. Any advice or suggestions would be greatly appreciated. Thanks,

Taka
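The chunking step described above could look roughly like the following. This is only a minimal sketch, not the script Taka actually used: the function name `split_smiles_file`, the chunk size, and the `chunk_NNNN.smi` naming are all assumptions made for illustration, and the input is assumed to be a plain SMILES file with one record per line.

```python
from itertools import islice
from pathlib import Path


def split_smiles_file(src, out_dir, chunk_size=100_000):
    """Split a SMILES file (one record per line) into numbered chunk files.

    Hypothetical helper for illustration; returns the list of chunk paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(src) as handle:
        # Skip blank lines so they do not count towards a chunk.
        records = (line for line in handle if line.strip())
        while True:
            # Read at most chunk_size records without loading the whole file.
            chunk = list(islice(records, chunk_size))
            if not chunk:
                break
            path = out_dir / f"chunk_{len(chunk_paths):04d}.smi"
            path.write_text("".join(chunk))
            chunk_paths.append(path)
    return chunk_paths
```

Each chunk can then be fragmented separately, for example with `mmpdb fragment chunk_0000.smi -o chunk_0000.fragments`. Note that chunking only splits up the fragmentation step; as the thread shows, the memory problem appears later, when the merged fragments are indexed.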

KramerChristian commented 2 years ago

Dear Taka,

Indexing ChEMBL 28 into an MMP database with the standard mmpdb tool requires a lot of memory and is unlikely to be feasible with the default settings. You can change some of the settings to reduce the number of transformations and the resolution of the transformations, as discussed here:

https://github.com/rdkit/mmpdb/issues/27

I am not sure, though, whether this will allow you to index ChEMBL 28 unless you restrict the database to transformations of very small fragments. If you only want to index the transformations and not associate any properties with them, I recommend checking out this code, which is derived from mmpdb and built for exactly that purpose:

https://github.com/mahendra-awale/medchem_moves

Hope this helps... Bests, Christian
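As a rough illustration of the "modify some of the settings" suggestion above, the sketch below assembles an `mmpdb index` command line that restricts the size of the variable fragment. It is an assumption-laden example, not a recommended configuration: the helper `build_index_command` and the file names are hypothetical, the value 6 is arbitrary, and the option names `--max-variable-heavies` and `--max-heavies-transf` should be checked against `mmpdb index --help` for the installed version before use.

```python
import shlex


def build_index_command(fragments, output,
                        max_variable_heavies=None,
                        max_heavies_transf=None):
    """Build an argument list for `mmpdb index` (hypothetical helper).

    The limits are optional; option names are assumed from the mmpdb
    command-line help and may differ between versions.
    """
    cmd = ["mmpdb", "index", str(fragments), "-o", str(output)]
    if max_variable_heavies is not None:
        # Cap the number of heavy atoms in the variable fragment.
        cmd += ["--max-variable-heavies", str(max_variable_heavies)]
    if max_heavies_transf is not None:
        # Cap the number of heavy atoms changed by a transformation.
        cmd += ["--max-heavies-transf", str(max_heavies_transf)]
    return cmd


# Illustrative file names and limits only.
cmd = build_index_command("chembl28.fragments", "chembl28.mmpdb",
                          max_variable_heavies=6, max_heavies_transf=6)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Whether limits like these are enough to index all of ChEMBL 28 in 32 GB of RAM is exactly what the reply above leaves open.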

iwatobipen commented 2 years ago

Dear Christian,

Thanks for your reply. What I want to do is exactly what https://github.com/mahendra-awale/medchem_moves does! I read the code, but the repo doesn't provide index_algorithm.py, so I couldn't run index_algorithm.findRulesAndEnvs. If you have any information about it, could you please share it?

Kind regards, Taka

KramerChristian commented 2 years ago

Dear Taka,

I will talk to Mahendra, the repo owner, to see what he can do.

Bests, Christian

iwatobipen commented 2 years ago

Dear Christian, Thanks! That sounds great.

Kind regards, Taka