rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Automatically get MMPs for a given data set #51

Closed pykao closed 1 year ago

pykao commented 1 year ago

Hi authors,

I thought this tool could automatically find the MMPs in a set of molecules.

For example, given an SDF, CSV, or SMILES (.smi) file, mmpdb would generate an output file containing all the MMPs found in that file.

However, from reading the paper, it seems that the user needs to provide user-defined cutting patterns (the "constants" part in the paper).

Is mmpdb an interactive MMP generation tool?

Best,

PK

adalke commented 1 year ago

When I hear "interactive" I think of a GUI. mmpdb is a command-line tool.

mmpdb only accepts SMILES as input, not SDF or CSV. See mmpdb help-analysis for documentation on the overall process. (Also in the README.)
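Since only SMILES input is accepted, data kept in a CSV file has to be converted first. The following is a minimal sketch of such a conversion, not part of mmpdb. It assumes a CSV with hypothetical column names "smiles" and "id", and it assumes the target is the common whitespace-delimited SMILES file layout (SMILES first, then the identifier). Check mmpdb fragment --help for the input options your version supports.

```python
import csv
import io


def csv_to_smiles_lines(csv_text, smiles_col="smiles", id_col="id"):
    """Convert CSV text into 'SMILES<tab>ID' lines.

    The column names are hypothetical defaults; rows missing either
    value are skipped rather than written out half-empty.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        smiles = (row.get(smiles_col) or "").strip()
        ident = (row.get(id_col) or "").strip()
        if smiles and ident:
            lines.append(f"{smiles}\t{ident}")
    return lines


# Tiny in-memory example; a real script would read the CSV from disk
# and write the resulting lines to a .smi file.
example_csv = "id,smiles\nphenol,Oc1ccccc1\ncatechol,Oc1ccccc1O\n"
smiles_lines = csv_to_smiles_lines(example_csv)
print("\n".join(smiles_lines))
```

SDF input would need a cheminformatics toolkit such as RDKit to generate the SMILES, which is outside the scope of this sketch.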

There's a default cutting pattern so you don't need to specify one yourself. Use mmpdb fragment --help to see the details:

 The --cut-smarts argument supports the following short-hand aliases:
   'default': Cut all C-[!H] non-ring single bonds except for Amides/Esters/Amidines/Sulfonamides and CH2-CH2 and CH2-CH3 bonds
      smarts: [#6+0;!$(*=,#[!#6])]!@!=!#[!#0;!#1;!$([CH2]);!$([CH3][CH2])]
   'cut_AlkylChains': As default, but also cuts CH2-CH2 and CH2-CH3 bonds
      smarts: [#6+0;!$(*=,#[!#6])]!@!=!#[!#0;!#1]
   'cut_Amides': As default, but also cuts [O,N]=C-[O,N] single bonds
      smarts: [#6+0]!@!=!#[!#0;!#1;!$([CH2]);!$([CH3][CH2])]
   'cut_all': Cuts all Carbon-[!H] single non-ring bonds. Use carefully, this will create a lot of cuts
      smarts: [#6+0]!@!=!#[!#0;!#1]
   'exocyclic': Cuts all exocyclic single bonds
      smarts: [R]!@!=!#[!#0;!#1]
   'exocyclic_NoMethyl': Cuts all exocyclic single bonds apart from those connecting to CH3 groups
      smarts: [R]!@!=!#[!#0;!#1;!$([CH3])]

pykao commented 1 year ago

Hi @adalke,

Thanks for your reply. Once I have the resulting file, i.e., test_data.fragments or test_data.mmpdb, how can I access the MMPs of the input data set?

["VERSION", "mmpdb-fragment/2"]
["SOFTWARE", "mmpdb-2.1"]
["OPTION", "cut_smarts", "[#6+0;!$(*=,#[!#6])]!@!=!#[!#0;!#1;!$([CH2]);!$([CH3][CH2])]"]
["OPTION", "max_heavies", "100"]
["OPTION", "max_rotatable_bonds", "10"]
["OPTION", "method", "chiral"]
["OPTION", "num_cuts", "3"]
["OPTION", "rotatable_smarts", "[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]"]
["OPTION", "salt_remover", "<default>"]
["RECORD", "phenol", "Oc1ccccc1", 7, "Oc1ccccc1", [[1, "N", 1, "1", "*O", "0", 6, "1", "*c1ccccc1", "c1ccccc1"], [1, "N", 6, "1", "*c1ccccc1", "0", 1, "1", "*O", "O"]]]
["RECORD", "catechol", "Oc1ccccc1O", 8, "Oc1ccccc1O", [[1, "N", 1, "1", "*O", "0", 7, "1", "*c1ccccc1O", "Oc1ccccc1"], [1, "N", 7, "1", "*c1ccccc1O", "0", 1, "1", "*O", "O"], [2, "N", 6, "11", "*c1ccccc1*", "01", 2, "11", "*O.*O", null]]]
["RECORD", "2-aminophenol", "Oc1ccccc1N", 8, "Nc1ccccc1O", [[1, "N", 1, "1", "*N", "0", 7, "1", "*c1ccccc1O", "Oc1ccccc1"], [1, "N", 1, "1", "*O", "0", 7, "1", "*c1ccccc1N", "Nc1ccccc1"], [1, "N", 7, "1", "*c1ccccc1N", "0", 1, "1", "*O", "O"], [1, "N", 7, "1", "*c1ccccc1O", "0", 1, "1", "*N", "N"], [2, "N", 6, "11", "*c1ccccc1*", "01", 2, "12", "*N.*O", null]]]
["RECORD", "2-chlorophenol", "Oc1ccccc1Cl", 8, "Oc1ccccc1Cl", [[1, "N", 1, "1", "*Cl", "0", 7, "1", "*c1ccccc1O", "Oc1ccccc1"], [1, "N", 1, "1", "*O", "0", 7, "1", "*c1ccccc1Cl", "Clc1ccccc1"], [1, "N", 7, "1", "*c1ccccc1Cl", "0", 1, "1", "*O", "O"], [1, "N", 7, "1", "*c1ccccc1O", "0", 1, "1", "*Cl", "Cl"], [2, "N", 6, "11", "*c1ccccc1*", "01", 2, "12", "*Cl.*O", null]]]
["RECORD", "o-phenylenediamine", "Nc1ccccc1N", 8, "Nc1ccccc1N", [[1, "N", 1, "1", "*N", "0", 7, "1", "*c1ccccc1N", "Nc1ccccc1"], [1, "N", 7, "1", "*c1ccccc1N", "0", 1, "1", "*N", "N"], [2, "N", 6, "11", "*c1ccccc1*", "01", 2, "11", "*N.*N", null]]]
["RECORD", "amidol", "Nc1cc(O)ccc1N", 9, "Nc1ccc(O)cc1N", [[1, "N", 1, "1", "*N", "0", 8, "1", "*c1cc(O)ccc1N", "Nc1ccc(O)cc1"], [1, "N", 1, "1", "*N", "0", 8, "1", "*c1ccc(O)cc1N", "Nc1cccc(O)c1"], [1, "N", 1, "1", "*O", "0", 8, "1", "*c1ccc(N)c(N)c1", "Nc1ccccc1N"], [1, "N", 8, "1", "*c1cc(O)ccc1N", "0", 1, "1", "*N", "N"], [1, "N", 8, "1", "*c1ccc(N)c(N)c1", "0", 1, "1", "*O", "O"], [1, "N", 8, "1", "*c1ccc(O)cc1N", "0", 1, "1", "*N", "N"], [2, "N", 7, "12", "*c1ccc(*)c(N)c1", "10", 2, "12", "*N.*O", null], [2, "N", 7, "12", "*c1ccc(N)c(*)c1", "10", 2, "12", "*N.*O", null], [2, "N", 7, "12", "*c1ccc(O)cc1*", "01", 2, "11", "*N.*N", null], [3, "N", 6, "123", "*c1ccc(*)c(*)c1", "201", 3, "112", "*N.*N.*O", null]]]
["RECORD", "hydroxyquinol", "Oc1cc(O)ccc1O", 9, "Oc1ccc(O)c(O)c1", [[1, "N", 1, "1", "*O", "0", 8, "1", "*c1cc(O)ccc1O", "Oc1ccc(O)cc1"], [1, "N", 1, "1", "*O", "0", 8, "1", "*c1ccc(O)c(O)c1", "Oc1ccccc1O"], [1, "N", 1, "1", "*O", "0", 8, "1", "*c1ccc(O)cc1O", "Oc1cccc(O)c1"], [1, "N", 8, "1", "*c1cc(O)ccc1O", "0", 1, "1", "*O", "O"], [1, "N", 8, "1", "*c1ccc(O)c(O)c1", "0", 1, "1", "*O", "O"], [1, "N", 8, "1", "*c1ccc(O)cc1O", "0", 1, "1", "*O", "O"], [2, "N", 7, "12", "*c1ccc(*)c(O)c1", "01", 2, "11", "*O.*O", null], [2, "N", 7, "12", "*c1ccc(O)c(*)c1", "01", 2, "11", "*O.*O", null], [2, "N", 7, "12", "*c1ccc(O)cc1*", "01", 2, "11", "*O.*O", null], [3, "N", 6, "123", "*c1ccc(*)c(*)c1", "012", 3, "111", "*O.*O.*O", null]]]
["RECORD", "phenylamine", "Nc1ccccc1", 7, "Nc1ccccc1", [[1, "N", 1, "1", "*N", "0", 6, "1", "*c1ccccc1", "c1ccccc1"], [1, "N", 6, "1", "*c1ccccc1", "0", 1, "1", "*N", "N"]]]
["RECORD", "cyclopentanol", "C1CCCC1N", 6, "NC1CCCC1", [[1, "N", 1, "1", "*N", "0", 5, "1", "*C1CCCC1", "C1CCCC1"], [1, "N", 5, "1", "*C1CCCC1", "0", 1, "1", "*N", "N"]]]

Best, PK
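To illustrate how pairs relate to the fragment records quoted above, here is a rough sketch that groups fragmentations by their constant part and reports compounds that share a constant but differ in the variable part. It is not what mmpdb index does internally, which applies further rules and filters. The field positions (variable SMILES at index 4, constant SMILES at index 8 of each fragmentation) are an assumption read off the mmpdb-fragment/2 output in this thread, and find_pairs is a hypothetical helper name.

```python
import json
from collections import defaultdict
from itertools import combinations

# Two RECORD lines copied from the fragment output quoted in this thread.
FRAGMENT_LINES = [
    '["RECORD", "phenol", "Oc1ccccc1", 7, "Oc1ccccc1", '
    '[[1, "N", 1, "1", "*O", "0", 6, "1", "*c1ccccc1", "c1ccccc1"], '
    '[1, "N", 6, "1", "*c1ccccc1", "0", 1, "1", "*O", "O"]]]',
    '["RECORD", "phenylamine", "Nc1ccccc1", 7, "Nc1ccccc1", '
    '[[1, "N", 1, "1", "*N", "0", 6, "1", "*c1ccccc1", "c1ccccc1"], '
    '[1, "N", 6, "1", "*c1ccccc1", "0", 1, "1", "*N", "N"]]]',
]


def find_pairs(lines):
    """Group fragmentations by constant SMILES and pair up the variables."""
    by_constant = defaultdict(list)
    for line in lines:
        entry = json.loads(line)
        if entry[0] != "RECORD":
            continue  # skip VERSION, SOFTWARE and OPTION lines
        compound_id = entry[1]
        for frag in entry[5]:
            variable_smiles = frag[4]  # assumed field position
            constant_smiles = frag[8]  # assumed field position
            by_constant[constant_smiles].append((compound_id, variable_smiles))

    pairs = []
    for constant, members in by_constant.items():
        for (id1, var1), (id2, var2) in combinations(members, 2):
            if id1 != id2 and var1 != var2:
                pairs.append((id1, id2, var1, var2, constant))
    return pairs


pairs = find_pairs(FRAGMENT_LINES)
for pair in pairs:
    print(pair)
```

For these two records the only shared constant is the phenyl ring, so the sketch reports phenol and phenylamine as a pair with the transformation *O to *N.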

adalke commented 1 year ago

1) You should use the most recent development version of mmpdb, at https://github.com/adalke/mmpdb/tree/v3-dev . It's close to being merged back into the main branch. There are a couple of last little bits to take care of, which mostly don't affect you.

In v3, the JSON lines format you quoted here for the fragmentations has been replaced by a SQLite database.

2) mmpdb rulecat gives you the rules. More complex custom analysis requires understanding the schema (see mmpdblib/schema.sql) and making your own SQL queries.

3) There is very little unpaid support for mmpdb. You'll need to try out the programs, read the command-line help (including mmpdb help and the sub-help commands), and do your own experimentation. I am also available for paid support.
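As a starting point for the custom SQL analysis mentioned in point 2, a SQLite file can be inspected with Python's standard sqlite3 module before writing any queries. The sketch below lists every table and its columns. It makes no assumption about the actual mmpdb schema; the table created here is a made-up stand-in so that the example runs on its own, and describe_tables is a hypothetical helper name.

```python
import sqlite3


def describe_tables(connection):
    """Return {table_name: [column names]} for every table in the database."""
    cursor = connection.cursor()
    cursor.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    table_names = [row[0] for row in cursor.fetchall()]

    schema = {}
    for name in table_names:
        # PRAGMA statements cannot take bound parameters, so the
        # identifier is quoted by hand.
        quoted = '"' + name.replace('"', '""') + '"'
        columns = cursor.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[name] = [column[1] for column in columns]  # index 1 is the name
    return schema


# Stand-in database with a made-up table, NOT the real mmpdb schema.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE example_compound "
    "(id INTEGER PRIMARY KEY, public_id TEXT, smiles TEXT)"
)
schema = describe_tables(demo)
print(schema)
```

To look at a real database, replace ":memory:" with the path to the indexed file, for example test_data.mmpdb, and compare the result with mmpdblib/schema.sql.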

pykao commented 1 year ago

@adalke Thank you for your help :)