rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Obtain list of matched pairs with common core from an ID. #50

Open isohelio opened 1 year ago

isohelio commented 1 year ago

Hi, I've built my MMP database for a set of compounds but am struggling to generate the output I would like.

My use case is a pretty typical one: finding changes that lead to large property changes in compounds obtained from patents.

c1ccccc1O X1
c1ccccc1OC X2
c1ccccc1N X3
c1ccccc1OC1CC1 X4
c1cc(Cl)ccc1O X5

What I'm hoping to do is generate a list of matched pairs for a given processed compound, e.g. X1:

c1ccccc1 X1 O X2 OC
c1ccccc1 X1 O X3 N
c1ccccc1 X1 O X4 *OC1CC1
etc.

Is this possible?

Thanks, Mike

KramerChristian commented 1 year ago

Hi Mike,

This is not a standard use case that we implemented, so you'll need a workaround. You can generate a list of all matched pairs if you index to a .csv file rather than a SQLite database. As a next step, you'd then need to match the activities with the pairs in the .csv file and filter to the pairs you are interested in. This should be fairly straightforward with, for example, pandas.
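A minimal sketch of that filtering step, written with the standard-library `csv` module instead of pandas so it has no dependencies. It assumes the indexed pair file is tab-separated with no header and six columns in the order SMILES1, SMILES2, id1, id2, transformation, constant; please check that against your own output. The function name `pairs_for_id`, the sample rows, and the property values are made up for illustration.

```python
import csv
import io

# Assumed column order of the tab-separated, header-less pair file; check it
# against your own `mmpdb index` output before relying on it.
COLUMNS = ("smiles1", "smiles2", "id1", "id2", "transformation", "constant")


def pairs_for_id(lines, compound_id, properties=None):
    """Yield one dict per matched pair whose first compound is compound_id.

    `lines` is any iterable of text lines (an open file works).
    `properties` is an optional {id: value} mapping used to add a delta.
    """
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if row["id1"] != compound_id:
            continue
        # The transformation is written as "from>>to"; split off both variable parts.
        fragment1, _, fragment2 = row["transformation"].partition(">>")
        pair = {
            "core": row["constant"],
            "id1": row["id1"],
            "fragment1": fragment1,
            "id2": row["id2"],
            "fragment2": fragment2,
        }
        # Attach the property change only when both compounds have a value.
        if properties and row["id1"] in properties and row["id2"] in properties:
            pair["delta"] = properties[row["id2"]] - properties[row["id1"]]
        yield pair


# Hypothetical rows in the assumed layout, not real mmpdb output.
SAMPLE = (
    "Oc1ccccc1\tCOc1ccccc1\tX1\tX2\t[*:1]O>>[*:1]OC\t[*:1]c1ccccc1\n"
    "Oc1ccccc1\tNc1ccccc1\tX1\tX3\t[*:1]O>>[*:1]N\t[*:1]c1ccccc1\n"
    "COc1ccccc1\tOc1ccccc1\tX2\tX1\t[*:1]OC>>[*:1]O\t[*:1]c1ccccc1\n"
)
# Hypothetical property values keyed by compound id.
PROPS = {"X1": 1.0, "X2": 2.5, "X3": 0.5}

rows = list(pairs_for_id(io.StringIO(SAMPLE), "X1", PROPS))
for pair in rows:
    print(pair)
```

This only looks at id1. If your file lists each pair in one direction only, you would also need to check id2 and swap the two fragments.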

Bests, Christian

isohelio commented 1 year ago

Hi Christian,

Thanks for the pointer, I've managed to pull together the values I need from the csv output.

This would seem to be a useful feature for the main application. It's pretty much the first thing I do with matched pairs: just list the relationships within a set of compounds.

Thanks again, Mike

adalke commented 1 year ago

Is your organization interested in funding that development, with the results contributed back to mmpdb?

isohelio commented 1 year ago

Hi Andrew,

Not for functionality like this, I'm afraid.

The CSV file output contains exactly the information needed, which will work for small datasets. It seems a pretty useful addition to be able to recreate that for a list of provided IDs directly from the SQLite database.

Thanks, Mike