meddwl / psearch

3D ligand-based pharmacophore modeling
BSD 3-Clause "New" or "Revised" License

Structures alignment #13

Closed OlgaGKononova closed 1 year ago

OlgaGKononova commented 1 year ago

Hello,

I was wondering whether there is any particular reason why the code doesn't perform any structure alignment before looking for pharmacophores. I checked both the master and lib branches, and neither aligns structures. I don't mean aligning conformers within a molecule, but rather aligning different molecules to one another, for example w.r.t. the largest one.

I tried this with the master code: I implemented a quick-and-dirty alignment of molecules (using O3A from RDKit) between the generation of conformations and the building of the pharmacophore DB, and it seems to affect the final result. I haven't checked the lib version yet.

Interested to hear your opinion about it.

DrrDom commented 1 year ago

I do not fully understand why one should align conformers within a molecule. What is the reason?

If you created the DB from the same set of conformers, merely rearranged in 3D space, and got different results, this may point to a reproducibility issue, for example in how molecules are split into train/test sets. I do not know whether this pipeline is strictly reproducible in repeated runs. There is no random module import in the code, so some differences may occur in repeated runs.

This is an alignment-free approach. We split a whole pharmacophore into quadruplets and calculate a feature vector that takes into account composition, distances and chirality. If these vectors are identical for two pharmacophores, this is a pharmacophore match. Of course, we miss some true matches with this approach, but there are no false matches. Alignment of screened ligands to a model is optional and is used only for visual analysis if needed (see screen_db.py).
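To illustrate why no alignment is needed, here is a minimal sketch of a quadruplet signature built only from feature labels, binned pairwise distances and a chirality sign. It is not the pmapper/psearch implementation: the names `quadruplet_signature`, `rigid_move` and `bin_step` are made up for this example, and it assumes the four features carry distinct labels so that sorting by label gives a canonical order.

```python
import math
from itertools import combinations


def quadruplet_signature(features, bin_step=1.0):
    """Return (binned labeled distances, chirality sign) for 4 features.

    features: list of four (label, (x, y, z)) tuples with distinct labels
    (an assumption of this sketch; ties are not handled).
    """
    pts = sorted(features, key=lambda f: f[0])  # canonical order by label
    # Composition and geometry: labels of each pair plus the binned distance.
    dists = tuple(
        (a[0], b[0], int(math.dist(a[1], b[1]) // bin_step))
        for a, b in combinations(pts, 2)
    )
    # Chirality: sign of the signed volume spanned by the ordered points.
    origin = pts[0][1]
    u, v, w = [tuple(c - o for c, o in zip(p[1], origin)) for p in pts[1:]]
    vol = (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )
    chirality = (vol > 1e-9) - (vol < -1e-9)
    return dists, chirality


def rigid_move(features, angle, shift):
    """Rotate about the z axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (lab, (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]))
        for lab, (x, y, z) in features
    ]


quad = [
    ("A", (0.0, 0.0, 0.0)),
    ("D", (3.5, 0.0, 0.0)),
    ("H", (0.0, 2.5, 0.0)),
    ("R", (0.0, 0.0, 4.5)),
]
moved = rigid_move(quad, 0.7, (5.0, -2.0, 1.0))
mirrored = [(lab, (-x, y, z)) for lab, (x, y, z) in quad]

same_after_move = quadruplet_signature(quad) == quadruplet_signature(moved)
same_after_mirror = quadruplet_signature(quad) == quadruplet_signature(mirrored)
```

In this sketch the rotated and translated copy gets the same signature as the original, while the mirror image shares the distances but has the opposite chirality sign, so the two are told apart without any superposition.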

OlgaGKononova commented 1 year ago

Hm, maybe I don't completely understand the model then. I thought that the algorithm building the pharmacophore model accounts for the positions of the pharmacophoric points (i.e. donor, acceptor, ring, ...) that form a quadruplet, and that the positions of these points depend on the conformation.

DrrDom commented 1 year ago

You are right that output pharmacophore models depend on input molecule conformations. However, there is no need to align conformers to achieve that.

To find a match between two objects you can create unique, deterministic signatures that represent them. Identical objects will result in identical signatures. To also retrieve similar objects we use a binning step, which adds fuzziness to the object representation, so our computed signatures incorporate this fuzziness to some extent. This is the general idea. The rest is the implementation of how those signatures are calculated.
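A small sketch of the binning idea, under simplifying assumptions: a "signature" here is just a sorted list of labels plus sorted binned distances, hashed with SHA-256. The function `signature_hash` and the `step` parameter are hypothetical names for illustration; the real signatures are computed differently (see the paper).

```python
import hashlib


def signature_hash(labels, distances, step=1.0):
    """Hash a deliberately simplified signature: labels + binned distances."""
    bins = sorted(int(d // step) for d in distances)  # binning adds fuzziness
    key = ",".join(sorted(labels)) + "|" + ",".join(str(b) for b in bins)
    return hashlib.sha256(key.encode()).hexdigest()


labels = ["A", "D", "H"]
first = (3.1, 4.2, 5.3)    # pairwise distances of one 3-point pharmacophore
similar = (3.4, 4.6, 5.8)  # slightly different geometry
distant = (3.1, 4.2, 6.3)  # one distance falls into another bin

coarse_match = signature_hash(labels, first) == signature_hash(labels, similar)
coarse_mismatch = signature_hash(labels, first) == signature_hash(labels, distant)
fine_match = signature_hash(labels, first, 0.5) == signature_hash(labels, similar, 0.5)
```

With a bin step of 1.0 the first two geometries collapse to the same hash, while a finer step of 0.5 separates them. The bin width therefore controls how much geometric variation counts as "the same" pharmacophore.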

A pharmacophore signature is a set of quadruplets and their counts, which we convert to a hash for simplicity of processing. This set of quadruplets (a feature vector converted to a hash) uniquely represents a pharmacophore: its composition of features, distances and chirality. Let's leave the implementation of these signatures aside for now.

To find a common pharmacophore which matches active and does not match inactive molecules (in the training set), we start by enumerating all 3-point pharmacophores of actives and inactives, calculate their hashes, count the occurrence of each hash in active and inactive molecules, and select the pharmacophores which occur preferentially in actives and not in inactives. Afterwards, we enumerate all 4-point pharmacophores containing the 3 features selected in the previous step. Again we calculate hashes for these 4-point pharmacophores and choose those which occur preferentially in actives. Thus we grow our pharmacophores step by step and check how well they match actives/inactives. Growing stops once a pharmacophore starts to no longer match some actives.
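The growth loop can be sketched as a toy example. Geometry is ignored entirely here: each molecule is reduced to a set of feature labels, and a "pharmacophore" matches a molecule if its labels are a subset of the molecule's labels, which stands in for the hash comparison. The function `grow_models` and the selection rule (match every active, match a smaller fraction of inactives) are assumptions made for illustration, not the psearch code.

```python
from itertools import combinations


def grow_models(actives, inactives, start_size=3):
    """Grow label-set 'pharmacophores' one feature at a time.

    actives / inactives: non-empty lists of sets of feature labels
    (a toy stand-in for real 3D pharmacophores and their hashes).
    """
    all_features = set().union(*actives)

    def rate(model, molecules):
        # Fraction of molecules that contain every feature of the model.
        return sum(model <= m for m in molecules) / len(molecules)

    # Seed: all start_size-point models found in actives that match every
    # active and occur less often in inactives than in actives.
    current = {
        frozenset(c)
        for mol in actives
        for c in combinations(sorted(mol), start_size)
    }
    current = {
        m for m in current
        if rate(m, actives) == 1.0 and rate(m, actives) > rate(m, inactives)
    }

    while True:
        # Add one more feature to each surviving model.
        grown = {m | {f} for m in current for f in all_features - m}
        # Stop growing a model as soon as it loses some actives.
        grown = {m for m in grown if rate(m, actives) == 1.0}
        if not grown:
            return current
        current = grown


actives = [
    {"A", "D", "H", "R"},
    {"A", "D", "H", "R", "P"},
    {"A", "D", "H", "R", "N"},
]
inactives = [{"A", "D", "H"}, {"A", "H", "R", "P"}]

models = grow_models(actives, inactives)
```

For this toy data the four 3-point seeds all grow into the single 4-point model A, D, H, R; adding P or N would lose at least one active, so growth stops there.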

How signatures are calculated is described in the paper (https://doi.org/10.3390/molecules23123094). If you have further questions we may arrange a short zoom/skype meeting.

OlgaGKononova commented 1 year ago

Thank you for the detailed reply. I read all the papers mentioned in the repos, but it looks like I understood the process a bit differently and should probably re-read them. Anyway, I would be happy to discuss pmapper/psearch over Zoom: it seems to be a useful tool for us. Please send me an email at olya.kononova@syngenta.com so we can schedule a chat.