No suitable seed found

miquelduranfrigola commented 5 months ago

Hi,

Thanks for a great repository.

While trying to run the function get_starting_seeds on the molecule CC12CC=C3C=C4C(C(C(CC45CCC3(C1CC=C2C6=CC7=C(C=C6)C=CN=C7)O5)N(C)C)O)O, I am unfortunately unable to get any seeds.

Is there any workaround? I would really like to apply SQUID on my query molecule but unfortunately I can't since no seeds are found.

Thanks a lot in advance!

keiradams commented 4 months ago

Hi Miquel,

The fastest way to work around the seed issue is to slightly modify your molecule to include a suitable seed. The "extra" atoms could then be manually deleted after generating analogues.

To illustrate this, the following molecules are slight modifications of your molecule that would enable a seed to be found:

CC1(C(C2=CC3=C(C=CN=C3)C=C2)=C)CC=C4C=C5C(O)C(O)C(N(CCC)C)CC56CCC4(O6)C1

CC12CC=C3C=C4C(OCC)C(O)C(N(C)C)CC45CCC3(O5)C1CC=C2C6=CC7=C(C=CN=C7)C=C6

CC12CC=C3C=C4C(O)C(O)C(N(C)C)CC45CCC3(O5)C1CC=C2C6=CC7=C(C(CCC)=CN=C7)C=C6

However, there is a bigger issue with your molecule: the large 19-membered ring structure is not in SQUID's fragment library. Because the encoder concatenates embeddings of the fragments to the atom embeddings, SQUID won't be able to directly encode this molecule. You could possibly try to hack your way around this by only encoding the shape point cloud, and sampling the atom embeddings from the variational priors (e.g., lambda = 1.0) so that the model loses all information about the excluded fragment. Note that I never attempted this idea, so it would be a bit experimental.
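To make the lambda idea concrete, here is a minimal, library-free sketch of why lambda = 1.0 discards everything the encoder produced. It is not SQUID's code: the function name `sample_latent` and the interpolation form (mean scaled by 1 - lambda, standard deviation blended toward 1) are assumptions chosen for illustration, so check the repository for how lambda is actually applied.

```python
import random


def sample_latent(post_mean, post_std, lam, rng):
    """Draw one latent vector between the posterior and a N(0, 1) prior.

    Hypothetical helper, not part of SQUID. Assumed scheme:
        mean = (1 - lam) * post_mean
        std  = (1 - lam) * post_std + lam
    so lam = 0.0 samples the encoder's posterior and lam = 1.0 samples
    the standard-normal prior, ignoring the encoder output entirely.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    latent = []
    for mu, sigma in zip(post_mean, post_std):
        mean = (1.0 - lam) * mu
        std = (1.0 - lam) * sigma + lam
        latent.append(mean + std * rng.gauss(0.0, 1.0))
    return latent


if __name__ == "__main__":
    # Two made-up posteriors standing in for two different encoded molecules.
    z_a = sample_latent([5.0, -3.0], [0.1, 0.2], 1.0, random.Random(0))
    z_b = sample_latent([0.0, 100.0], [2.0, 0.5], 1.0, random.Random(0))
    # With lam = 1.0 and the same random seed, both draws coincide:
    # nothing about the encoded molecule survives.
    print(z_a == z_b)
```

Under this assumed scheme, any intermediate lambda keeps a proportionally weaker trace of the encoded molecule, which is why only lambda = 1.0 would fully hide a fragment that the library cannot represent.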

miquelduranfrigola commented 4 months ago

Hello @keiradams, this is extremely useful. Thank you so much. I'll give it a try. Congrats again on a great tool.

keiradams / SQUID

No suitable seed found #11