Closed by pcko1 5 years ago
This step places the particle at the center of a molecule (SMILES) embedding. This means a particle can only jump from one molecule to another during optimization. If a particle's position update does not result in a point in space corresponding to a new molecule, it basically gets reset to the previous point. I included this cyclic step because I found that in some cases the optimizer was able to "optimize" a scoring function (e.g. a QSAR model) by shifting the position without actually changing the molecule...
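A minimal sketch of that round trip, for illustration only. The `CENTERS` table and the `decode`, `encode` and `snap_to_molecule` functions are hypothetical toy stand-ins, not the mso or CDDD API: each "molecule" simply owns one centre point in a 2-D space, and decoding picks the nearest centre.

```python
# Toy latent space (hypothetical): every SMILES owns exactly one centre point.
CENTERS = {
    "CCO": (0.0, 0.0),
    "c1ccccc1": (1.0, 1.0),
    "CC(=O)O": (2.0, 0.0),
}


def decode(x):
    """Toy decoder: return the SMILES whose centre is closest to point x."""
    def sq_dist(smiles):
        cx, cy = CENTERS[smiles]
        return (cx - x[0]) ** 2 + (cy - x[1]) ** 2
    return min(CENTERS, key=sq_dist)


def encode(smiles):
    """Toy encoder: return the centre point of a SMILES."""
    return CENTERS[smiles]


def snap_to_molecule(x):
    """Cyclic step x -> SMILES -> x: move the particle onto its molecule's centre."""
    smiles = decode(x)
    return smiles, encode(smiles)


# A small move away from "CCO" decodes to the same molecule, so the particle
# ends up back at the "CCO" centre; a large move lands on a different molecule.
small_step = snap_to_molecule((0.2, 0.1))
large_step = snap_to_molecule((0.9, 0.8))
```

With this toy setup a particle that started at the "CCO" centre is effectively reset by the small step and jumps to the benzene centre after the large one.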
Hmm, by "corresponding to a new molecule" you mean a molecule different to the previous one or simply any valid molecule?
What I mean by this is that the position in the CDDD space will change (slightly) but it will decode back to the same SMILES. Thus, it's still a valid molecule (the penalty only covers invalid SMILES). I just want to avoid optimizing the scoring function from, say, 0.2 to 0.8 while not changing the actual molecule (only its latent representation). Unfortunately, this can actually happen if a molecule corresponds to a larger region in the CDDD space and the scoring function (e.g. a QSAR model that takes points in this space as input) is not well defined in this region... This is particularly problematic if you include harsh substructure constraints and most of the molecules in the neighbourhood get penalized...
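To make the 0.2 to 0.8 case concrete, here is a toy 1-D sketch. Everything in it is an assumption made up for illustration: `decode_1d` treats each unit interval as one molecule, `encode_1d` returns the interval midpoint, and `latent_score` stands in for a QSAR model that takes latent points as input.

```python
def decode_1d(x):
    """Toy decoder: every unit interval [n, n+1) decodes to the same molecule."""
    return f"mol_{int(x // 1)}"


def encode_1d(smiles):
    """Toy encoder: a molecule's embedding is the midpoint of its interval."""
    return int(smiles.split("_")[1]) + 0.5


def latent_score(x):
    """Stand-in for a latent-space QSAR model that varies inside one molecule's region."""
    return x % 1.0


before, after = 3.2, 3.8  # particle position before and after an update

# Same decoded molecule, yet the latent score rises from about 0.2 to about 0.8.
same_molecule = decode_1d(before) == decode_1d(after)
score_gain_raw = latent_score(after) - latent_score(before)

# After the round trip both positions sit on the same point, so the gain vanishes.
snapped_before = encode_1d(decode_1d(before))
snapped_after = encode_1d(decode_1d(after))
score_gain_snapped = latent_score(snapped_after) - latent_score(snapped_before)
```

In this sketch the raw positions show an apparent improvement of roughly 0.6 with no change of molecule, while the snapped positions show none.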
Hmm, by "corresponding to a new molecule" you mean a molecule different to the previous one or simply any valid molecule?
a molecule different to the previous one
Unfortunately, this can actually happen if a molecule corresponds to a larger region in the CDDD space and the scoring function (e.g. a QSAR model that takes points in this space as input) is not well defined in this region...
Ah, now I understand: you have trained your QSAR model on CDDD points, and that model is very sensitive to the location of the particles, meaning that even if two locations correspond to the same underlying molecule, the QSAR model will give different scores! So, to my understanding, if a QSAR model (trained on CDDD points) is not used in the cost function, this step can be omitted, right?
Yeah... that should be true. Which is nice, because this step is computationally expensive... A flag for turning this on/off would be nice then...
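One way such a flag might look, as a rough sketch only. The `update_swarm` function and its `snap_to_molecule` parameter are hypothetical and do not exist in mso; the encoder and decoder are passed in as plain callables so the example runs with toy stand-ins.

```python
def update_swarm(positions, velocities, decode, encode, snap_to_molecule=True):
    """Move particles, decode them, and optionally re-encode (hypothetical flag).

    With snap_to_molecule=False the re-encoding call is skipped and the
    particles keep their raw updated positions.
    """
    moved = [x + v for x, v in zip(positions, velocities)]
    smiles = [decode(x) for x in moved]
    if snap_to_molecule:
        moved = [encode(s) for s in smiles]
    return moved, smiles


# Toy stand-ins (assumptions): one molecule per unit interval, embedded at its midpoint.
def toy_decode(x):
    return f"mol_{int(x // 1)}"


def toy_encode(smiles):
    return int(smiles.split("_")[1]) + 0.5


snapped, smiles_on = update_swarm([3.4, 7.1], [0.3, 1.0], toy_decode, toy_encode)
raw, smiles_off = update_swarm([3.4, 7.1], [0.3, 1.0], toy_decode, toy_encode,
                               snap_to_molecule=False)
```

Both calls decode to the same SMILES; only the stored positions differ, which is why the flag would be safe to switch off when the scoring function looks at SMILES rather than latent points.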
Is there a reason why this cyclic conversion takes place: swarm.x -> swarm.smiles -> swarm.x, and not update swarm.x directly? In other words, is line 70 really necessary?
https://github.com/jrwnter/mso/blob/992b46dcb4f7f4ae9027489a8ee46cd0a928c4fc/mso/optimizer.py#L68-L71