Closed — bhomass closed this issue 5 months ago
Hi @bhomass,
What you are saying would simply reinforce the entropy effect. What we wanted to express through the distance is that the perturbation has a sufficiently "unique" embedding in space. This, combined with the entropy, can be a signal of good embedding quality. On the other hand, if the embedding is unique but the entropy is high, the model would still be unsure. We formulated it this way because the most likely failure case is that the model predicts "no perturbation", which results in compounds from different pathways being embedded close together.
There are definitely options to formulate this differently :) this is just how we did it
I agree with the entropy component, which captures whether the nearest neighbors belong to a different pathway.

I disagree with the distance component. A low distance should increase uncertainty if the neighbors are from a different pathway, but it should lower uncertainty if they are from the same pathway. Yet the inverse log distance term increases the overall uncertainty score regardless of the nature of the nearest neighbors.
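To make the point of disagreement concrete, here is a minimal sketch of the kind of score being discussed: an entropy term over nearest-neighbor pathway labels plus an inverse-log-distance term. All names (`uncertainty_score`, `neighbor_pathways`, `neighbor_distances`) are my own illustrative choices, not the repository's actual API, and the exact form of the distance term is an assumption.

```python
import numpy as np
from scipy.stats import entropy

def uncertainty_score(neighbor_pathways, neighbor_distances, eps=1e-8):
    """Illustrative sketch (not the repo's implementation) of a score
    combining neighbor-label entropy with an inverse log distance term."""
    # Entropy component: high when the nearest neighbors come from
    # a mix of different pathways.
    _, counts = np.unique(neighbor_pathways, return_counts=True)
    probs = counts / counts.sum()
    ent = entropy(probs)
    # Distance component: grows as neighbors get closer, independent of
    # whether they share a pathway -- the behavior questioned above.
    dist_term = 1.0 / np.log(1.0 + np.mean(neighbor_distances) + eps)
    return ent + dist_term
```

Note how `dist_term` adds to the score for close neighbors even when every neighbor shares the query's pathway, which is exactly the case where one might instead want the uncertainty to drop.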