molecularmodelinglab / simsearchserver


Handle identical embeddings #8

Open joshhochuli opened 1 year ago

joshhochuli commented 1 year ago

The current approach is to inject a tiny amount of noise; I'm not sure this is ideal. If we ever do exact checking of embeddings, this will cause errors.
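One possible alternative to noise injection is to collapse bit-identical embeddings into a single point before building the tree, and keep the list of molecule IDs that share that point. A nearest-neighbor hit then returns every ID in the group, and because the stored vectors are never perturbed, exact comparison of embeddings keeps working. This is only a sketch under that assumption: `dedupe_embeddings` is a hypothetical helper, not part of simsearchserver, and it uses plain tuples rather than whatever array type the server actually stores.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dedupe_embeddings(
    ids: Sequence[str], embeddings: Sequence[Sequence[float]]
) -> Tuple[List[Vector], List[List[str]]]:
    """Hypothetical helper: merge exactly identical embeddings into one point.

    Returns the unique points (first-seen order) and, for each point,
    the list of IDs whose embedding is exactly equal to it. No noise is
    added, so the original vectors are preserved bit for bit.
    """
    groups: Dict[Vector, List[str]] = {}
    for mol_id, emb in zip(ids, embeddings):
        key = tuple(float(x) for x in emb)  # hashable, exact-equality key
        groups.setdefault(key, []).append(mol_id)
    unique_points = list(groups.keys())  # dicts preserve insertion order
    id_groups = [groups[p] for p in unique_points]
    return unique_points, id_groups


if __name__ == "__main__":
    points, id_groups = dedupe_embeddings(
        ["a", "b", "c"], [[0.1, 0.2], [0.1, 0.2], [0.3, 0.4]]
    )
    print(points)     # [(0.1, 0.2), (0.3, 0.4)]
    print(id_groups)  # [['a', 'b'], ['c']]
```

Only `points` would go into the KD tree; `id_groups[i]` maps a hit on point `i` back to all molecules sharing that embedding.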

jimmyjbling commented 1 year ago

Do we know the side effects of identical embeddings? Is it that the KD tree cannot handle them, or is it just an issue we noticed in the data?

A little bit of elbow grease and Longleaf would allow me to make a stereochemically flat Enamine dataset that would, in theory, remove any chance of identical embeddings. That might be easier than trying to fix the code.