Open joshhochuli opened 1 year ago
Do we know the side effects of identical embeddings? Is it that the KD tree cannot handle them, or is it just an issue we noticed in the data?
A little bit of elbow grease and Longleaf would let me build a stereochemically flat Enamine database set, which in theory would remove any chance of identical embeddings. That might be easier than trying to fix the code.
The current approach is to inject a tiny amount of noise; I'm not sure this is ideal. If we ever do exact checking of embeddings, this will cause errors.
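For reference, here is a rough sketch of what a more targeted version of the noise injection could look like. This is not the project's actual code: the function name `jitter_duplicates`, the `scale` and `seed` parameters, and the assumption that embeddings are a 2-D NumPy float array are all made up for illustration. It only perturbs rows that repeat an earlier row and returns a mask of which rows were touched, so an exact check could still be run against the untouched originals.

```python
import numpy as np


def jitter_duplicates(embeddings, scale=1e-6, seed=0):
    """Add small Gaussian noise only to rows that duplicate an earlier row.

    Hypothetical helper: returns (jittered_copy, is_repeat_mask). The first
    occurrence of each distinct row is left exactly as it was.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    rng = np.random.default_rng(seed)  # fixed seed keeps runs reproducible

    # first_idx[k] is the index of the first occurrence of unique row k;
    # inverse[i] maps row i to its unique row.
    _, first_idx, inverse = np.unique(
        emb, axis=0, return_index=True, return_inverse=True
    )
    inverse = inverse.reshape(-1)

    # A row is a repeat if it is not the first occurrence of its value.
    is_repeat = first_idx[inverse] != np.arange(len(emb))

    out = emb.copy()
    noise = rng.normal(0.0, scale, size=(int(is_repeat.sum()), emb.shape[1]))
    out[is_repeat] += noise
    return out, is_repeat


emb = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
jittered, mask = jitter_duplicates(emb)
print(mask)
```

Keeping the mask (or the original array) around is the point of the design: any later exact comparison can use the unperturbed values, and the jittered copy is only handed to the tree.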