pschmidtke / blog

Binding site comparison - current benchmark issues | Spinning coral #6

Closed utterances-bot closed 2 years ago

utterances-bot commented 2 years ago

A (long) comment on current pitfalls on binding site comparison papers & their benchmarking

https://pschmidtke.github.io/blog/binding%20site/pocket/cavity/pocket%20comparison/structure-based%20drug%20design/2022/11/07/binding-site-comparison-benchmark.html

derekdebepersonal commented 2 years ago

Hey Peter,

Very nice, thoughtful write-up. What we found at Eidogen from our algorithm work in the early 2000s, which we sadly never published or made publicly available, was that all the sites sharing any structural similarity were very easy to assign and ranked far above those where some sort of convergent evolution was driving two independent folds to bind similar molecules. We found many such examples, but the p-values for those matches were typically far less convincing.

Looking through that lens, I would add to your points that methods should produce some sort of statistically meaningful confidence score (p-value), even if it's just a matter of training against a background distribution of matches of similar size. One of the lingering challenges in this area is that the best approaches are likely to take into account both sequence and family sequence composition as well as shape characteristics, and not a lot of groups have all that algorithmic machinery up and working (your shop is one that does).

I do think the objective we were pursuing at Eidogen - good structures, accurate binding site determinations (which is easier), along with accurate alignments and confidence scores for all those binding sites - is a valuable goal. The broader field seems slow to adopt it as a more valuable grand challenge than just the structures themselves. I do think the deep learning guys will end up winning in this space as well, just as they have for the structures. Hopefully they are working on these next challenges. Then teams like DiscEngine, with great web-based platforms to store and visualize all this information, will be in a great position to be the engine that delivers a lot of structural informatics value to the world.

Keep up the great efforts!
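To make the "background distribution of matches of similar size" idea concrete, here is a minimal sketch of an empirical p-value for a binding site match. It is not Eidogen's method (which was never published) nor any tool's actual API: the function names, the `(site_size, score)` background format, and the size tolerance are all assumptions made for illustration, and the scores are made-up numbers.

```python
def background_for_size(background, n_residues, tolerance=2):
    """Keep background scores from site pairs of similar size.

    `background` is a hypothetical list of (site_size, score) tuples
    obtained by comparing unrelated binding sites.
    """
    return [score for size, score in background
            if abs(size - n_residues) <= tolerance]


def empirical_p_value(score, background_scores):
    """Fraction of background scores at least as high as `score`.

    Uses the add-one correction so the estimate is never exactly zero.
    """
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    n_as_extreme = sum(1 for s in background_scores if s >= score)
    return (n_as_extreme + 1) / (len(background_scores) + 1)


# Toy usage with made-up similarity scores (higher = more similar).
background = [(10, 0.1), (10, 0.2), (11, 0.3), (12, 0.9), (30, 0.95)]
similar = background_for_size(background, n_residues=10)
p = empirical_p_value(0.5, similar)
```

With a background this small the p-value is very coarse; in practice the smallest value it can report is 1 / (N + 1) for N background scores, so a large set of unrelated site pairs per size bin is needed before low-scoring matches can be separated from noise.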

pschmidtke commented 2 years ago

Hey Derek,

a lot of valid points, and I understand the need for p-values if you compare against a statistical bulk. I hope I'll be able to publish everything, but for now we have implemented advanced filters that identify fairly strict matches. Either the matches are there or they are not, but we also have a lot of parameters we can play with to go down to a certain noise level. Overall, though, for low-scoring (and thus low-confidence) matches, a confidence score should be a must-have nowadays. Let's see where the ML field goes, but the thing I'm getting upset with is the state of the literature and how things get evaluated & validated. The actual data used to validate methods, and probably also to learn from, is likely not as clean as it should be. But in the long run I guess you are right. Let's hope that before then we'll have the chance to understand a few things.

Same as you had at Eidogen... when sieving through fuzzy low-similarity hits you see a lot of very interesting things that have been neither described nor carefully analyzed anywhere - might need a parallel academic career though to go through that ;)