jjerphan commented 6 years ago

As we have to submit a list of ten binding ligands for each protein, we need to find a way to match them. Several strategies can be used; this issue tracks the design of such strategies.

The first approach would be to return, for each protein, the 10 ligands with the highest probability. However, we know that there is an extra constraint: there is a one-to-one correspondence between proteins and ligands. Hence, we should make decisions over all ligands globally rather than per protein, because otherwise we could choose the same ligand for several different proteins with high confidence.

If we are given n_p proteins and n_l ligands to test:

The first, simpler approach would consist of evaluating the n_l ligands for each protein and taking the 10 best ones.
The second approach would consist of evaluating the n_p × n_l systems and then, for each ligand that is chosen several times, keeping only the associated protein of highest confidence.
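
A minimal sketch of the two approaches, assuming the model's outputs are gathered in a NumPy array `scores` of shape `(n_p, n_l)` where `scores[p, l]` is the predicted binding probability. The function names and the `scores` layout are hypothetical and only serve as illustration; they are not taken from the project code.

```python
import numpy as np


def top_k_per_protein(scores, k=10):
    """Approach 1: for each protein (row), return the indices of the
    k ligands with the highest score, best first."""
    order = np.argsort(-scores, axis=1, kind="stable")
    return order[:, :k]


def resolve_conflicts(scores, k=10):
    """Approach 2: start from the per-protein top-k lists, then assign
    each ligand chosen by several proteins only to the protein that
    scores it highest. Returns a dict: protein index -> list of ligands."""
    n_p, n_l = scores.shape
    top = top_k_per_protein(scores, k)

    # chosen[p, l] is True when ligand l is in the top-k list of protein p.
    chosen = np.zeros((n_p, n_l), dtype=bool)
    rows = np.repeat(np.arange(n_p), top.shape[1])
    chosen[rows, top.ravel()] = True

    # Among the proteins that chose a ligand, find the most confident one.
    masked = np.where(chosen, scores, -np.inf)
    best_protein = masked.argmax(axis=0)

    result = {p: [] for p in range(n_p)}
    for ligand in range(n_l):
        if chosen[:, ligand].any():
            result[int(best_protein[ligand])].append(ligand)
    return result
```

Note that after resolving conflicts a protein may be left with fewer than `k` ligands; how to refill those lists is left open in this sketch.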

jjerphan commented 6 years ago

#21 included a matching between proteins and ligands for one model.

We need to evaluate the final metric of the matching, that is, the fraction of proteins that receive a good prediction.

A prediction for a protein is a set of 10 ligands that are predicted to have a high chance of binding with it. A good prediction is a prediction that contains the actual binding ligand.
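
The metric described above could be computed along these lines. This is a sketch under assumptions: `predictions` maps each protein to its predicted ligands and `true_ligand` maps each protein to its actual binding ligand; both names and the dict layout are hypothetical.

```python
def matching_accuracy(predictions, true_ligand):
    """Fraction of proteins whose predicted ligand set contains
    the actual binding ligand."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(
        1
        for protein, ligands in predictions.items()
        if true_ligand[protein] in ligands
    )
    return hits / len(predictions)
```

For example, with `predictions = {0: [0, 1], 1: [2]}` and `true_ligand = {0: 1, 1: 1}`, only protein 0 has a good prediction, so the metric is 0.5.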

jjerphan commented 6 years ago

Some more advanced matching strategies are possible, but we may not have time to develop them. Closing this for now.

jjerphan / CS5242Project

Matching proteins and ligands together #20
