Why optimal transport matrix is not used?

octavian-ganea / equidock_public

EquiDock: geometric deep learning for fast rigid 3D protein-protein docking

MIT License


Why optimal transport matrix is not used? #13

Open ratthachat opened 2 years ago

ratthachat commented 2 years ago

Hi, thanks for the great work!! I have a question regarding the following point in the paper:

On p.7 it is stated that:

we unfortunately do not know the actual alignment between points in $Y_l$ and $P_l$, for every $l \in \{1, 2\}$. This can be recovered using an additional optimal transport loss

However, in the code here: https://github.com/octavian-ganea/equidock_public/blob/main/src/train.py#L128 the optimal transport matrix (the second returned value) is ignored:

ot_dist, _ = compute_ot_emd(cost_mat_ligand + cost_mat_receptor, args['device'])

In my understanding, the matrix should be used to recover the alignment. So I am confused about how the point alignment can be recovered without this optimal transport matrix.

Thank you so much again!

ratthachat commented 2 years ago

@HannesStark I know you are not the author of this paper, but our team is planning to read your EquiBind work, so we need to understand EquiDock first.

Considering the sad news about Octavian, we are not sure who can answer this question. Could you please help us clarify this point?

HannesStark commented 2 years ago

Hi! The OT matrix is used in computing the returned ot_dist. The model then receives ot_dist as an additional loss and is "encouraged" to decrease it by producing keypoints that closely match the pocket points.
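To make the reply above concrete, here is a minimal sketch of how a transport plan can enter a scalar OT distance. It is not the repository's `compute_ot_emd`: the function name `ot_distance_and_plan`, the uniform marginals, the squared-distance cost, and the use of SciPy's `linprog` as the solver are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def ot_distance_and_plan(cost):
    """Solve the discrete OT linear program with uniform marginals (assumed).

    Returns (ot_dist, plan), where ot_dist = sum(plan * cost).
    Hypothetical helper, not the function from the EquiDock code base.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # mass carried by each keypoint
    b = np.full(m, 1.0 / m)  # mass carried by each pocket point
    # The plan is flattened row-major: entry (i, j) sits at index i * m + j.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))  # each row of the plan sums to a[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(m))  # each column sums to b[j]
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    plan = result.x.reshape(n, m)
    # The plan is consumed here: the distance is the plan-weighted total cost.
    return float(np.sum(plan * cost)), plan


# Toy data: 2 keypoints against 3 pocket points (different cardinality).
keypoints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pocket_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
cost = ((keypoints[:, None, :] - pocket_points[None, :, :]) ** 2).sum(axis=-1)

ot_dist, plan = ot_distance_and_plan(cost)
print(ot_dist)
print(plan)
```

Because the set sizes differ, the plan necessarily splits the mass of at least one keypoint across several pocket points, so it is a soft matching rather than a permutation. In an autodiff setting one would typically treat the plan as a constant and let gradients flow through the cost matrix only; that is an assumption about the training setup, not something shown in this thread.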

ratthachat commented 2 years ago

Hi Hannes, thanks so much for your kind & quick response! Please allow me to discuss this point further.

As far as our team understands, when ot_dist is used only as a loss, the keypoints are encouraged to match the pocket points as a set, but there is no one-to-one correspondence between the two sets (unless ot_matrix is used explicitly to specify one).

Since the Kabsch algorithm seems to assume perfectly aligned points (i.e. a one-to-one correspondence between the two sets), it is still not quite clear to me whether ot_loss alone is enough for Kabsch.

HannesStark commented 2 years ago

Hey!

I am not sure how a one-to-one correspondence between keypoints and pocket points could be possible, considering that the two sets have different cardinality. This is the reason why the OT formulation is used.
The ot loss indeed encourages keypoints to be similar to pocket points.
The Kabsch algorithm is used to align protein keypoints with ligand keypoints. These are of the same cardinality, but their one-to-one correspondence is not enforced. There is only a soft correspondence between the two keypoint sets, which comes from the ot_loss pulling both of them toward the same set of pocket points.
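As a small illustration of the pairing question discussed in this thread, the sketch below is a plain NumPy Kabsch alignment, not the EquiDock implementation. The helper names `kabsch` and `rmsd` and the toy data are hypothetical. It shows that Kabsch pairs points purely by row index: with consistently ordered points the rigid motion is recovered, while shuffling one set leaves a nonzero residual.

```python
import numpy as np


def kabsch(A, B):
    """Rigid (R, t) minimizing sum_i ||R @ A[i] + t - B[i]||^2.

    Row i of A is implicitly paired with row i of B.
    """
    a_mean, b_mean = A.mean(axis=0), B.mean(axis=0)
    H = (A - a_mean).T @ (B - b_mean)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = b_mean - R @ a_mean
    return R, t


def rmsd(A, B, R, t):
    """Root-mean-square deviation after applying (R, t) to A."""
    return float(np.sqrt(np.mean(np.sum((A @ R.T + t - B) ** 2, axis=1))))


rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                  # toy "keypoints" of one protein
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
B = A @ R_true.T + t_true                    # same points, rigidly moved

# Consistent ordering: the true rigid motion is recovered.
R, t = kabsch(A, B)
rmsd_paired = rmsd(A, B, R, t)

# Same point set, but rows shuffled: the index pairing is now wrong.
B_shuffled = np.roll(B, 1, axis=0)
R_s, t_s = kabsch(A, B_shuffled)
rmsd_shuffled = rmsd(A, B_shuffled, R_s, t_s)

print(rmsd_paired, rmsd_shuffled)
```

Under this reading, the quality of the Kabsch fit depends on the i-th keypoint of each set ending up near the same region, which the soft correspondence described above can encourage but does not guarantee.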