VN-EGNN generates different results for the same input?

JavierSanchez-Utges commented 5 months ago

I ran VN-EGNN multiple times on the same input protein structure and noticed that the number of predicted pockets differs between runs. Is this expected? Since the model is already trained, I would expect it to give the same predictions when run on the same input.

Here, I attach the files for reference.

Thanks! 2l4k_A.clean.txt (Change extension to .pdb) 2l4k_A_prediction (1).csv 2l4k_A_prediction (2).csv 2l4k_A_prediction.csv

fses91 commented 5 months ago

Hi JavierSanchez, yes, the results of multiple runs can differ slightly because the positions of the virtual nodes are initialized randomly. If you fix the seed for consecutive runs, the results should be the same.

We analyzed this behavior in Appendix G of our paper. Best regards

JavierSanchez-Utges commented 5 months ago

Hi, Florian,

Thanks for the quick response. I missed that. Could you point me to where exactly in the code I need to set the seed for the results to be reproducible? Is it somewhere in predict.py?

On another note, VN-EGNN is a pocket-centric method, since what it predicts explicitly is a pocket centroid, right? However, in Appendix B you mention that an implicit definition of the pocket can be obtained by assigning binary [0, 1] labels to protein atoms according to a given distance threshold from the predicted pocket centroid. Which threshold did you use for this, if any? Perhaps 5 Å? Is there a function in the repository that already does this?

Finally, as the method does not predict pocket residues, there is no per-residue ligandability score like the one predicted by other methods, e.g., P2Rank, IF-SitePred, GrASP. Is this correct?

Many thanks for your help,

Javier


fses91 commented 5 months ago

Hi Javier,

The code where the starting positions are sampled is in the dataloader: https://github.com/ml-jku/vnegnn/blob/master/src/datasets/binding_dataset.py#L104 and https://github.com/ml-jku/vnegnn/blob/master/src/utils/graph.py#L368. So you should be fine if you seed PyTorch before you sample.
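As a rough illustration of seeding PyTorch before sampling, here is a minimal sketch. The function `sample_virtual_nodes` is a hypothetical stand-in for the random initialization of virtual node positions, not the repository's actual code; only `torch.manual_seed`, `torch.randn`, and `torch.equal` are real PyTorch calls.

```python
from typing import Optional

import torch


def sample_virtual_nodes(num_nodes: int, seed: Optional[int] = None) -> torch.Tensor:
    """Hypothetical stand-in for the random initialization of virtual node positions."""
    if seed is not None:
        # Reset PyTorch's global random number generator before sampling.
        torch.manual_seed(seed)
    # Draw one random 3D starting position per virtual node.
    return torch.randn(num_nodes, 3)


a = sample_virtual_nodes(8, seed=0)
b = sample_virtual_nodes(8, seed=0)  # same seed, so the same starting positions
c = sample_virtual_nodes(8)          # no reseed, so the generator state has advanced

print(torch.equal(a, b))
print(torch.equal(a, c))
```

In practice the seed would presumably be set once at the start of the prediction script, before the dataset or dataloader is created. On a GPU, some operations may remain non-deterministic even with a fixed seed.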

Exactly, VN-EGNN is a pocket-centric method: it predicts only the pocket center. The implicit definition you are referring to is how the ground-truth data is created. As is common in the literature, you take every atom surrounding the ligand within a certain distance threshold, label these atoms as part of the binding site, and define the pocket center as the geometric center of these positively labeled atoms.

We only predict a ligandability score (confidence score) for the pocket itself, not for individual residues.
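The ground-truth construction described above can be sketched as follows. This is an illustration only, not the repository's implementation: the function name `pocket_center` and the toy coordinates are hypothetical, and the 4.0 Å threshold in the usage lines is an arbitrary example value, since the thread does not state which threshold was used.

```python
import math


def pocket_center(protein_atoms, ligand_atoms, threshold):
    """Label protein atoms near the ligand and return (labels, geometric center).

    Hypothetical helper: an atom gets label 1 if it lies within `threshold`
    of any ligand atom, otherwise 0. The center is the mean position of the
    positively labeled atoms, or None if no atom qualifies.
    """
    labels = [
        int(any(math.dist(p, l) <= threshold for l in ligand_atoms))
        for p in protein_atoms
    ]
    selected = [p for p, y in zip(protein_atoms, labels) if y == 1]
    if not selected:
        return labels, None
    # Average each coordinate axis over the selected atoms.
    center = tuple(sum(axis) / len(selected) for axis in zip(*selected))
    return labels, center


# Toy coordinates in angstroms (made up for illustration).
protein = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(1.0, 1.0, 0.0)]

labels, center = pocket_center(protein, ligand, threshold=4.0)
print(labels)  # [1, 1, 0]
print(center)  # (1.0, 0.0, 0.0)
```

The first two protein atoms lie about 1.41 Å from the ligand atom and are labeled 1, while the third is about 9.06 Å away and is labeled 0, so the center is the mean of the first two positions.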

Best regards, Florian

JavierSanchez-Utges commented 5 months ago

Hi, Florian,

Thanks for the response. That answers all my questions.

Best wishes,

Javier


ml-jku / vnegnn

VN-EGNN generates different results for the same input? #2