schrojunzhang / KarmaDock

https://www.nature.com/articles/s43588-023-00511-5
Apache License 2.0

Question about model weights #1

Closed. HBioquant closed this issue 1 year ago.

HBioquant commented 1 year ago

Hello, I'm very interested in your docking program; thank you for developing KarmaDock. I read the paper you published yesterday. I'm curious why you didn't also train the model under the time split provided by EquiBind, which would allow a fair comparison. I understand that the time split isn't very meaningful, but claiming in the paper that the comparison is unfair and then using the TankBind split seems a bit of a stretch. Is it that you couldn't beat them, so the numbers weren't included? I have implemented a similar idea before, using DeepDock-style scoring plus conformation optimization (with the optimization swapped for approaches like LigPose and EquiBind), and the RMSD of the docked poses on the EquiBind test set was not very good (though of course that also depends on the model one builds). Could you please share the model weights trained under the time split, if available? I look forward to evaluating your model's docking and screening ability more systematically. Thank you for your work, it's great!
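For context, a minimal sketch of the pose-accuracy metric referenced here: symmetry-corrected ligand RMSD in the protein frame, with "success" usually defined as RMSD ≤ 2 Å. The file names are placeholders and this is not code from either party; it only illustrates the metric.

```python
# Hypothetical illustration of docking-pose RMSD; file paths are placeholders.
from rdkit import Chem
from rdkit.Chem import rdMolAlign

pred = Chem.MolFromMolFile("predicted_pose.sdf")   # docked pose (placeholder)
ref = Chem.MolFromMolFile("crystal_ligand.sdf")    # crystal pose (placeholder)

# CalcRMS matches symmetry-equivalent atoms but does NOT re-align the probe,
# which is what docking evaluation needs: both poses share the protein frame.
rmsd = rdMolAlign.CalcRMS(pred, ref)
print(f"RMSD = {rmsd:.2f} A; success (<= 2 A): {rmsd <= 2.0}")
```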

schrojunzhang commented 1 year ago

Hi HBioquant,

Thank you for your appreciation and interest in our work. We are pleased to respond to the points you raised.

  1. Regarding your point about a fair comparison, we did report a fair comparison in the paper, and we are more than happy to highlight it again [screenshot attached in the original comment]. Our models were trained under three different data splits; in particular, the fair comparison was conducted under the same data-split protocol as TankBind. Additionally, we even provided the actual protein pockets to TankBind to simulate the situation where the pockets are known (in the original paper, TankBind was run on pockets predicted by P2Rank).

  2. Concerning your mention of implementing a similar approach with less satisfactory results, we agree with your assumption that it may have something to do with the architecture design. However, we have no information about your code; please share the specific code with us if possible. Additionally, we must point out that, apart from a rational architecture, a more crucial factor behind KarmaDock's superior performance is the inductive bias of the optimal inter-node distance distribution introduced by the MDN module, which raised the success rate from 25% to 56% under the EquiBind time split (a generic sketch of such an MDN scoring head is given after this list). We emphasized this in the paper as well [screenshot attached in the original comment].

  3. We agree with your statement that using a time split for developing virtual screening tools may not be very meaningful. Hence, when we restructured the KarmaDock code, we only organized the model weights obtained under the PDBBind division and did not prepare the weights trained under the time-split division. If there is substantial demand for them, we are open to organizing and providing them.
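As a reader's aid, here is a minimal, generic sketch of the kind of MDN scoring head mentioned in point 2 (DeepDock-style): each protein-ligand atom-pair embedding is mapped to a Gaussian mixture over the pair distance, and the summed log-likelihood of the observed distances serves as the score. Layer sizes and names are illustrative assumptions, not the KarmaDock implementation.

```python
# Generic mixture density network (MDN) scoring head; dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    def __init__(self, pair_dim: int = 128, n_gaussians: int = 10):
        super().__init__()
        self.pi = nn.Linear(pair_dim, n_gaussians)     # mixture weights
        self.mu = nn.Linear(pair_dim, n_gaussians)     # component means (A)
        self.sigma = nn.Linear(pair_dim, n_gaussians)  # component std devs (A)

    def forward(self, pair_emb: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        # pair_emb: (n_pairs, pair_dim); dist: (n_pairs,) observed pair distances
        pi = F.softmax(self.pi(pair_emb), dim=-1)
        mu = F.elu(self.mu(pair_emb)) + 1.0            # keep means positive
        sigma = F.elu(self.sigma(pair_emb)) + 1.1      # keep std devs positive
        normal = torch.distributions.Normal(mu, sigma)
        log_prob = normal.log_prob(dist.unsqueeze(-1)) # (n_pairs, n_gaussians)
        # log-likelihood of each observed distance under the predicted mixture
        return torch.logsumexp(torch.log(pi + 1e-10) + log_prob, dim=-1)

# Training minimizes the negative log-likelihood over pairs within a distance cutoff;
# at inference, the summed log-likelihood can serve as a pose/binding score.
```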

Let us know if you have any further questions about the original reports in KarmaDock!

Best regards, Xujun

Ahafigure commented 1 year ago

Hi Xujun,

I appreciate your excellent work. I have a few questions:

  1. I couldn't find an analysis of dataset redundancy in the article, such as protein sequence similarity or small molecule similarity. Does this significantly affect the results?
  2. Can I interpret this method more as an optimization approach? Is it similar to providing an initial position (the center of the pocket) and then rigidly optimizing the structure of the small molecule generated by RDKit?
  3. Another question is that in practical situations, we cannot obtain pockets and crystal structures with such precision for molecular docking tasks. Have you considered the impact of these factors on the results? For example, most binding site predictions are done at the amino acid level. Did you use the nearest amino acids to the molecule as the center of the binding pocket in the test set, or did you use the SMILES representation of molecules as input for docking in the test set? Also, is the 12 angstrom pocket size too small? After all, molecules can explore various conformational spaces during docking.

I'm not a reviewer; I'm simply very curious about how these issues might affect the model because I haven't come across related assessments so far. Nevertheless, these are indeed real-world challenges.

Thank you.

schrojunzhang commented 1 year ago

Hi Ahafigure,

Thank you for your insightful questions. Let me address them one by one:

  1. We reported the protein similarity between the training and test sets in Supplementary Table 3 of the Supplementary Information (an image is linked for reference in the original comment). The protein similarity under MLSF_Split is the highest. Several studies have pointed out that protein-family similarity can lead to overly optimistic evaluations of model performance. However, for virtual screening purposes, the more protein families the model is exposed to, the better it generalizes across different proteins. Hence, we kept MLSF_Split in the subsequent experiments.

  2. Your interpretation can be seen from several angles. Traditional molecular docking algorithms also search for the protein-ligand binding conformation starting from an initial ligand conformation; they rely on search algorithms and scoring functions rather than deep learning. If you regard that process as optimization, then you could view KarmaDock in a similar light. However, even though KarmaDock starts from a conformation generated by RDKit, the model still updates the coordinates atom-wise rather than rigidly; RDKit merely provides a favorable initial conformational distribution.

  3. If the binding site is unknown, one may need computational or experimental methods to obtain the pocket coordinates. Once the pocket's location has been identified, traditional docking methods such as AutoDock Vina could also be used to dock a molecule and locate the binding pocket. As for the missing crystal structures you mentioned, I'm not sure whether you are referring to the molecules or the proteins. If it's the proteins, tools such as AlphaFold 2, RoseTTAFold, or ESMFold can be used to predict protein structures; we have also considered the possibility that predicted structures are in the apo form and accordingly tested KarmaDock's performance on APObind. If it's the molecules, RDKit can handle this. Regarding binding-site perturbation, we haven't explicitly tested for that; in our virtual screening for LTK we did not have the exact binding site, and we did not use the SMILES representation of molecules in that process. A 12 Å pocket is not small: the pocket consists of the amino acids within 12 Å of the ligand atoms, whereas protein-ligand scoring typically uses 8 or 10 Å cutoffs, so 12 Å is fairly generous (a minimal sketch of this pocket definition follows below).
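A minimal sketch of the pocket definition described in point 3, reading "within 12 Å of the ligand atoms" as residues with at least one atom within 12 Å of any crystal-ligand atom (that reading, the file paths, and the use of Biopython are assumptions for illustration; this is not the KarmaDock code).

```python
# Hypothetical pocket cropping: keep residues near the crystal ligand.
# Assumes protein.pdb contains only the protein (ligand/waters removed).
import numpy as np
from Bio.PDB import PDBParser
from rdkit import Chem

CUTOFF = 12.0  # angstroms

protein = PDBParser(QUIET=True).get_structure("prot", "protein.pdb")
ligand = Chem.MolFromMolFile("crystal_ligand.sdf")
lig_xyz = ligand.GetConformer().GetPositions()          # (n_lig_atoms, 3)

pocket_residues = []
for res in protein.get_residues():
    res_xyz = np.array([a.get_coord() for a in res.get_atoms()])
    # minimum atom-atom distance between this residue and the ligand
    d = np.linalg.norm(res_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)
    if d.min() <= CUTOFF:
        pocket_residues.append(res)

print(f"{len(pocket_residues)} residues within {CUTOFF} A of the ligand")
```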

I hope this addresses your queries. If you have further questions, please let me know.

Best regards,

Xujun

HBioquant commented 1 year ago
  1. As far as I know, you used DiffDock's results on the EquiBind-split test set to compare against your CASF-2016 results, and you claim that KarmaDock is the SOTA method over DiffDock and other blind docking methods that do not require a pocket prior. Is that a fair and reasonable comparison? Have you tested DiffDock and E3Bind (a method similar to KarmaDock's implementation in terms of model architecture, although you additionally referred to LigPose and DeepDock's MDN) on CASF-2016?
  2. "However, for virtual screening purposes, the more protein families the model is exposed to, the better it generalizes across different proteins." An interesting excuse that surprised me; hopefully this is not the start of overfitted docking models in the field... As shown in your lab's VS assessment work, CrossDock, and other papers, high sequence similarity tends to produce optimistic performance, especially for pockets. More careful docking benchmarks should be built when comparing deep learning methods.
  3. In your code implementation, I found that you use the crystal ligand center as the pocket center and apply an N(0, 4 Å) Gaussian perturbation plus a rotation to an RDKit-generated pose to make a fake pose (a reader's sketch of this initialization follows after this list). Such training may be helpful for optimizing poses from conventional methods, but it also seems responsible for the optimistic results compared with other methods that use no crystal-ligand prior. You should use the center of the pocket (as in LigPose's implementation), which is more realistic.
  4. Based on our experience with pose generation, we doubt the reported model performance and hope you can provide training scripts that would allow us to better reproduce your model. That's all. Thanks for your excellent work!
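As a reader's paraphrase of the initialization described in point 3 above (not the repository's exact code), a minimal sketch: embed an RDKit conformer, apply a random rigid rotation, and place it at the crystal-ligand centroid perturbed by N(0, 4 Å) Gaussian noise. The function name and parameters are illustrative.

```python
# Hypothetical fake-pose construction as described in the comment above.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from scipy.spatial.transform import Rotation

def make_initial_pose(smiles: str, crystal_center: np.ndarray,
                      noise_std: float = 4.0, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)   # RDKit 3D conformer
    AllChem.MMFFOptimizeMolecule(mol)
    xyz = mol.GetConformer().GetPositions()       # (n_atoms, 3)

    xyz -= xyz.mean(axis=0)                                        # center at origin
    xyz = xyz @ Rotation.random(random_state=seed).as_matrix().T   # random rigid rotation
    # translate to the crystal-ligand centroid plus N(0, noise_std) offset
    xyz += crystal_center + rng.normal(0.0, noise_std, size=3)
    return xyz

# Example with a hypothetical ligand and pocket center
pose = make_initial_pose("CCOc1ccccc1", np.array([10.0, 20.0, 30.0]))
```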
schrojunzhang commented 1 year ago

Hi HBioquant,

Thank you for your attention to KarmaDock.

  1. We sincerely ask you to review our paper thoroughly to grasp the key points. We have discussed the docking problem comprehensively and performed the corresponding experiments as far as possible. At this point, we believe this contribution is solid enough, at least by the conventions of this field.

  2. Our experimental design strictly adheres to the standards commonly followed in the industry for algorithm evaluation, and we recommend referring to several related works to appreciate the common practices in the field. As for the issues you pointed out, it is essential to note that while certain aspects of this domain may seem "dirty", adhering to conventions is critical; deviating from these norms would risk every model being dismissed as irrelevant or ineffective.

  3. Of note, another major contribution is that KarmaDock has been tested in real-world screening scenarios and has proved its effectiveness there, which is further solid evidence of KarmaDock's performance.

  4. We would like to see your own contribution to the development of docking methods, and we look forward to seeing the problems you mentioned solved in your paper. Since there is a gap between the standards you have curated and those we have grasped, it would be better for you to bring them to reality yourself.

Once again, thank you for your attention to KarmaDock. We appreciate the community's engagement in advancing the field.

Warm regards, Xujun