Closed Wangchentong closed 1 year ago
Hi,
Thanks for the message.
Happy to answer any more questions!
Great! Actually, I have two more questions 😄
1. About pretraining with a structure prediction task: your lab has published a paper on protein structure prediction with a generative model (EigenFold), and it would be convenient to incorporate this task into FrameDiff (or to use ESM2 output rather than a single-sequence embedding). The power of structure prediction pretraining has been demonstrated by RFdiffusion. Have you considered training a structure prediction model with FrameDiff as a starting point?
2. Have you considered extending your work to small-molecule binder generation?
Besides, regarding "as we've seen that AF2 distillation in RFdiffusion help it tremendously": is that in the RFdiffusion paper? It would be nice if you could explain it a little more 😊
Btw, I'm extremely busy with end-of-semester matters. My responses will likely be delayed.
Never mind, I really appreciate your reply! A final question: may I ask for your insight on the benefits that a lightweight model like FrameDiff (without axial attention and an MSA transformer) can achieve compared to RoseTTAFold in structure prediction pretraining?
I believe a lightweight model like FrameDiff is better for methods research. It's easier to train and analyze than going through the RoseTTAFold pipeline, which is not publicly trainable. Obviously, larger, more complex models will perform the best, but the scaling principles will be uncovered first through lightweight models.
Hi,
Thank you and your team for this impressive work and for releasing it as open source. I'm really interested in understanding more about your data filtering strategy.
1. I noticed that you only trained the model on monomeric CIF files with max_len < 512. Would it be beneficial to include single chains from multimeric CIF files as well? This would significantly increase the dataset size, by 5x to 100k proteins, even though the structures of those chains may not match their monomeric state.
2. Have you considered expanding the dataset with representative clustered proteins from the high-confidence portion of AFDB? If not, what is your opinion on this approach?
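To make the two options in question 1 concrete, here is a minimal sketch of the filters I have in mind. It is not taken from the FrameDiff codebase: the `Entry` record, the function names, and the idea of a per-entry chain-length table are all hypothetical, and only the `max_len < 512` cutoff comes from the training setup mentioned above.

```python
from dataclasses import dataclass

MAX_LEN = 512  # length cutoff mentioned above; everything else here is assumed


@dataclass
class Entry:
    """Hypothetical metadata record for one CIF file."""
    pdb_id: str
    chain_lengths: dict[str, int]  # chain ID -> number of residues


def monomers_only(entries, max_len=MAX_LEN):
    """Keep only files with exactly one chain that is shorter than max_len."""
    kept = []
    for entry in entries:
        if len(entry.chain_lengths) != 1:
            continue  # skip multimeric files entirely
        (chain_id, length), = entry.chain_lengths.items()
        if length < max_len:
            kept.append((entry.pdb_id, chain_id))
    return kept


def all_single_chains(entries, max_len=MAX_LEN):
    """Keep every chain shorter than max_len, including chains cut out of multimers."""
    return [
        (entry.pdb_id, chain_id)
        for entry in entries
        for chain_id, length in entry.chain_lengths.items()
        if length < max_len
    ]
```

The second function is the relaxed variant I am asking about: it treats each chain of a multimer as its own training example, which enlarges the dataset but ignores any inter-chain contacts that shaped the chain's conformation.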
Thank you for your help and I look forward to hearing back from you soon.