jasonkyuyim / se3_diffusion

Implementation for SE(3) diffusion model with application to protein backbone generation
https://arxiv.org/abs/2302.02277
MIT License

trans_x0_threshold=1.0 ? #38

Open jiaweiguan opened 6 months ago

jiaweiguan commented 6 months ago

Hi! When running the training code, I noticed that trans_score_loss never contributes, because trans_x0_threshold is set to 1.0 and batch['t'] never exceeds it. What is the purpose of this setting? https://github.com/jasonkyuyim/se3_diffusion/blob/53359d71cfabc819ffaa571abd2cef736c871a5d/experiments/train_se3_diffusion.py#L568

```python
trans_loss = (
    trans_score_loss * (batch['t'] > self._exp_conf.trans_x0_threshold)  # threshold = 1.0
    + trans_x0_loss * (batch['t'] <= self._exp_conf.trans_x0_threshold)
)
```
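
For context, batch['t'] is a diffusion time in (0, 1], so with trans_x0_threshold = 1.0 the first mask can never be true. A minimal sketch of the masking behavior (the t below is a made-up stand-in for batch['t']):

```python
import torch

t = torch.rand(8)                # stand-in for batch['t']; the real t lies in (0, 1]
threshold = 1.0                  # trans_x0_threshold
print((t > threshold).any())     # tensor(False) -> trans_score_loss is always zeroed
print((t <= threshold).all())    # tensor(True)  -> trans_x0_loss is always active
```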
jasonkyuyim commented 6 months ago

Hi, trans_x0_loss is the loss we use, corresponding to the first equation in Section 4.2 of the paper. Note that an L2 loss on the true "denoised" positions is equivalent to the score loss, because the translation perturbation is Gaussian. I think trans_x0_threshold is an artifact of some early experiments and can be removed now.
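
To make the equivalence concrete, here is a minimal numerical sketch, assuming a generic Gaussian perturbation x_t = a_t * x_0 + s_t * eps (the schedule values a_t, s_t below are made-up placeholders, not the repo's): the score loss is just a t-dependent reweighting of the L2 loss on x_0.

```python
import numpy as np

rng = np.random.default_rng(0)
a_t, s_t = 0.8, 0.5                               # made-up schedule values at some t
x0 = rng.normal(size=(10, 3))                     # true positions
x0_pred = x0 + 0.1 * rng.normal(size=x0.shape)    # hypothetical model prediction
x_t = a_t * x0 + s_t * rng.normal(size=x0.shape)  # Gaussian forward perturbation

# Scores implied by the true and the predicted x_0.
score_true = -(x_t - a_t * x0) / s_t**2
score_pred = -(x_t - a_t * x0_pred) / s_t**2

score_loss = np.sum((score_pred - score_true) ** 2)
x0_loss = np.sum((x0_pred - x0) ** 2)
assert np.isclose(score_loss, (a_t**2 / s_t**4) * x0_loss)  # equal up to a weight
```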

jiaweiguan commented 6 months ago

During training, I observed that trans_x0_loss correlates with the length of the protein, as shown in the graph. Do you know the reason behind this? Could you please explain it? [image: trans_x0_loss curves for different protein lengths]

jasonkyuyim commented 6 months ago

Well, this should make sense: RMSD is sensitive to the length of the protein, so bigger proteins will tend to have larger errors.

jiaweiguan commented 6 months ago

Thanks! Is it necessary to remove this length effect? I am unsure at what stage to stop training the model, since the signal from the validation data is limited. Is there any approximate quantitative relationship between dataset size and the number of training steps?

jasonkyuyim commented 6 months ago

I haven't thoroughly studied any scaling behavior. My latest code release, FrameFlow, has better metrics that track designability. But other than that, you'll have to run evaluations from time to time.
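
For reference, the kind of evaluation meant here is the self-consistency ("designability") pipeline from the paper: design sequences for each sampled backbone, re-fold them, and check the best backbone RMSD. A rough outline, where design_sequences, fold, and backbone_rmsd are hypothetical placeholders rather than functions from this repo:

```python
def designability(sampled_backbones, n_seqs=8, rmsd_cutoff=2.0):
    """Fraction of samples whose best re-folded design stays within the cutoff."""
    n_designable = 0
    for backbone in sampled_backbones:
        # Inverse-fold: design n_seqs sequences for the backbone (e.g. ProteinMPNN).
        seqs = design_sequences(backbone, n_seqs)
        # Re-fold each sequence (e.g. ESMFold) and take the best backbone RMSD
        # to the sampled structure (the "scRMSD").
        best_rmsd = min(backbone_rmsd(fold(s), backbone) for s in seqs)
        if best_rmsd < rmsd_cutoff:  # 2 A is the common cutoff
            n_designable += 1
    return n_designable / len(sampled_backbones)
```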

jiaweiguan commented 5 months ago

Thank you! I have also looked into FrameFlow and its diversity, novelty, and designability metrics. However, I have noticed that there seems to be a preference for helical structures, which may be dataset-dependent. Selecting a generative model is quite challenging.

jasonkyuyim commented 5 months ago

Yes, it's very dataset-dependent. That said, helical structures are the most prevalent outputs of all current diffusion/flow models (including Chroma and RFdiffusion).

jiaweiguan commented 5 months ago

Thank you for your response. I have a hypothesis that helical structures are easier to learn, while beta strands are more challenging. Is there any research that confirms this?

jiaweiguan commented 5 months ago

This is the model I trained: the strand percentage is sometimes close to 0 and sometimes close to 0.2, which makes the training feel unstable. [image: strand percentage over training]
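
For what it's worth, a strand fraction like this can be measured from sampled PDB files with DSSP; a minimal sketch using mdtraj (an assumption on my part, not necessarily the repo's metric code):

```python
import mdtraj as md

traj = md.load("sample.pdb")                      # hypothetical sampled backbone
dssp = md.compute_dssp(traj, simplified=True)[0]  # per-residue: 'H', 'E', or 'C'
strand_fraction = (dssp == "E").mean()            # 'E' = beta strand
print(f"strand fraction: {strand_fraction:.2f}")
```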

jasonkyuyim commented 5 months ago

> I have a hypothesis that helical structures are easier to learn, while beta strands are more challenging. Is there any research that confirms this?

We only have empirical evidence from other protein diffusion models like Chroma.

jiaweiguan commented 5 months ago

Emmmm... I ran sampling tests with the best_weights.pth parameters and observed that when the sampling length L is particularly large, e.g. L=1024, the samples consistently come out helical. I'm unsure whether this is due to the dataset or to limited generalization with respect to L.

gsakellion commented 2 months ago

@jiaweiguan, I am currently trying to run the model (without success so far), mainly to see whether, from the paper_weights.pth checkpoint, which was trained on sequences of up to 512 residues, I can get structures of greater length. The accuracy of any structure is inconsequential at this stage. From my understanding it should be possible to get longer chains, but is that actually the case? It seems like you managed; were the best_weights.pth trained on the same sequence length (i.e. 512)?

jasonkyuyim commented 2 months ago

Since the model was only trained up to length 512, one would not expect it to perform well on unseen lengths such as 1024. You would have to change how the model is trained, e.g. with relative encodings or crops, to get good samples at longer lengths.
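
For illustration, a minimal sketch of one such relative encoding, clipped pairwise residue offsets in the AlphaFold2 style (an assumed example, not this repo's code): because offsets are clipped to a fixed window, the feature table is independent of sequence length.

```python
import torch
import torch.nn.functional as F

def relative_position_features(n_res: int, max_offset: int = 32) -> torch.Tensor:
    """One-hot features of clipped pairwise residue offsets."""
    idx = torch.arange(n_res)
    offsets = idx[None, :] - idx[:, None]             # signed offsets, (n_res, n_res)
    offsets = offsets.clamp(-max_offset, max_offset)  # clip to a fixed window
    return F.one_hot(offsets + max_offset, 2 * max_offset + 1).float()

feats = relative_position_features(1024)  # usable at lengths beyond training
print(feats.shape)                        # torch.Size([1024, 1024, 65])
```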