Open jiaweiguan opened 7 months ago
Hi! When running the training code, I noticed that trans_score_loss is always 0, because trans_x0_threshold is set to 1.0. What is the purpose of this setting? https://github.com/jasonkyuyim/se3_diffusion/blob/53359d71cfabc819ffaa571abd2cef736c871a5d/experiments/train_se3_diffusion.py#L568
Hi, trans_x0_loss is the loss we use, corresponding to the first equation in Section 4.2 of the paper. Note that an L2 loss on the true "denoised" positions is equivalent to the score loss due to Gaussianization. I think trans_x0_threshold is an artifact of some early experiments we were doing and can be removed now.
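For readers skimming the thread, here is a minimal sketch of the two pieces under discussion: the per-residue L2 x0 loss and the threshold switch. Names and shapes are paraphrased from the repo, not copied verbatim.

```python
import torch

def trans_x0_loss(pred_trans, gt_trans, mask):
    # Squared error on predicted "denoised" Ca translations.
    # pred_trans, gt_trans: [B, N, 3]; mask: [B, N] with 1 = real residue.
    sq_err = ((pred_trans - gt_trans) ** 2).sum(dim=-1) * mask   # [B, N]
    # Average over real residues so padding does not dilute the loss.
    return sq_err.sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)  # [B]

def total_trans_loss(t, x0_loss, score_loss, trans_x0_threshold=1.0):
    # Paraphrase of the switch linked above: with trans_x0_threshold = 1.0
    # and diffusion time t sampled in (0, 1], the x0 branch is always taken,
    # which is why the logged trans_score_loss stays at 0.
    use_x0 = (t <= trans_x0_threshold).float()                   # [B]
    return use_x0 * x0_loss + (1.0 - use_x0) * score_loss
```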
During training, I observed that trans_x0_loss correlates with the length of the protein, as shown in the graph. Do you know the reason for this? Could you please explain it?
Well, this should make sense: RMSD is sensitive to the length of the protein, so bigger proteins will tend to have larger errors.
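If you want to quantify that trend yourself, one option is to log a (length, loss) pair per training example and bin the losses by length. A sketch, assuming such logging (the repo does not emit these pairs out of the box):

```python
import numpy as np

def loss_by_length(lengths, losses, bin_size=64):
    # Mean per-example loss per length bucket, e.g. {"0-64": 1.7, ...}.
    lengths, losses = np.asarray(lengths), np.asarray(losses)
    edges = np.arange(0, lengths.max() + bin_size, bin_size)
    which = np.digitize(lengths, edges)
    return {
        f"{edges[b - 1]}-{edges[b - 1] + bin_size}": float(losses[which == b].mean())
        for b in np.unique(which)
    }
```

Comparing buckets directly makes it easy to see how strongly the loss tracks length at any point in training.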
Thanks! Is it necessary to eliminate this length effect? I am unsure at what stage to stop training the model, since the validation data provides only a limited reference. Is there any approximate quantitative relationship between dataset size and the number of training steps?
I haven't studied scaling thoroughly. My latest code release, FrameFlow, has better metrics that track designability. Other than that, you'll have to run evaluations from time to time.
Thank you! I have also looked into FrameFlow and its diversity, novelty, and designability. However, I have noticed that there seems to be a preference for helical structures, which may be dataset-dependent. Selecting a generative model is quite challenging.
Yes, it's very dataset-dependent. That said, helical structures are the most prevalent structures in all of today's diffusion/flow models (including Chroma and RFdiffusion).
Thank you for your response. I have another hypothesis: helical structures are easier to learn, while beta strands are more challenging. Is there any research that confirms this?
This is the model I trained; sometimes the strand percentage is close to 0 and sometimes close to 0.2. It gives me the feeling that it's not stable enough.
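In case it helps to track this across checkpoints, here is a sketch of measuring strand content per sample with Biotite's P-SEA-based secondary-structure annotation. The sample directory is hypothetical, and the strand-percent metric you are quoting may be computed differently (e.g. with DSSP):

```python
import glob
import numpy as np
from biotite.structure import annotate_sse
from biotite.structure.io.pdb import PDBFile

def strand_fraction(pdb_path):
    # Per-residue SSE labels from P-SEA: 'a' = helix, 'b' = strand, 'c' = coil.
    atoms = PDBFile.read(pdb_path).get_structure(model=1)
    return float(np.mean(annotate_sse(atoms) == "b"))

# Hypothetical directory of sampled backbones.
fractions = [strand_fraction(p) for p in glob.glob("samples/*.pdb")]
print(f"mean strand fraction: {np.mean(fractions):.3f}")
```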
> I have another hypothesis: helical structures are easier to learn, while beta strands are more challenging. Is there any research that confirms this?
We only have empirical evidence from other protein diffusion models like Chroma.
Hmm... I performed sampling tests using the "best_weights.pth" parameters and observed that when the sampling length L is particularly large, such as L=1024, the samples consistently exhibit helical structure. I'm unsure whether this issue stems from the dataset or from limited generalization with respect to L.
@jiaweiguan, I am currently trying to run the model (without success yet), mainly to see whether, starting from "paper_weights.pth", which were trained on sequences of up to 512 residues, I can get structures of greater length. The accuracy of any structure is inconsequential at this stage. From my understanding it should be possible to get longer chains, but is that the case? It seems like you managed; were "best_weights.pth" trained on the same maximum sequence length (i.e., 512)?
Since the model was only trained up to length 512, one would not expect it to perform well on unseen lengths such as 1024. You would have to change how the model is trained, e.g. with relative position encodings or crops, to get good samples at longer lengths.
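To make the relative-encoding point concrete, here is a sketch of an AlphaFold2-style clamped relative position feature. Because offsets are clipped at ±max_offset, a length-1024 input looks locally identical to the length-512 inputs seen during training. This is illustrative, not what this repo ships; random cropping during training is the other common fix.

```python
import torch
import torch.nn.functional as F

def relpos_features(res_idx: torch.Tensor, max_offset: int = 32):
    # res_idx: [N] integer (long) residue indices.
    offset = res_idx[:, None] - res_idx[None, :]                # [N, N]
    offset = offset.clamp(-max_offset, max_offset) + max_offset
    # One-hot over 2 * max_offset + 1 relative-position classes; distances
    # beyond +/- max_offset are indistinguishable, so no pair feature
    # grows with the total sequence length.
    return F.one_hot(offset, num_classes=2 * max_offset + 1).float()  # [N, N, 65]
```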