jasonkyuyim / multiflow

https://arxiv.org/abs/2402.04997
MIT License
89 stars · 5 forks

Question about zero designability of codesign #4

Open hzk597955516 opened 3 weeks ago

hzk597955516 commented 3 weeks ago

Hi Jason,

Great work!

I would like to ask why when I train codesign on the native pair training set, the designability is almost 0?

jasonkyuyim commented 2 weeks ago

Hi, can you give more details? Is the PMPNN designability also poor? How long are you training for and do the losses go down?

hzk597955516 commented 2 weeks ago

> Hi, can you give more details? Is the PMPNN designability also poor? How long are you training for and do the losses go down?

Hi, thanks for your reply. My initial experiment was trained on your PDB dataset with a maximum length of 128, a minimum length of 0, filtering out structures with a loop fraction greater than 0.5, a batch size of 50, and no sequence redesign. At 100k steps, codesign designability is almost 0, but PMPNN designability is about 0.86, and the loss decreases normally.

In a recent experiment, I redesigned all sequences in the dataset and set the minimum length to 60 (leaving the maximum length unchanged). After training for 100k steps, codesign designability increased to about 0.34 and PMPNN designability was about 0.95.

Given the above, does training data quality play a critical role when using flow matching for multi-modal generation tasks?
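For reference, the dataset filters described above (length bounds plus a loop-fraction cutoff) can be sketched roughly as below. This is a hypothetical illustration, not the actual multiflow filtering code; the real config keys and filter logic in the repo may differ.

```python
def keep_example(length: int, loop_fraction: float,
                 min_len: int = 60, max_len: int = 128,
                 max_loop: float = 0.5) -> bool:
    """Hypothetical filter: keep a chain only if its length falls in
    [min_len, max_len] and its loop fraction does not exceed max_loop."""
    if not (min_len <= length <= max_len):
        return False
    if loop_fraction > max_loop:
        # Drop mostly-loop structures, per the 0.5 cutoff mentioned above.
        return False
    return True

# A 100-residue chain with 30% loop content passes;
# a 40-residue chain is dropped once min_len is raised to 60.
print(keep_example(100, 0.3))  # True
print(keep_example(40, 0.3))   # False
```

Raising `min_len` from 0 to 60, as in the second experiment, removes very short chains that may be hard to design against, which is one plausible reason the metrics improved.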

jasonkyuyim commented 1 week ago

Yes, training data quality plays a crucial role for generative models (not just flow matching on multimodal tasks). One could say there's an "alignment" issue between natural proteins and the ones we call designable. We found that the PDB sequences give poor designability. Could you use our exact training data settings and see if codesign goes up? It seems you are using your own settings.