Whether SE3 needs pre-training

lucidrains / se3-transformer-pytorch

Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. This specific repository is geared towards integration with eventual Alphafold2 replication.

MIT License


Whether SE3 needs pre-training #9

Open zyk19981118 opened 3 years ago

zyk19981118 commented 3 years ago

Thank you for your work. I used your SE3 Transformer implementation as part of my model, but the current test performance is not very good. I suspect this is because I do not fully understand your model. Here are my questions:

1. Does your model need pre-training?
2. Can I train the SE3 Transformer together with the fully connected layer that comes after it? Any other advice is also welcome.

MattMcPartlon commented 3 years ago

1. I've found that pre-training helps (100 batches, linear weight scale from 1e-6 up to 1e-4). I've also found that smaller depth (2 or 3) works better than larger depth (>3).
2. I'm not sure what you mean here. Do you mean the fully connected layer that acts on type-1 features (i.e. 3D coordinates) in the attention block, or the linear projection that maps the final output from d x 3 to 1 x 3 (i.e. from the hidden dimension to the output dimension)?
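
For reference, here is a minimal sketch of one way to read the warm-up in the first point: a value ramped linearly from 1e-6 to 1e-4 over 100 batches. The comment does not say what exactly is scaled; treating it as a learning rate is an assumption, and `linear_warmup` is a hypothetical helper, not something from this repository.

```python
def linear_warmup(step, warmup_steps=100, start=1e-6, end=1e-4):
    """Linearly interpolate from `start` to `end` over `warmup_steps` batches.

    Assumption: the ramped quantity is a learning rate. `step` is the
    zero-based batch index; after the warm-up the value stays at `end`.
    """
    if warmup_steps <= 1 or step >= warmup_steps - 1:
        return end
    frac = max(step, 0) / (warmup_steps - 1)
    return start + frac * (end - start)


# Values for the first 100 batches: starts at 1e-6 and ends at 1e-4.
schedule = [linear_warmup(s) for s in range(100)]
```

With a PyTorch optimizer, one could apply this by setting `group["lr"] = linear_warmup(step)` for every `group` in `optimizer.param_groups` before each optimizer step.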

MattMcPartlon commented 3 years ago

Either way, both of these are equivariant operations, so you can train with or without them. I recommend keeping them as-is.
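
To illustrate why such a projection is equivariant, here is a small self-contained check in plain Python. It is only a sketch, not the repository's code: the names `rotate_z` and `project` and the sample numbers are made up for illustration. It mixes d type-1 channels (each a 3-vector) into a single 3-vector using one scalar weight per channel and no bias, then confirms that rotating before or after the projection gives the same result.

```python
import math


def rotate_z(v, angle):
    """Rotate a 3-vector about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def project(features, weights):
    """Mix d type-1 channels into one 3-vector: (d x 3) -> (1 x 3).

    Each channel is scaled by a scalar weight and the results are summed.
    There is no bias term; a constant vector bias would break equivariance.
    """
    return tuple(
        sum(w * f[k] for w, f in zip(weights, features)) for k in range(3)
    )


# Hypothetical example data: d = 4 channels of 3D vectors.
features = [(1.0, 0.0, 2.0), (0.5, -1.0, 0.0), (3.0, 2.0, 1.0), (-1.0, 0.25, 4.0)]
weights = [0.3, -1.2, 0.7, 2.0]
angle = 0.9

rotate_then_project = project([rotate_z(f, angle) for f in features], weights)
project_then_rotate = rotate_z(project(features, weights), angle)

is_equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(rotate_then_project, project_then_rotate)
)
```

The check uses a rotation about the z-axis only, but the same argument holds for any rotation, because the projection acts on the channel index while the rotation acts on the x, y, z components.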