ivorelectra opened this issue 1 month ago
Hi,
Thanks for your interest in our work.
I tested Flash Attention with the same training settings as normal attention, and I do observe a performance drop with Flash Attention.
I would recommend using the latest version of PyTorch, where Flash Attention is integrated into the attention layer.
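For anyone unsure what that looks like in practice, here is a minimal sketch (not from this repo, and with placeholder tensor shapes) of calling PyTorch's built-in `scaled_dot_product_attention`, which dispatches to the Flash Attention kernel when the inputs allow it (recent PyTorch, CUDA device, fp16/bf16 tensors):

```python
import torch
import torch.nn.functional as F

# Placeholder shapes: (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Optionally force the Flash Attention backend to confirm it is actually used;
# if the inputs are not eligible, this context will raise instead of silently
# falling back to the math implementation.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)

print(out.shape)  # torch.Size([2, 8, 128, 64])
```

This is only an illustration of the PyTorch API, not the exact attention module used in this repository's training code.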
Hello, and thank you for your great work!
I have a question regarding the configuration when enabling Flash Attention. Specifically, should settings like the learning rate or batch size in the config file be adjusted when Flash Attention is used?
Additionally, I have noticed that when Flash Attention is enabled, I occasionally observe a grid-like pattern during training. I am curious whether you know the reason for this phenomenon. This grid-like artifact also appears under different parameter configurations.
I appreciate your insights and look forward to your response!