hwjiang1510 / LEAP

[ICLR 2024] Code for LEAP: Liberate Sparse-view 3D Modeling from Camera Poses

Configuration Adjustment for Flash Attention and Grid-like View during Training #15

Open ivorelectra opened 1 month ago

ivorelectra commented 1 month ago

Hello, and thank you for your great work!

I have a question regarding the configuration when enabling Flash Attention. Specifically, should settings like the learning rate or batch size in the config file be adjusted when Flash Attention is used?

Additionally, I have noticed that when Flash Attention is enabled, I occasionally observe a grid-like pattern during the training phase. Do you know what might cause this? The grid-like pattern appears under different parameter configurations as well.

I appreciate your insights and look forward to your response!

hwjiang1510 commented 1 month ago

Hi,

Thanks for your interest in our work.

I tested Flash Attention with the same training settings as normal attention, and I did observe a performance drop with Flash Attention.

I would recommend using the latest version of PyTorch, where Flash Attention is integrated into the attention layer.
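
For reference, here is a minimal sketch of using Flash Attention through PyTorch's built-in `scaled_dot_product_attention` (assuming PyTorch 2.x on a CUDA GPU; the `torch.backends.cuda.sdp_kernel` context manager shown below is deprecated in newer releases in favor of `torch.nn.attention.sdpa_kernel`). This is not the LEAP training code, just an illustration of the built-in path:

```python
import torch
import torch.nn.functional as F

# Toy shapes: (batch, heads, seq_len, head_dim). Half precision on GPU is
# typically required for the fused Flash Attention kernel to be eligible.
q = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.float16)

# PyTorch dispatches to a fused Flash Attention kernel automatically when
# the inputs qualify; otherwise it falls back to another implementation.
out = F.scaled_dot_product_attention(q, k, v)

# To restrict dispatch to the Flash Attention backend only (errors if it
# is unavailable for these inputs):
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)
```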