scaomath / galerkin-transformer

[NeurIPS 2021] Galerkin Transformer: a linear attention without softmax for Partial Differential Equations
MIT License

A question about the Darcy flow numerical experiment #3

Closed cesare4444 closed 2 years ago

cesare4444 commented 2 years ago

Hi Shuhao. Great work. I am running your ex2_darcy.py, and the L2 loss is about 0.00914. I have tried many times and the result is similar, never reaching the 0.00847 reported in your paper. The fine resolution is 211 and the coarse resolution is 61. Is this normal?
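
For reference, the number being compared here is presumably the relative L2 error on the evaluation set. A minimal sketch of that metric (the helper name and exact reduction are my assumptions, not the repository's implementation):

```python
import torch

def relative_l2_error(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Batch-averaged relative L2 error: ||pred - target||_2 / ||target||_2."""
    # Flatten everything except the batch dimension before taking norms.
    diff = (pred - target).flatten(1).norm(dim=1)
    denom = target.flatten(1).norm(dim=1)
    return (diff / denom).mean()

# Example: a batch of 4 predictions on a 61x61 coarse Darcy grid.
target = torch.rand(4, 61, 61)
pred = target + 0.01 * torch.rand(4, 61, 61)
print(relative_l2_error(pred, target))  # small value on the order of 1e-2
```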

Thanks.

scaomath commented 2 years ago

😂 Thanks for the issue; this is totally possible.

The result reported in the paper was obtained circa mid-May (see, for example, the eval notebook here: https://github.com/scaomath/galerkin-transformer/blob/main/eval/ex2_darcy_eval.ipynb, which has an eval relative L2 of 0.00855), and it turned out I never got the chance to double-check the 50+ branches in my dev repo after that (I was teaching two classes back in May 2021, and the time around finals was crazy). Note that because nn.functional.interpolate is used, there is some randomness in the results (I plan to re-implement the interpolation using finite elements here: https://github.com/scaomath/torch-fem).
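
As an illustration only (not this repository's actual data pipeline), resampling a fine-resolution coefficient field with nn.functional.interpolate might look like the sketch below; on CUDA, PyTorch documents the backward pass of bilinear interpolation as non-deterministic, which may be one source of run-to-run variation when training differentiates through it.

```python
import torch
import torch.nn.functional as F

# A fine 211x211 field, downsampled to a 61x61 coarse grid.
# Illustrative sketch of interpolate-based resampling only.
fine = torch.rand(1, 1, 211, 211)            # (batch, channel, H, W)
coarse = F.interpolate(fine, size=(61, 61),
                       mode="bilinear", align_corners=True)
print(coarse.shape)  # torch.Size([1, 1, 61, 61])
```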

After June, the biggest change is a careless bug that I found and fixed in 42f17f10dd9c47885113cdaf55079e24722cb17a, and it possibly accounts for the degradation of the benchmark. The bug essentially means that the evaluation is sensitive to the eps hyperparameter used in normalization (including layer norm), because truncation-error stability in FP32 is an issue for these tasks.
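
To illustrate the eps sensitivity (an illustrative sketch of the general effect, not the fix itself): when the features being normalized have a very small variance, the eps added inside layer norm is no longer negligible relative to that variance in FP32, so the output scale shifts noticeably with the choice of eps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Features with a very small spread, so var(x) ~ 1e-6 is comparable to eps.
x = 1.0 + 1e-3 * torch.randn(8, 64)  # FP32 by default

ln_default = nn.LayerNorm(64, eps=1e-5)
ln_small = nn.LayerNorm(64, eps=1e-8)

# With var(x) ~ 1e-6, an eps of 1e-5 dominates the denominator and shrinks
# the output, while eps=1e-8 keeps it close to unit variance.
print(ln_default(x).std().item())  # noticeably below 1
print(ln_small(x).std().item())    # close to 1
```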

I will clean up the repo during the winter break and update you with the results.

cesare4444 commented 2 years ago

Thank you for your response. Your method improved results a lot on Burgers' equation. I am looking forward to your results on the Darcy flow experiment.