Hi,

Thank you very much for your brilliant work on Adan!

From your paper, Figure 1 suggests that Adan should reach a lower loss (both train and test) than AdamW. However, I am getting a higher training loss with Adan than with AdamW on ViT-H:
| Steps | AdamW_train_loss | Adan_train_loss |
| -- | -- | -- |
| 200 | 6.9077 | 6.9077 |
| 400 | 6.9074 | 6.9075 |
| 600 | 6.9068 | 6.9073 |
| 800 | 6.9061 | 6.907 |
| 1000 | 6.905 | 6.9064 |
| 1200 | 6.9036 | 6.9056 |
| 1400 | 6.9014 | 6.9044 |
| 1600 | 6.899 | 6.9028 |
| 1800 | 6.8953 | 6.9003 |
| 2000 | 6.8911 | 6.8971 |
| 2200 | 6.8848 | 6.8929 |
| 2400 | 6.8789 | 6.8893 |
| 2600 | 6.8699 | 6.8843 |
| 2800 | 6.8626 | 6.8805 |
| 3000 | 6.8528 | 6.8744 |
| 3200 | 6.8402 | 6.868 |
| 3400 | 6.8293 | 6.862 |
| 3600 | 6.8172 | 6.8547 |
| 3800 | 6.7989 | 6.8465 |
| 4000 | 6.7913 | 6.8405 |
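For reference, a quick sanity check of the gap at the final logged step (the loss values are copied verbatim from my logs; this is just arithmetic, not part of either training run):

```python
# AdamW vs. Adan train loss at step 4000, the last logged step above.
adamw_loss_4000 = 6.7913  # AdamW train loss at step 4000
adan_loss_4000 = 6.8405   # Adan train loss at step 4000

# Positive gap means Adan's training loss is higher at this point.
gap = round(adan_loss_4000 - adamw_loss_4000, 4)
print(gap)  # 0.0492
```

So at step 4000 Adan is about 0.05 higher in training loss, and the gap has been widening since roughly step 1000 rather than closing.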