Hi, thank you for your excellent work!
While reading the training code (train.py), I noticed that you use the AdamW optimizer with betas=(0.9, 0.98). What was your reason for choosing these values, and have you tried other settings?
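For reference, this is the kind of setup I mean (a minimal sketch, not your actual train.py; the model here is just a placeholder):

```python
import torch

# Placeholder module standing in for the real model.
model = torch.nn.Linear(16, 16)

# AdamW with beta2 = 0.98 instead of the PyTorch default 0.999:
# a shorter horizon for the second-moment estimate, which is a
# common choice in transformer-style training recipes.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.98))

print(optimizer.defaults["betas"])  # (0.9, 0.98)
```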
I have also been training a Mamba-based model recently, and your work has inspired me a lot. Would you mind sharing your thoughts?