Closed by tGhattas 1 month ago
Just came here to open this issue too, thanks!
Thanks for bringing that up. We'll work on releasing sample code for these stages. Stay tuned.
We updated the README and added some sample scripts for the Stage 1 and 2 implementations. Please let us know if there's anything you would like clarified!
Hey @kevinli573! Thanks for the update. Just to be clear on the example scripts: I don't see any optimizer steps, meaning the model weights aren't being updated. Was this intentional?
Yes, the Python files aren't supposed to be standalone scripts but more of a guideline for how we calculated the loss for Stages 1 and 2; hence, they are missing the optimizer, scheduler, etc.
You can find more information about the optimizer (AdamW), the scheduler (WSD), and the hyperparameters we use in our paper.
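For anyone wiring the sample loss code into a full training loop, here is a minimal sketch of what a WSD (warmup-stable-decay) learning-rate multiplier could look like. This is not the authors' implementation: the function name `wsd_lr_factor`, the linear warmup, the linear decay shape, and every step count below are assumptions for illustration only. The actual hyperparameters are in the paper.

```python
def wsd_lr_factor(step, warmup_steps, stable_steps, decay_steps, min_factor=0.0):
    """Return the multiplier applied to the peak learning rate at `step`.

    Hypothetical helper: linear warmup, constant plateau, then linear decay
    down to `min_factor`. The decay shape is an assumption, not the paper's.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak learning rate.
        return (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate.
        return 1.0
    decay_step = step - warmup_steps - stable_steps
    if decay_step >= decay_steps:
        # After the decay phase, stay at the floor.
        return min_factor
    # Decay phase: interpolate linearly from 1.0 down to min_factor.
    progress = decay_step / decay_steps
    return 1.0 - (1.0 - min_factor) * progress


if __name__ == "__main__":
    # Placeholder step counts, chosen only to make the three phases visible.
    for s in (0, 9, 50, 120, 130):
        print(s, wsd_lr_factor(s, warmup_steps=10, stable_steps=100, decay_steps=20))
```

With PyTorch, a function like this can be passed to `torch.optim.lr_scheduler.LambdaLR` on top of a `torch.optim.AdamW` optimizer. The loop around the sample loss would then call `loss.backward()`, `optimizer.step()`, `scheduler.step()`, and `optimizer.zero_grad()` on each iteration, which is the weight-update part the sample scripts leave out.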
Hey guys, awesome work first of all. I was wondering if you are planning to release the training source code, specifically stage 1 & 2 implementation?
Thanks