danielhanchen closed this pull request 7 months ago
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
Thanks for the contribution!
To clarify, have you included changes for the RoPE embedding dtype issue mentioned above? I only see the GeLU fix.
@pengchongjin Oh actually, now that I'm reading the repo's code, I forgot about the RoPE embeddings part. I'm assuming torch.autocast will also lose accuracy during finetuning, but should be fine during normal inference. Although I haven't tested it sadly :(
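For concreteness, here's a minimal sketch of the precision issue (the dims and base constant are made up for illustration, not Gemma's actual config): computing the RoPE angle table in float16, which is effectively what you get if it runs downcast under mixed precision, drifts from the float32 reference, and the error grows with position:

```python
import torch

# Hypothetical dims, just for illustration
dim, seq_len = 256, 8192
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, dim, 2).float() / dim))
t = torch.arange(seq_len).float()

# float32 reference angles: (seq_len, dim // 2)
freqs_fp32 = t[:, None] * inv_freq[None, :]

# same computation in float16 (what a downcast run would produce)
freqs_fp16 = t.half()[:, None] * inv_freq.half()[None, :]

# the absolute error grows with position, so long-context
# positional encodings are the first to suffer
print((freqs_fp32 - freqs_fp16.float()).abs().max())
```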
OK, thanks, let's check in the GeLU fix first.
Just a few more Gemma fixes :) Currently checking for more as well! Related PR: https://github.com/huggingface/transformers/pull/29285, which showed RoPE must be computed in float32 and not float16, since float16 causes the positional encodings to lose accuracy.
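Roughly the pattern from that PR, as I understand it (a hedged sketch, not the exact Gemma code; `rotary_sin_cos` and its arguments are my own names): do the whole angle/sin/cos computation in float32, with autocast disabled, and cast back to the activation dtype only at the very end:

```python
import torch

def rotary_sin_cos(position_ids: torch.Tensor, inv_freq: torch.Tensor,
                   out_dtype: torch.dtype):
    # position_ids: 1-D tensor of positions; inv_freq: 1-D frequency tensor.
    # Disable autocast so the angle math stays in float32 even inside a
    # mixed-precision region, then cast the finished tables back.
    with torch.autocast(device_type=position_ids.device.type, enabled=False):
        freqs = torch.outer(position_ids.float(), inv_freq.float())
        emb = torch.cat((freqs, freqs), dim=-1)
        cos, sin = emb.cos(), emb.sin()
    return cos.to(out_dtype), sin.to(out_dtype)
```

That way the tables stay exact even at long positions, and the only precision loss is the single cast at the end.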
Will scour for more and add them here :) (Hopefully the GeLU issue is the only one!!)
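For anyone following along, the GeLU issue boils down to exact vs. approximate GeLU (a sketch of the idea; the claim that Gemma expects the tanh approximation matches HF's `gelu_pytorch_tanh` activation, but double-check against the official config):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 256)

exact  = F.gelu(x)                       # erf-based "exact" GeLU
approx = F.gelu(x, approximate="tanh")   # tanh approximation

# the per-element difference is small, but it compounds across layers
print((exact - approx).abs().max())
```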