google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models
https://ai.google.dev/gemma
Apache License 2.0

Gemma fixes - gelu #37

Closed danielhanchen closed 7 months ago

danielhanchen commented 7 months ago

Just a few more Gemma fixes :) Currently checking for more as well! Related PR: https://github.com/huggingface/transformers/pull/29285, which showed that RoPE must be computed in float32 and not float16, since float16 causes the positional encodings to lose accuracy.

  1. The activation function, according to https://twitter.com/danielhanchen/status/1763613620909580505 (waiting for confirmation), should be the approximate GeLU and not the exact GeLU. I.e., gelu should actually be gelu_pytorch_tanh, which maps to PytorchGELUTanh and calls nn.functional.gelu(input, approximate="tanh"), whereas plain gelu calls nn.functional.gelu(input, approximate="none"). A small sketch comparing the two is below.
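To make the difference concrete, here is a minimal sketch (not this repo's code; the tensor shape is arbitrary) comparing the two variants via the `approximate` argument of `nn.functional.gelu`:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)  # arbitrary example activations

exact_gelu = F.gelu(x, approximate="none")  # what plain `gelu` maps to
tanh_gelu = F.gelu(x, approximate="tanh")   # what `gelu_pytorch_tanh` maps to

# The outputs differ slightly; if the model was trained with the tanh
# approximation, using the exact variant at inference introduces a small
# but systematic mismatch.
print((exact_gelu - tanh_gelu).abs().max())
```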

Will scour for more and will add them here :) (Hopefully the gelu issue is the only one!!)

google-cla[bot] commented 7 months ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

pengchongjin commented 7 months ago

Thanks for the contribution!

To clarify, have you included changes for the RoPE embedding dtype issue mentioned above? I only see the GeLU fix.

danielhanchen commented 7 months ago

@pengchongjin Oh, actually, now that I'm reading this repo's code, I realize I forgot about the RoPE embeddings part. I'm assuming torch.autocast will also lose accuracy during finetuning, but it should be fine during normal operation. I haven't tested it yet, sadly :(
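For illustration, a minimal sketch of the precision concern (hypothetical head dim, sequence length, and base; not this repo's actual RoPE code): building the rotary angles in float16 vs float32 and comparing the resulting rotations.

```python
import torch

dim, max_pos = 256, 8192  # hypothetical head dim and max sequence length
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
positions = torch.arange(max_pos, dtype=torch.float32)

# Angles built in float32 (reference) vs float16 (what a half-precision
# path would produce before taking sin/cos).
angles_fp32 = positions[:, None] * inv_freq[None, :]
angles_fp16 = (positions.half()[:, None] * inv_freq.half()[None, :]).float()

# float16 has ~10 mantissa bits, so large position * frequency products are
# rounded heavily, which shifts the resulting sin/cos rotations.
print((angles_fp32.cos() - angles_fp16.cos()).abs().max())
```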

pengchongjin commented 7 months ago

OK, thanks, let's check in the GeLU fix first.