keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0

Always run the rotary embedding layer in float32 #1508

Closed tirthasheshpatel closed 5 months ago

tirthasheshpatel commented 5 months ago

Follow-up for #1497

This PR refactors the keras_nlp.layers.modeling.rotary_embedding.RotaryEmbedding layer to always compute in float32, since other dtypes cause significant precision loss. It also updates Gemma to use this layer instead of implementing its own version of RoPE.
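A minimal NumPy sketch (not the actual KerasNLP layer) of why the compute dtype matters: building the sin/cos tables and applying the rotation in half precision drifts noticeably from the float32 result, especially at large positions. The pair-interleaved rotation convention and all names below are illustrative assumptions, not code from this PR.

```python
import numpy as np

def rope(x, max_wavelength=10000, compute_dtype=np.float32):
    # x: (seq_len, dim) with an even feature dimension.
    seq_len, dim = x.shape
    x = x.astype(compute_dtype)
    positions = np.arange(seq_len, dtype=compute_dtype)
    inv_freq = 1.0 / (
        max_wavelength ** (np.arange(0, dim, 2, dtype=compute_dtype) / dim)
    )
    # Position-dependent rotation angles, shape (seq_len, dim // 2).
    angles = np.einsum("i,j->ij", positions, inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    # Rotate each (even, odd) feature pair by its angle.
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x2 * cos + x1 * sin
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2048, 256)).astype(np.float16)

ref = rope(x, compute_dtype=np.float32)
half = rope(x, compute_dtype=np.float16).astype(np.float32)
print("max abs error (float16 vs float32):", np.abs(ref - half).max())
```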

This PR isn't ready yet. TODO:

Colab showing the equivalence of Gemma's RoPE implementation and the RotaryEmbedding layer in KerasNLP: https://colab.research.google.com/drive/1BNNlxN7Y7yAzJl0UeWdG9TZ6RpfJjCBS?usp=sharing

tirthasheshpatel commented 5 months ago

> Code looks good!
>
> We should probably test this to make sure the numerics stay as close to our reference JAX implementation as they were before, and that this does not negatively impact performance.

Already done here: https://colab.research.google.com/drive/1BNNlxN7Y7yAzJl0UeWdG9TZ6RpfJjCBS?usp=sharing
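For reference, a hedged sketch of the kind of numerics check described above: compare the layer's output against a trusted reference implementation within a tolerance. The function names, tensor shape, and tolerances are placeholders, not code from this PR or the linked Colab.

```python
import numpy as np

def check_numerics(layer_fn, reference_fn, shape=(1, 128, 8, 64), seed=0):
    # layer_fn: the implementation under test; reference_fn: the trusted baseline.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape).astype(np.float32)
    np.testing.assert_allclose(
        np.asarray(layer_fn(x)),
        np.asarray(reference_fn(x)),
        atol=1e-5,
        rtol=1e-5,
    )
```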