Pretrained models are supporting larger and larger sequence lengths. In Gemma's case this is a particularly nasty gotcha, as very few people have the compute resources to actually train on 8000-token-long sequences.

I think the more user-friendly approach might be to lower our default sequence length to 1024. This won't prevent users from setting a longer sequence length, but it will lessen the unpleasant gotcha of suddenly using a ton of VRAM when training.

We should still almost always show setting the sequence length in our code examples, as it's something the user should generally think about when fine-tuning or generating.
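For example, a minimal sketch of what that could look like in an example, assuming a KerasNLP-style Gemma workflow (the preset name and exact classes here are illustrative, not prescriptive):

```python
import keras_nlp

# Explicitly cap the sequence length instead of relying on the model's full
# context window, to keep VRAM usage manageable during fine-tuning.
preprocessor = keras_nlp.models.GemmaCausalLMPreprocessor.from_preset(
    "gemma_2b_en",         # preset name used for illustration
    sequence_length=1024,  # pad/truncate inputs to 1024 tokens
)
model = keras_nlp.models.GemmaCausalLM.from_preset(
    "gemma_2b_en",
    preprocessor=preprocessor,
)
```

Keeping that one argument visible in every example makes the memory/length trade-off an explicit user decision rather than a surprise.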