keras-team / keras-hub

Pretrained model hub for Keras 3
Apache License 2.0

Limit the default sequence length to 1024 for all models #1770

Closed mattdangerw closed 3 months ago

mattdangerw commented 3 months ago

Pretrained models are supporting larger and larger sequence lengths. In Gemma's case this is a particularly nasty gotcha, as very few people have the compute resources to actually train on 8,000-token sequences.

I think the more user-friendly approach might be to lower our default sequence length to 1024. This won't prevent users from setting a longer sequence length, but it will lessen the unpleasant gotcha of suddenly using a ton of VRAM when training.

We should still almost always document setting the sequence length in our code examples, as it's something users should generally think about when fine-tuning or generating.
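
For illustration, here is a minimal sketch of what explicitly setting the sequence length could look like in the standard KerasHub preset workflow; the preset name "gemma_2b_en" is used as an assumed example, and the exact default being overridden is the 1024 proposed above.

```python
import keras_hub

# Build a preprocessor with an explicit sequence length rather than
# relying on the default (1024 under this proposal).
preprocessor = keras_hub.models.GemmaCausalLMPreprocessor.from_preset(
    "gemma_2b_en",
    sequence_length=2048,  # Raise or lower to fit the task and available VRAM.
)

# Pass the preprocessor to the task model so fine-tuning and generation
# both use the chosen sequence length.
causal_lm = keras_hub.models.GemmaCausalLM.from_preset(
    "gemma_2b_en",
    preprocessor=preprocessor,
)
```

Documenting this pattern in examples keeps the shorter default from silently truncating inputs for users who do need longer contexts.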

mattdangerw commented 3 months ago

Let's try it out. Something tells me we aren't done with discussions here, but hopefully this is a positive delta.