mosaicml / composer

Supercharge Your Model Training
http://docs.mosaicml.com
Apache License 2.0

InContextLearning*Dataset Default padding sides hardcoded? #2778

Open MFajcik opened 7 months ago

MFajcik commented 7 months ago

Hi, I have a question about your code here: https://github.com/mosaicml/composer/blob/a7cad7c221ce8ad9697bde50db0b3f37f8b8025e/composer/datasets/in_context_learning_evaluation.py#L655

Why do you assume right padding (for InContextLearningMultipleChoiceTaskDataset, but also some other dataset classes)?

  1. Shouldn't the padding_side be derived from the tokenizer?
  2. Assuming right padding breaks some models (e.g., Mistral becomes unusable).

Thanks for the information.
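To illustrate point 1, here is a minimal sketch (plain Python, no transformers dependency; `pad_batch` and `FakeTokenizer` are hypothetical names, not composer's API) of deriving the padding side from the tokenizer rather than hardcoding it:

```python
def pad_batch(sequences, pad_id, padding_side="right"):
    """Pad variable-length token-id lists to a common length.

    padding_side="right" appends pad tokens; "left" prepends them.
    """
    max_len = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        pad = [pad_id] * (max_len - len(s))
        padded.append(s + pad if padding_side == "right" else pad + s)
    return padded


class FakeTokenizer:
    # Stand-in for a Hugging Face tokenizer, which exposes a
    # `padding_side` attribute ("left" or "right") and a pad_token_id.
    pad_token_id = 0
    padding_side = "left"


tok = FakeTokenizer()
# Derive the side from the tokenizer instead of assuming "right":
batch = pad_batch([[5, 6, 7], [8]], tok.pad_token_id,
                  padding_side=tok.padding_side)
```

The point being that the dataset code could read `tokenizer.padding_side` at this spot instead of fixing one side.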

dakinggg commented 6 months ago

Hey @MFajcik, sorry for the delayed response. Tokenizers have a default padding side set, but models should all be compatible with different padding sides (unless they explicitly error out). Generally speaking, we use right padding by default (for training, single forward passes, etc.), and left padding for generation (necessary for autoregressive generation and the KV cache to work out). Mistral should work fine (we've run it). You may need to update to the latest transformers version. If you have an issue there, please send a full repro. Thanks!
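The training-vs-generation distinction above can be sketched in plain Python (an assumed illustration, not composer's code): a causal LM typically continues generation from the last position of each row, so with right padding that position would hold a pad token rather than the prompt's final token.

```python
PAD = 0  # hypothetical pad token id

def last_index_holds_real_token(row, pad_id=PAD):
    """True if the row's final index holds a real token, which is what
    naive generation code assumes when it continues from position -1."""
    return row[-1] != pad_id


right_padded = [7, 8, PAD, PAD]  # real tokens first, pads last
left_padded = [PAD, PAD, 7, 8]   # pads first, real tokens last

# For a single forward pass (e.g. scoring choices in a multiple-choice
# task), either side works because real positions are read via a mask.
# For generation, only the left-padded row ends on a real token, so the
# model continues from the prompt rather than from padding.
```

This is why right padding is a reasonable default for the forward-pass-only multiple-choice dataset, while generation tasks need left padding.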