🧹 This is a tracker regarding the move of prepare_inputs_for_generation into the generation mixin 🧹
Why?
prepare_inputs_for_generation is not part of the core modeling, but rather a utility for generate
it should greatly reduce the need to touch modeling code when generate changes. Fewer modeling changes -> improved model stability
it also greatly reduces the number of lines of code 🙏
Tracker
Kinda ordered list of tasks:
[x] 1. Fix related slow tests before we start — all llama, generate, and cache_utils [except sink cache, broken atm] slow tests should be passing to ensure we don’t break anything (#33138)
[x] 2. PreTrainedModel doesn't inherit from GenerationMixin, so that can_generate() becomes independent of prepare_inputs_for_generation being overridden or not (#33203) (see the can_generate() sketch after this list)
[x] 3. Move llama’s prepare_inputs_for_generation to the generation mixin. This implies moving one function that prepares the 4D mask too (the one that is called there) (#33677)
[x] 4. Add tests for prepare_inputs_for_generation — currently we don’t test it directly, and we should (decoder-only llms: https://github.com/huggingface/transformers/pull/33870, encoder-decoder llms: https://github.com/huggingface/transformers/pull/34048)
[x] 5. Address the case of synced_gpus in generate: when synced_gpus is set and cache_position is out of bounds, take the latest available input_ids for the dummy computations (https://github.com/huggingface/transformers/pull/34095) (see the clamping sketch after this list)
[x] 6. Remove prepare_inputs_for_generation from as many models as possible. There may be merge conflicts here, due to the 4D mask function. Try to iron out as many trivial cases as possible (decoder-only llms: https://github.com/huggingface/transformers/pull/33870, encoder-decoder llms: https://github.com/huggingface/transformers/pull/34048)
[x] 7. Change prepare_inputs_for_generation to forward **kwargs from its input to its output. With minimal changes, this should enable most VLMs to use the shared function -- they forward pixel_values from the input to the output (support for **kwargs: https://github.com/huggingface/transformers/pull/33870) (see the kwargs-forwarding sketch after this list)
[x] 8. By this point most cases of prepare_inputs_for_generation should have been removed 🤗 We would need to check the remaining ones individually; there may be further simplification patterns available!
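
For step 2, a minimal sketch of the idea, not the actual transformers implementation: once a model opts into generation by explicitly inheriting from GenerationMixin, can_generate() can key on that inheritance instead of checking whether prepare_inputs_for_generation was overridden. The ToyModel class below is a hypothetical stand-in for a PreTrainedModel subclass.

```python
from transformers import GenerationMixin


class ToyModel(GenerationMixin):  # hypothetical stand-in for a PreTrainedModel subclass
    @classmethod
    def can_generate(cls) -> bool:
        # The model advertises generation support through explicit inheritance,
        # independently of whether it overrides prepare_inputs_for_generation.
        return issubclass(cls, GenerationMixin)


assert ToyModel.can_generate()
```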
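For step 5, a minimal sketch of the clamping idea (the helper name and signature are hypothetical; the real fix lives inside generate and the shared prepare_inputs_for_generation): under synced_gpus, a GPU that has finished generating keeps running dummy forward passes so collective ops stay in sync, and its cache_position can then point past the tokens it actually holds. Clamping to the last available position lets it reuse the latest input_ids for the dummy computation.

```python
import torch


def clamp_inputs_for_dummy_forward(input_ids: torch.LongTensor, cache_position: torch.LongTensor):
    """Hypothetical helper (illustration only): keep dummy forwards valid under synced_gpus."""
    # If cache_position ran past the end of input_ids, fall back to the latest
    # available token positions so the dummy computation still indexes valid data.
    last_valid = input_ids.shape[1] - 1
    cache_position = cache_position.clamp(max=last_valid)
    return input_ids[:, cache_position], cache_position


# Example: one sequence of length 5, but cache_position already points to step 7
ids = torch.arange(5).unsqueeze(0)   # shape (1, 5)
pos = torch.tensor([7])              # out of bounds
dummy_ids, pos = clamp_inputs_for_dummy_forward(ids, pos)
print(dummy_ids, pos)                # tensor([[4]]) tensor([4])
```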
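For step 7, a minimal sketch of the kwargs-forwarding idea, with an assumed simplified signature and no attention-mask or 4D-mask handling (not the actual shared implementation): any model-specific argument the function does not handle itself, such as pixel_values for VLMs, is passed through from input to output.

```python
def prepare_inputs_for_generation(self, input_ids, past_key_values=None,
                                  attention_mask=None, cache_position=None, **kwargs):
    # With a cache, only feed the tokens that have not been processed yet
    if past_key_values is not None and cache_position is not None:
        input_ids = input_ids[:, cache_position]

    model_inputs = {
        "input_ids": input_ids,
        "past_key_values": past_key_values,
        "attention_mask": attention_mask,
        "cache_position": cache_position,
    }
    # Forward everything else (e.g. `pixel_values` for VLMs) from input to output,
    # so models with extra inputs can reuse the shared function unchanged.
    for key, value in kwargs.items():
        model_inputs.setdefault(key, value)
    return model_inputs
```

Forwarding unknown kwargs instead of enumerating them per model is what should let most VLMs drop their own override with minimal changes.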