huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[`Generate`] Fix `gradient_checkpointing` and `use_cache` bug for generate-compatible models #21737

Closed. younesbelkada closed this issue 1 year ago.

younesbelkada commented 1 year ago

Feature request

When a model uses gradient_checkpointing and a user calls generate with use_cache, some models run into bugs, such as the one described in https://github.com/huggingface/transformers/pull/21733

The fix is to slightly refactor the affected models, following the same procedure as in the aforementioned PR.

How to participate

  1. If it is your first time here, have a quick look at our contribution guidelines 🤗
  2. Pick a model from the list below, and check the comments here to make sure it hasn't already been claimed.
  3. Claim your models in the comments (e.g. "I want to work on GPT2")
  4. Replicate the changes of this PR to your model of choice. In other words, move the `if` block to the line above the `... if use_cache else None`, in the same `.forward()` function (a sketch of the target pattern is shown after this list). Please note that some models may have more than one instance of this block!
  5. Make sure you've run our automated code formatting tool (i.e. run make fixup in your shell -- also run make fix-copies if it requests you to do so)
  6. Open a PR. Tag @younesbelkada or @gante (one of us is enough)

That's it! With each change, you'll be making transformers a little bit better for all of us 💛
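For reference, here is a minimal sketch of the target pattern inside a typical decoder `.forward()`. Variable names such as `presents` and `logger` differ between models, so treat this as an illustration of the refactor rather than an exact diff:

# Check gradient checkpointing *before* the cache container is built,
# and disable caching if it would conflict with checkpointing during training.
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
        )
        use_cache = False

# Only now decide whether to collect past key/values.
presents = () if use_cache else None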

Models to fix:

pmollerus23 commented 1 year ago

@younesbelkada I am a little confused about where the list of generate-compatible models is located. I'd like to pick up this issue if I can find it!

younesbelkada commented 1 year ago

Hello @mollerup23, thanks for your interest! We will update the list with @gante once #21733 gets merged.

connor-henderson commented 1 year ago

@younesbelkada Looks like it will be essentially the same fix across the other models too. Do you want me to pull that fix into a utility function once merged? Just for illustration, something like -

use_cache = should_use_cache(logger, use_cache, self.gradient_checkpointing, self.training)
presents = () if use_cache else None

and likely in modeling_utils.py -

from logging import Logger

def should_use_cache(logger: Logger, use_cache: bool, gradient_checkpointing: bool, training: bool) -> bool:
    # Caching is incompatible with gradient checkpointing during training: warn and disable it.
    if use_cache:
        if gradient_checkpointing and training:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
        else:
            return True
    return False

I was looking into making the fix and realized there would be some repetition, so I thought I'd ask.

gante commented 1 year ago

Hey @connor-henderson 👋 Thank you for the suggestion! Usually, I'd give the green light to configuration-related DRY approaches such as the one you suggested. However, this one would sit right in forward(), and we prioritize clear code (= avoid abstractions) in the modeling code itself.

In case you're curious about this position, we have a blog post about why we do it here 🤗

gante commented 1 year ago

@mollerup23 the list and the instructions are updated, in case you're interested in contributing :D

yhl48 commented 1 year ago

Would like to take GPT-2!

krypticmouse commented 1 year ago

I want to work on GPT-J!

Batese2001 commented 1 year ago

I would like to work on Blenderbot

KMFODA commented 1 year ago

Happy to take on Git, GptNeoX, ImageGPT, LED, LongT5, M2M100, Marian, MBart, MegatronBert, MVP, OPT, Pegasus, PegasusX, RemBert, RoFormer

younesbelkada commented 1 year ago

Thanks a mile @KMFODA! 💯 Feel free to take those, and tag me or @gante whenever you feel ready!

saswatmeher commented 1 year ago

Hi, I am a newbie to open source and would like to contribute. @younesbelkada can I contribute to this issue?

younesbelkada commented 1 year ago

Hey @saswatmeher, of course! You can pick up a model that has not been taken yet, for example BioGpt, and do the following:

  1. Fork this repository
  2. Clone your fork locally and create a new branch: git checkout -b fix-bio-gpt-issue
  3. Modify the file src/transformers/models/biogpt/modeling_biogpt.py the same way as the other contributors have modified their files in #21818, #21833, #21815, etc. (you can check "Files changed" at the top right of each Pull Request page)
  4. Apply these changes and push them to your branch
  5. Finally, open a Pull Request between fix-bio-gpt-issue and the main branch of transformers (+ tag us, myself and @gante) and we should be good to go!

Let us know if you have more questions!

saswatmeher commented 1 year ago

I am happy to pick up other models too. Can I work on Bart, Bert, and BigBird?

asrimanth commented 1 year ago

Hello, can I work on Bloom?

younesbelkada commented 1 year ago

Hi @asrimanth , yes sure you can!

pmollerus23 commented 1 year ago

> @mollerup23 the list and the instructions are updated, in case you're interested in contributing :D

Great! I'd like to work on OPT

soma2000-lang commented 1 year ago

Hi @gante, working on Whisper, XGLM, XLMRobertaXL

gante commented 1 year ago

@mollerup23 hey! OPT was claimed by @KMFODA a few comments above :) Still plenty of models up for grabs, though!

nipunjindal commented 1 year ago

Hello 👋, I would like to contribute and work on T5. Let me know, thanks! Here is the PR for the suggested changes.

pmollerus23 commented 1 year ago

@younesbelkada Can I claim TimeSeriesTransformer?

younesbelkada commented 1 year ago

Hi @mollerup23, of course! Please feel free to take it!

younesbelkada commented 1 year ago

Hey @krypticmouse! Do you need any help making the fix on GPT-J?

krypticmouse commented 1 year ago

Hi @younesbelkada, Thanks for asking. My PR got merged long ago.

younesbelkada commented 1 year ago

Thanks for the heads up, I just updated the table. The only model left seems to be the TimeSeries Transformer then. Thank you all for the great contributions!

annahung31 commented 1 year ago

Hey @younesbelkada, may I work on the TimeSeries Transformer?

gante commented 1 year ago

@annahung31 I believe @mollerup23 is working on it :) @mollerup23, can you confirm?

younesbelkada commented 1 year ago

yes @gante @annahung31 , the PR is here: https://github.com/huggingface/transformers/pull/22272