Generate: deprecate the use of model `config` as a source of defaults

EDIT: Updated with the discussion up to 2022/08/20

Why?

A confusing part of generate is how the defaults are set. When a certain argument is not specified, we attempt to fetch it from the model config file. This makes generate unpredictable and hard to fully document (the default values change for each model), as well as a major source of issues :hocho:

How?

We have the following requirements: 1️⃣ The existing behavior can’t be removed, i.e., we must be able to use the model config.json as a source of generation parameters by default; 2️⃣ We do need per-model defaults -- some models are designed to do a certain thing (e.g. summarization), which requires a specific generation configuration. 3️⃣ Users must have full control over generate, with minimal hidden behavior.

Ideally, we also want to: 4️⃣ Have separation of concerns and use a new generate_config.json to parameterize generation;

A TL;DR of the plan consists in changing the paradigm from “non-specified generate arguments are overridden by the [model] configuration file” to “generate arguments will override the [generate] configuration file, which is always used”. With proper documentation changes and logging/warnings, the user will be aware of what's being set for generate.

Step 1: Define a new generate config file and class

Similar to the model config, we want a .json file to store the generation defaults. The class itself can be a very simplified version of PretrainedConfig, also with functionality to load/store from the hub.

Step 2: Integrate loading generate config file in `.from_pretrained()`

The generation configuration file should be loaded when initializing the model with a from_pretrained() method. A couple of things to keep in mind:

There will be a new kwarg in from_pretrained, generate_config (or generation_config? Leaning toward the former as it has the same name as the function);
It will default to generate_config.json (contrarily to the model config, which defaults to None). This will allow users to set this argument to None, to load a model with an empty generate config. Some users have requested a feature like this;
Because the argument can take a path, it means that users can store/load multiple generate configs if they wish to do so (e.g. to use the same model for summarization, creative generation, factual question-answering, etc) 🚀
Only models that can run generate will attempt to load it;
If there is no generate_config.json in the repo, it will attempt to initialize the generate configuration from the model config.json. This means that this solution will not change any generate behavior and will NOT need a major release 👼
To keep the user in the loop, log ALL parameters set when loading the generation config file. Something like the snippet below.
Because this happens at from_pretrained() time, logging will only happen at most once and will not be verbose.

`facebook/opt-1.3b` generate configuration loaded from `generate_config.json`. The following generation defaults were set:
- max_length: 20
- foo: bar
- baz: qux

Step 3: Generate uses the generate config class internally

Instead of using the configuration to override arguments when they are not set, overwrite a copy of the generation config at generate time. I.e. instead of:

arg = arg if arg is not None else self.config.arg
...

generate_config = self.generate_config.copy()
generate_config.arg = arg if arg is not None
...

This change has three main benefits:

We can improve the readability of the code, as we gain the ability to pass configs around. E.g. this function won't need to take a large list of arguments nor to bother with their initialization.
Building generate argument validation for each type of generation can be built in simple functions that don't need ~30 arguments as input 🙃
The three frameworks (PT/TF/FLAX) can share functionality like argument validation, decreasing maintenance burden.

Step 4: Document and open PRs with the generation config file

Rewrite part of the documentation to explain that a generation config is ALWAYS used (regardless of having defaults loaded from the hub or not). Open Hub PRs to pull generate-specific parameters from config.json to generate_config.json

Pros/Cons

Pros:

Better awareness -- any generate default will be logged to the screen when loading a generate-compatible model;
Full control -- the users can choose NOT to load generation parameters or easily load a set of options from an arbitrary file;
Enables more readable generate code;
Enables sharing generate-related code across frameworks;
Doesn't need a major release.

Cons:

Pulling the generate parameters into their own files won't happen everywhere, as merging the changes described in step 4 is not feasible for all models (e.g. due to unresponsive model owners);
Logging loaded defaults may not be enough to stop issues related to the default values, as the logs can be ignored;
Another config file (and related code) to maintain.

cc @patrickvonplaten

I like the idea of using a use_config_defaults a lot - think that's a great additional safety mechanism to ensure it's possible to keep backward compatibility.

Also we were thinking about the idea of having a generation_config.json file that can optionally be passed to generate by the user and that includes all the default values that are set in the config at the moment. This would also make it easier to possible have multiple different generation configs. Some models like bart-large: https://huggingface.co/facebook/bart-large/blob/main/config.json#L45 always have certain generation parameters enabled by default and IMO it would be a smoother transition to help the user extract a generation_config.json from config.json and then always pass this config if present in the repo to generate(...) instead of forcing the user to always pass all those arguments to generate.

With the config, we could do something like the following automatically:

User runs model repo with generate. We detect that no generation_config.json is present and that default generation params are used from config.json
We throw a warning that states "no generation config detected, we strongly advise you to run the following code snippet on your repo to create a generate_config.json file
We keep all the generation params in config.json though to keep backwards compatibility with use_config_defaults
However if a generation_config.py is present we always use this and do not look into the config
We have to make an exception with max_length=20 because it's always set and we don't want to create a generation_config.py for all models

Also happy to jump on a call to brainstorm about this a bit!

Fair point! 👍

From the comment above, let's consider the updated requirements:

Until v5, the default behavior can’t change, i.e., we will use the model config.json as a source of defaults;
From v5 onwards, the default behavior is to use generate_config.json as a source of defaults;
The transition should be as smooth as possible — the users should be able to anticipate this transition, so nothing changes when we release the new major version;
We want to use defaults (many models are designed to do a certain thing) while also enabling power users to have full control over generate.

A solution that fits all requirements is the ability to specify where the defaults should be loaded from, with default paths controlled by us. With the aid of a script to create the new generation config file from the existing model config file, the transition should be smooth and users can anticipate any change.

E.g. if we have a generation_config_file flag, defaulting to None and where a path in the model repo can be specified, then we could:

Set generation_config_file="config.json", which would mimic the existing default behavior (and would be the default behavior for now);
Set generation_config_file="generation_config.json", which would use the new config file for generation (which would be the default behavior in the future);
Set generation_config_file to ANY generation config path, so that power users can have multiple configurations for the same model;
Set generation_config_file=False (or other non-None falsy value) to not use any configuration at all.

We seem to need two warnings ⚠️ :

[Needed because in v5 we will be defaulting to a new config file, which may not exist in a user's model repo, and the model may have generation parameters in its config] If the configuration file does not exist, fall back to config.json and warn about it. We can quickly scan config.json to avoid raising a warning if it doesn't contain any generation argument;
[Needed because the default behavior will still be to use values from a config, and many users are not aware of it] If generation_config_file is not specifically set by the user, a warning should be raised if the config replaces any argument. Many configs don't replace any value.

Both warnings can be avoided by specifying the generation_config_file argument. They may be a bit verbose, but I think verbosity (which can be shut down easily) is preferable to magic confusing behavior.

The max_length=20 default (and other similar defaults) can be easily added -- max_length = max_length if max_length is not None else 20 after attempting to load the configs. We can add them to the argument's documentation (see below).

🤔 The only issue I have with this approach is that it is hell to document (similar to the current approach). Having "this argument defaults to X or to config.argument" for all arguments' documentation line is verbose and confusing, and users need to be aware that the configuration files play an important role.

My suggestion here would be to make generation_config_file the second argument of generate (after input_ids), so that it becomes immediately clear that generate argument defaults can be set through a file. Then, I would remove further references to the config in the docs, relying on the warnings to remind the user of what's going on. I think it is clear by now that long docs don't avoid simple issues :(

WDYT?

(P.S.: will edit the issue after we settle on an approach :) )

Cool, I think this is going into a very nice direction! A couple more questions to think about:

Do we really want a generate_config_file keyword argument for generate(...) ? For me it would be much more intuitive to just have config: Optional[Dict] as an argument. This would then mean it requires the user to do one more step for a specific config:

generate_config = # load generation config from path
model.generate(input_ids, config=generate_config)

We could add a config argument to the init of GenerationMixin which would make backwards compatibility then very easy:
- from_pretrained(...) would load either a generation_config.json or if not present a config.json and then set it as self.generation_config = config => then every generation model would have access to self.generation_config . In generate would could then add a self.generate_config = config if config is not None else self.generate_config (the default one) and then overwrite self.generate_config once more with if the user passes generate args into generate directly (e.g. model.generate(top_k=top_k)`
- Overall I think we cannot really come around the fact the we need to store a config inside the model somewhere because it'd be a bit to me to load a config upon calling generate. E.g. model.generate(..., generate_config="config.json") would have to load a config which opens too many problems with internet connection etc....
What type should generation_config be? Just a dict or let's maybe create a class for it (similar to BloomConfig). Creating its own class probably also helps with documentation

-> What do you think?

@patrickvonplaten Agreed, the argument name is a bit too long 😅 However, if we decide to go the GenerationMixin.__init__ route, we can't pick config -- PreTrainedModel, which inherits from GenerationMixin, uses a config argument for the model config. Perhaps generation_config? We could then do .from_pretrained(foo, generation_config=bar).

I love the ideas you gave around the config:

if it is part of the __init__ and if we always attempt to load the new file format before falling back to the original config, it actually means we don't need to do a major release to build the final version of this updated configuration handling! No need to change defaults with a new release at all ❤️ ;
The idea of "arguments write into a config that is always used" as opposed to "config is used when no arguments are passed" is much clearer to explain. We gain the ability to pass config files around (as opposed to tens of arguments), and it also opens the door to exporting generation configurations;
Despite the above, we need to be careful with the overwrites: if a user calls model.generate(top_k=top_k) and then model.generate(temperature=temperature), top_k should be the original config's top_k. Copies of objects are needed;
Agreed, having all downloads/file paths in the same place is helpful.

Regarding dict vs class -- I'd go with class (or perhaps a simpler dataclass). Much easier to document and enforce correctness, e.g. check if the right arguments are being used with a certain generation type.

It seems like we are in agreement. Are there more issues we can anticipate?

Very nice summary @gante thanks for writing this all down - I agree with all the above point!

@LysandreJik @sgugger and maybe @thomwolf could you take a quick look here? I think @gante and I have now an actionable plan for generate() and would be ready to open a PR.

Before starting the PR, it would be nice if you could check if you generally agree with our comments here so that we're not totally on a different page before opening such a big PR. The PR will then take some time and require discussion, but I think we have a clear vision of what we want now

@patrickvonplaten @LysandreJik @sgugger @thomwolf -- I took the liberty of updating the issue at the top with the plan that originated from the discussion here (and also to structure the whole thing in my head better) :)

Thanks for the write-up! I think this is a much welcome change that will tremendously improve the way we use generate.

Writing down some thoughts below.

Very personal, but I think generation_config sounds more explicit. generate is very understandable by us because we know what is the "generate method", but "generation config" sounds so much clearer to me than "generate config".
Would the generate config class be able to load from other models? i.e., could we load a generation config specific to bart-large-cnn in t5-large? Would we enforce model-specificity, or would we allow this to work? How would we handle model-specific attributes (maybe there aren't any, there seems to be only RAG that has its own generate method)?
Could we store multiple generation configs in the same repo? How would you handle a model that can have several generation configuration, for example a model such as a prefix-LM that could do both translation and summarization with the same checkpoint?

The biggest work here will likely be education & documentation. I think this will already make things much clearer, but I suppose the much awaited generate method doc rework will be an absolute requirement after this refactor!

Agreed, the biggest issue is and will be education and documentation. Hopefully, this will make the process easier 🙏

Regarding one or multiple generation config classes: there are two arguments in generate that are used with a limited number of (encoder-decoder) models, forced_bos_token_id and forced_eos_token_id. Additionally, there is one argument, encoder_no_repeat_ngram_size, that is only used in encoder-decoder models (and have no effect on decoder-only). The remaining 36 arguments are usable by all models. IMO, having a single class would make documentation (the key issue) much simpler, and model<>arguments verification can be done in the function (as it is done in the present).
Regarding multiple configs in the same repo: Yes that would be doable. According to the plan above, through the specification of a different generation_config files. But @LysandreJik raised a good point, as the name of the files containing the defaults for different tasks may not be immediately obvious to the users, which implies more documentation pain. Perhaps we can take the chance here to approximate generate to pipeline, which we know is user-friendly -- in the pipeline, specific config parameters are loaded for each task (here's an example of a config with task-specific parameters). We could use the exact same structure with the new generation_config files, where all task-specific arguments can be set this way, and generate could gain a new task argument. That way, there would be a higher incentive to set task-specific parameters that would work across the HF ecosystem (generate and pipeline for now, perhaps more in the future).

# with the `task` parameter, it is trivial to share the parameters for some desired behavior

# When loading the model, the existence of task-specific options would be logged to the user.
model = AutoModelForSeq2SeqLM.from_pretrained("...")
input_prompt = ...
task_tokens = model.generate(**input_prompt, task="my_task")
# There would be an exception if `my_task` is not specified in the generation config file.

The plan looks good to me, but the devil will be in the details ;-) Looking forward to the PRs actioning this!

Closing -- generation_config is now the source of defaults for .generate()

huggingface / transformers