Open ghost opened 3 months ago
I believe this is because you are attempting to load a huggingface/transformers
model as a KerasNLP model. The config and weights are in a fundamentally different format. See https://github.com/keras-team/keras-nlp/issues/1294 for a bit more of a deep dive here.
I think there are two actionable things here:

- Longer term, we may want tooling to convert `transformers` models to `keras_nlp` models, and vice versa. TBD on how exactly that should work. A set of tools shipped with keras-nlp? Suggestions welcome.
- More immediately, we might want to make sure we can detect this case and raise a good error message, especially because both Transformers and KerasNLP use a `config.json`. It's confusing!

@Wauplin @SamanehSaadat cc'ing for thoughts.
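Since both libraries name the file `config.json`, detection has to look at the keys inside it. A minimal sketch of such a heuristic (the assumption here is that transformers configs carry a top-level `model_type` key while Keras-serialized configs use `class_name`; the KerasNLP key name is illustrative, not a documented contract):

```python
import json

def looks_like_transformers_config(path):
    """Heuristic: Hugging Face transformers configs carry a top-level
    "model_type" key, while Keras-serialized configs use "class_name".
    (The Keras-side key name is an assumption, not a documented contract.)"""
    with open(path) as f:
        config = json.load(f)
    return "model_type" in config and "class_name" not in config
```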
I see two places in the code where we could add a check: either when we download from the Hub (check the `metadata.json` file, for instance?) or when we load the model from a local preset. Both are valid IMO. If we add the check when downloading the files from HF, it would be HF-specific and therefore not applied when a user manually downloads files and tries to load them (you'll have such usage for sure!). If we add the check when loading the files, it would not be specific to HF, which is somewhat better IMO. It would mean checking the consistency of the `config.json` against what's expected. WDYT?
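A load-time check along these lines could fail fast with an actionable message. This is a hypothetical sketch, not the actual KerasNLP implementation; `validate_preset_config` and the reliance on `model_type` as the telltale key are both assumptions:

```python
def validate_preset_config(config):
    """Hypothetical load-time check (not the actual KerasNLP code):
    raise an actionable error if the parsed config.json looks like a
    Hugging Face transformers checkpoint rather than a KerasNLP preset."""
    if "model_type" in config:
        raise ValueError(
            "This preset looks like a Hugging Face `transformers` checkpoint "
            f"(model_type={config['model_type']!r}), which cannot be loaded "
            "directly as a KerasNLP model. See "
            "https://github.com/keras-team/keras-nlp/issues/1294."
        )
```

Because it inspects the already-parsed config rather than anything HF-specific, the same check covers both Hub downloads and manually copied local presets.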
Agree on the longer-term goal as well, but I'll defer that topic to @Rocketknight1, who's more knowledgeable on the `transformers` side (and it's most likely to be discussed in a separate issue?).
@mattdangerw I think a short-term check that raises a sensible error makes sense as the first step. Longer term, conversion should be possible: once we detect a transformers checkpoint in KerasNLP, as long as KerasNLP already supports that architecture, we could just have a mapping for config attributes, tokenizer vocab, and layer names to convert the checkpoint.
In theory, simple. In practice, I'm sure there are lots of painful edge cases to worry about, but I suspect we'd get most of the benefit just from supporting a few of the most popular architectures (e.g. Gemma/Llama/Mistral/Mixtral), and maybe that wouldn't be so bad!
@Wauplin I agree that we should add the check when loading the model, so that both local and remote presets are covered. I believe the check should be done at the beginning of our `from_preset()`. I'll add the check.
Describe the bug
I am unable to instantiate a `keras_nlp.models.Backbone` from a model preset given by a Hugging Face handle, and get the following error:
To Reproduce https://colab.research.google.com/drive/1sL3dEM8ZOLCbQ5RM1P6usrrzfqk5aTbQ?usp=sharing
Expected behavior This line should probably be changed? https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/utils/preset_utils.py#L411
E.g., the `config.json` from HF: https://huggingface.co/google-bert/bert-base-uncased/resolve/main/config.json
Would you like to help us fix it?