Closed: andstor closed this issue 2 years ago
Most likely this PR https://github.com/huggingface/transformers/pull/15473 introduced the breakage, since this code path didn't exist before.
Most likely the new code was merged without a test exercising this particular code path, and your use case triggered the issue.
Pinging the author @davidleonfdez and the reviewer @sgugger.
To unblock you, @andstor, until this is sorted out, please switch to a commit before that PR, that is:
git clone https://github.com/huggingface/transformers
cd transformers
git checkout 4f5faaf04407d4
Indeed, the problem comes from the model returning more than one logit (it has use_cache set to True in its config), which we didn't anticipate in that PR. I will send a fix when I have time.
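For context, a minimal sketch of what is happening: with use_cache enabled, a causal LM forward pass returns past_key_values alongside the logits, so the Trainer ends up handing preprocess_logits_for_metrics a tuple of tensors rather than a single logits tensor. The snippet below uses the small gpt2 checkpoint purely as a lightweight stand-in; GPT-J behaves the same way at the model level.

```python
# Illustration only: a causal LM with use_cache=True returns more than one
# output (logits plus past_key_values), which is what trips up the metrics hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # small stand-in checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("pragma solidity ^0.8.0;", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, use_cache=True, return_dict=False)

print(len(outputs))      # 2: (logits, past_key_values)
print(outputs[0].shape)  # logits: (batch_size, sequence_length, vocab_size)
```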
Thanks ❤️ Switching to the commit before that PR did the trick 👌
Sorry, maybe I wasn't as careful with the examples as I should have been 😞. I've just learned about past_key_values. I had tested the example with GPT2, whose config has keys_to_ignore_at_inference = ["past_key_values"], so it doesn't return a tuple. I can try to fix it.
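As a quick, purely illustrative sketch of the config attribute being referred to: GPT-2's config class declares past_key_values as a key the Trainer should drop at inference time, which is why the tuple never reached the metrics hook when the example was tested with GPT-2.

```python
# Sketch: check which output keys a model config asks the Trainer to ignore
# during evaluation. For GPT-2 this includes past_key_values.
from transformers import GPT2Config

config = GPT2Config()
print(getattr(config, "keys_to_ignore_at_inference", []))  # ['past_key_values']
```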
@davidleonfdez If you want to work on a fix, look at how the compute_metrics in the run_glue script is defined. I believe you just need to add a similar test at the beginning of the preprocess_logits_for_metrics function for the case where the model returns more than one logit.
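For reference, the guard being suggested would look roughly like the sketch below (run_glue's compute_metrics does the analogous check on its predictions with preds = preds[0] if isinstance(preds, tuple) else preds). This is a sketch of the idea, not the merged patch:

```python
# Sketch of the suggested guard in run_clm.py's preprocess_logits_for_metrics:
# if the model also returns past_key_values (or other extra tensors), the
# logits tensor comes first in the tuple, so keep only that.
def preprocess_logits_for_metrics(logits, labels):
    if isinstance(logits, tuple):
        logits = logits[0]  # drop past_key_values and any other extra outputs
    return logits.argmax(dim=-1)

# Wired into the Trainer roughly like:
# trainer = Trainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics)
```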
Thanks!
Environment info
transformers version: 4.17.0.dev0
Who can help
@stas00, @patil-suraj
Information
Model I am using: GPT-J-6B, running on a GPU cluster with 10 x NVIDIA A100 40GB.
The problem arises when using the official example script (run_clm.py).
The task I am working on is fine-tuning GPT-J to generate smart contract code.
To reproduce
Steps to reproduce the behavior:
Run the example training script transformers/examples/pytorch/language-modeling/run_clm.py.
HF launch script:
DeepSpeed config:
Fails with this error:
The model runs fine without evaluation turned on.
Expected behavior
The example script should run without producing an error.