Open zhangliang-04 opened 11 months ago
Is your problem solved? Please let me know, since I am dealing with the same issue.
You can limit the number of tokens fed to the model to match the maximum token length. This is a problem with lm_eval_harness rather than with the additional tasks.
How do we do this exactly?
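For what it's worth, a minimal sketch of one way to cap the token count with a Hugging Face tokenizer (this is not code from this repo; the model path is a placeholder and the 4096 limit is taken from the warning quoted in the issue below):

```python
# Sketch only: truncate at encode time so no more than MAX_LEN tokens are
# ever fed to the model. Replace the path and limit with your own values.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/Llama-7b-hf")  # placeholder path
tokenizer.truncation_side = "left"   # drop tokens from the front so the final
                                     # question and answer choices are kept
MAX_LEN = 4096  # the model's maximum sequence length, per the warning

def encode_capped(text):
    # truncation=True + max_length guarantees len(ids) <= MAX_LEN, which avoids
    # the "Token indices sequence length is longer ..." warning and the
    # indexing errors it refers to
    return tokenizer.encode(text, add_special_tokens=False,
                            truncation=True, max_length=MAX_LEN)

ids = encode_capped("a very long few-shot mmlu-fr prompt ...")
assert len(ids) <= MAX_LEN
```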
Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr. A warning about the token sequence length (5023 > 4096) appears and the process then seems to be stuck. Here is the output and the call stack after a keyboard interrupt:

```
Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors
^CTraceback (most recent call last):
  File "/data2/zl/code/mlmm-evaluation/main.py", line 135, in <module>
    main()
  File "/data2/zl/code/mlmm-evaluation/main.py", line 108, in main
    results = evaluator.open_llm_evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 79, in open_llm_evaluate
    results = evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 262, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/data2/zl/code/mlmm-evaluation/lm_eval/base.py", line 181, in loglikelihood
    context_enc = self.tok_encode(context)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/models/huggingface.py", line 361, in tok_encode
    return self.tokenizer.encode(string, add_special_tokens=self.add_special_tokens)
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2569, in encode
    encoded_inputs = self.encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2977, in encode_plus
    return self._encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
KeyboardInterrupt
```
It seems the process is stuck in the batched tokenizing. How do I deal with this?
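The interrupt in the traceback lands inside the tokenizer's encode_batch call, so it may simply be spending a long time tokenizing a very long prompt rather than being deadlocked. If you want the cap suggested above applied inside the harness, a rough sketch against the tok_encode frame shown in the traceback (not the repo's actual fix; the 4096 value again comes from the warning, and the class may already expose the model's limit under a name like max_length):

```python
# lm_eval/models/huggingface.py -- sketch of a possible change to tok_encode.
# The original line, per the traceback above, is:
#   return self.tokenizer.encode(string, add_special_tokens=self.add_special_tokens)
def tok_encode(self, string):
    max_len = getattr(self, "max_length", 4096)  # fall back to the value from the warning
    return self.tokenizer.encode(
        string,
        add_special_tokens=self.add_special_tokens,
        truncation=True,      # drop tokens beyond max_len instead of warning
        max_length=max_len,
    )
```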
Did you fix this ??