Holistic Evaluation of Language Models (HELM) is a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). The framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
Hi @NoushNabi,

Recently, the Optimum Intel OpenVINO tests have been failing intermittently because of what appears to be a race condition caused by multiple concurrent inference calls; the run then exits with a segmentation fault. Could you take a look?

Example logs from this run: https://github.com/stanford-crfm/helm/actions/runs/11369018353/job/31625461750
Executor.execute {
Parallelizing computation on 10 items over 4 threads {
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Loading hf-internal-testing/tiny-random-MistralForCausalLM (kwargs={'openvino': True}) for HELM model hf-internal-testing/tiny-random-MistralForCausalLM with Hugging Face Transformers {
Hugging Face device set to "cpu" because CUDA is unavailable.
Loading Hugging Face model hf-internal-testing/tiny-random-MistralForCausalLM {
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/cache_utils.py:447: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
or len(self.key_cache[layer_idx]) == 0 # the layer has no cache
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:281: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
elif sliding_window is None or key_value_length < sliding_window:
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/cache_utils.py:432: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
elif len(self.key_cache[layer_idx]) == 0: # fills previously skipped layers; checking for tensor causes errors
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
} [7.965s]
} [7.966s]
HuggingFace error: Infer Request is busy
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
HuggingFace error: Infer Request is busy
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
HuggingFace error: Infer Request is busy
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
/home/runner/work/_temp/3b3f1c68-38a5-4e0d-ba66-80ecc08f0297.sh: line 1: 2069 Segmentation fault (core dumped) helm-run --run-entries boolq:model=hf-internal-testing/tiny-random-MistralForCausalLM --enable-huggingface-models hf-internal-testing/tiny-random-MistralForCausalLM --suite v1 --max-eval-instances 10 --openvino
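For what it's worth, the failure pattern (several worker threads sharing one model, repeated "Infer Request is busy" errors, then a segfault) looks like the same OpenVINO infer request being driven from multiple threads at once. Below is a minimal standalone sketch of one possible mitigation, serializing inference behind a lock. This is not the HELM client code, just an illustration assuming the optimum-intel `OVModelForCausalLM` and `transformers` APIs and the tiny test model from the log:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

MODEL = "hf-internal-testing/tiny-random-MistralForCausalLM"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# export=True converts the Transformers checkpoint to OpenVINO IR on load.
model = OVModelForCausalLM.from_pretrained(MODEL, export=True)

# A single lock guarding the shared model, so only one thread drives the
# underlying OpenVINO infer request at a time.
inference_lock = threading.Lock()

def run_inference(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    # Without this lock, concurrent generate() calls on the shared model are
    # what appear to trigger "Infer Request is busy" and the crash.
    with inference_lock:
        output_ids = model.generate(**inputs, max_new_tokens=8)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Mirror the log above: 10 items over 4 threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_inference, [f"prompt {i}" for i in range(10)]))
```

An alternative to a global lock would be giving each worker thread its own model instance (or at least its own infer request); I have not checked which of the two fits the existing OpenVINO code path in HELM better.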