Holistic Evaluation of Language Models (HELM) is a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). The framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
Hi @NoushNabi,

Recently, the Optimum Intel OpenVINO tests have been failing intermittently because of what appears to be a race condition caused by multiple concurrent inference calls; the run then exits with a segmentation fault. Could you take a look?

Example logs from this run: https://github.com/stanford-crfm/helm/actions/runs/11369018353/job/31625461750
Executor.execute {
Parallelizing computation on 10 items over 4 threads {
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Loading hf-internal-testing/tiny-random-MistralForCausalLM (kwargs={'openvino': True}) for HELM model hf-internal-testing/tiny-random-MistralForCausalLM with Hugging Face Transformers {
Hugging Face device set to "cpu" because CUDA is unavailable.
Loading Hugging Face model hf-internal-testing/tiny-random-MistralForCausalLM {
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
Created cache with config: SqliteCacheConfig(path='prod_env/cache/hf-internal-testing.sqlite')
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/cache_utils.py:447: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
or len(self.key_cache[layer_idx]) == 0 # the layer has no cache
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:281: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
elif sliding_window is None or key_value_length < sliding_window:
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/transformers/cache_utils.py:432: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
elif len(self.key_cache[layer_idx]) == 0: # fills previously skipped layers; checking for tensor causes errors
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
} [7.965s]
} [7.966s]
HuggingFace error: Infer Request is busy
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
HuggingFace error: Infer Request is busy
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
HuggingFace error: Infer Request is busy
Request failed. Retrying (attempt #2) in 10 seconds... (See above for error details)
/home/runner/work/_temp/3b3f1c68-38a5-4e0d-ba66-80ecc08f0297.sh: line 1: 2069 Segmentation fault (core dumped) helm-run --run-entries boolq:model=hf-internal-testing/tiny-random-MistralForCausalLM --enable-huggingface-models hf-internal-testing/tiny-random-MistralForCausalLM --suite v1 --max-eval-instances 10 --openvino
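For what it's worth, the failure pattern (several worker threads sharing one model, repeated "Infer Request is busy" errors, then a segfault) looks like the same OpenVINO infer request being driven from multiple threads at once. Below is a minimal standalone sketch of one possible mitigation, serializing inference behind a lock. This is not the HELM client code, just an illustration assuming the optimum-intel `OVModelForCausalLM` and `transformers` APIs and the tiny test model from the log:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

MODEL = "hf-internal-testing/tiny-random-MistralForCausalLM"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# export=True converts the Transformers checkpoint to OpenVINO IR on load.
model = OVModelForCausalLM.from_pretrained(MODEL, export=True)

# A single lock guarding the shared model, so only one thread drives the
# underlying OpenVINO infer request at a time.
inference_lock = threading.Lock()

def run_inference(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    # Without this lock, concurrent generate() calls on the shared model are
    # what appear to trigger "Infer Request is busy" and the crash.
    with inference_lock:
        output_ids = model.generate(**inputs, max_new_tokens=8)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Mirror the log above: 10 items over 4 threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_inference, [f"prompt {i}" for i in range(10)]))
```

An alternative to a global lock would be giving each worker thread its own model instance (or at least its own infer request); I have not checked which of the two fits the existing OpenVINO code path in HELM better.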