neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Fix] Raise ValueError when llm-eval gets an unknown dataset to evaluate on (instead of failing silently) #1625

Closed by dbogunowicz 4 months ago

dbogunowicz commented 4 months ago

Feature Description

Fail more verbosely when passing dataset names that are not recognized by lm-evaluation-harness. For example:

from deepsparse import evaluate
stub = "zoo:opt-1.3b-opt_pretrain-quantW8A8"
datasets = ["openai_humaneval"]
res = evaluate(stub, datasets=datasets, integration="lm-eval-harness")
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240220 COMMUNITY | (8a843a2e) (release) (optimized) (system=avx2, binary=avx2)
[7f4e5f5ed640 >WARN<  operator() ./src/include/wand/utility/warnings.hpp:14] Generating emulated code for quantized (INT8) operations since no VNNI instructions were detected. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy.
Traceback (most recent call last):
  File "/nm/drive0/damian/deepsparse/hehe.py", line 4, in <module>
    res = evaluate(stub, datasets=datasets, integration="lm-eval-harness")
  File "/nm/drive0/damian/deepsparse/src/deepsparse/evaluation/evaluator.py", line 65, in evaluate
    return eval_integration(
  File "/nm/drive0/damian/deepsparse/src/deepsparse/evaluation/integrations/lm_evaluation_harness.py", line 75, in integration_eval
    raise ValueError(
ValueError: could not recognize the dataset: openai_humaneval. Make sure that the requested dataset is compatible with lm-evaluation-harness
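
For reference, a minimal sketch of the kind of up-front validation this change introduces. The names validate_datasets and all_known_tasks are hypothetical, standing in for whatever task registry lm-evaluation-harness exposes; this is not the actual deepsparse implementation:

def validate_datasets(datasets, all_known_tasks):
    # Collect every requested dataset the harness does not know about.
    # `all_known_tasks` is assumed to be an iterable of valid task names.
    unknown = [name for name in datasets if name not in all_known_tasks]
    if unknown:
        # Fail loudly up front instead of letting the evaluation fail silently.
        raise ValueError(
            f"Could not recognize the dataset(s): {unknown}. "
            "Make sure that the requested datasets are compatible "
            "with lm-evaluation-harness"
        )

Called before dispatching to the harness, e.g. validate_datasets(["openai_humaneval"], known_tasks), this turns a silent failure into the explicit ValueError shown in the traceback above.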