Originally this PR was meant to fix evaluation results not being displayed/saved anywhere, but that turned out to be working as intended. I took the opportunity to update the misleading command in the CLI docstring, and added a MODEL_PATH description for the user (now displayed when `--help` is invoked).
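For reference, a minimal sketch of how an argument description like this can be surfaced in `--help`, assuming the CLI is built on `click` (click arguments take no `help` string, so the description lives in the command docstring; names below are illustrative, not the exact diff):

```python
# Illustrative sketch, not the exact diff: click shows the command
# docstring in --help, so the MODEL_PATH description is added there.
import click


@click.command()
@click.argument("model_path", type=str)
def main(model_path: str):
    """
    deepsparse.evaluate MODEL_PATH

    MODEL_PATH: path to a local model directory or a model stub
    to create the evaluation pipeline from.
    """
    click.echo(f"Evaluating model at: {model_path}")


if __name__ == "__main__":
    main()
```

Example run: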
```
deepsparse.evaluate /home/rahul/TinyStories-1M-ds -i lm-eval-harness --dataset hellaswag --limit 10
2024-02-20 16:49:55 __main__ INFO Creating deepsparse pipeline to evaluate from model path: /home/rahul/TinyStories-1M-ds
2024-02-20 16:49:55 __main__ INFO Datasets to evaluate on: ['hellaswag']
Batch size: 1
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 10}
/home/rahul/projects/deepsparse/.venv/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx512, binary=avx512)
/home/rahul/projects/deepsparse/.venv/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
2024-02-20 16:50:35 deepsparse.evaluation.integrations.lm_evaluation_harness INFO Selected Tasks: ['hellaswag']
2024-02-20:16:50:35,047 INFO [lm_evaluation_harness.py:73] Selected Tasks: ['hellaswag']
2024-02-20:16:50:40,698 INFO [task.py:355] Building contexts for task on rank 0...
2024-02-20:16:50:40,710 INFO [evaluator.py:319] Running loglikelihood requests
100%|█████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:01<00:00, 23.83it/s]
2024-02-20 16:50:42 __main__ INFO Evaluation done. Results:
[Evaluation(task='lm-evaluation-harness', dataset=Dataset(type=None, name='hellaswag', config={'model': 'roneneldan/TinyStories-1M', 'model_args': None, 'batch_size': 1, 'batch_sizes': [], 'device': None, 'use_cache': None, 'limit': 10, 'bootstrap_iters': 100000, 'gen_kwargs': None}, split=None), metrics=[Metric(name='acc,none', value=0.3), Metric(name='acc_stderr,none', value=0.15275252316519466), Metric(name='acc_norm,none', value=0.2), Metric(name='acc_norm_stderr,none', value=0.13333333333333333)], samples=None)]
2024-02-20:16:50:42,445 INFO [cli.py:212] Evaluation done. Results:
[Evaluation(task='lm-evaluation-harness', dataset=Dataset(type=None, name='hellaswag', config={'model': 'roneneldan/TinyStories-1M', 'model_args': None, 'batch_size': 1, 'batch_sizes': [], 'device': None, 'use_cache': None, 'limit': 10, 'bootstrap_iters': 100000, 'gen_kwargs': None}, split=None), metrics=[Metric(name='acc,none', value=0.3), Metric(name='acc_stderr,none', value=0.15275252316519466), Metric(name='acc_norm,none', value=0.2), Metric(name='acc_norm_stderr,none', value=0.13333333333333333)], samples=None)]
2024-02-20 16:50:42 __main__ INFO Saving the evaluation results to /home/rahul/projects/deepsparse/result.json
2024-02-20:16:50:42,445 INFO [cli.py:220] Saving the evaluation results to /home/rahul/projects/deepsparse/result.json
```
The results are displayed as an info log and correctly saved to `result.json`.
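A hypothetical quick check of the saved file; the key names below mirror the `Evaluation`/`Metric` structure printed in the log above and may differ from the actual serialized format:

```python
# Hypothetical sanity check of result.json; field names are assumed to
# follow the Evaluation/Metric structure shown in the log output.
import json

with open("result.json") as f:
    results = json.load(f)

for evaluation in results:
    print(evaluation["task"], evaluation["dataset"]["name"])
    for metric in evaluation["metrics"]:
        print(f"  {metric['name']}: {metric['value']}")
```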
Note that this PR also needed #1606.