neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Update] `deepsparse.evaluate` docstring #1615

Closed rahul-tuli closed 4 months ago

rahul-tuli commented 4 months ago

Originally this PR was meant to fix evaluation results not being displayed or saved anywhere, but that turned out to be working as intended. I took this opportunity to update a misleading command in the CLI docstring and to add a MODEL_PATH description for the user (this is now displayed when `--help` is invoked).

```
deepsparse.evaluate /home/rahul/TinyStories-1M-ds -i lm-eval-harness --dataset hellaswag --limit 10
2024-02-20 16:49:55 __main__     INFO     Creating deepsparse pipeline to evaluate from model path: /home/rahul/TinyStories-1M-ds
2024-02-20 16:49:55 __main__     INFO     Datasets to evaluate on: ['hellaswag']
Batch size: 1
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 10}
/home/rahul/projects/deepsparse/.venv/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx512, binary=avx512)
/home/rahul/projects/deepsparse/.venv/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
2024-02-20 16:50:35 deepsparse.evaluation.integrations.lm_evaluation_harness INFO     Selected Tasks: ['hellaswag']
2024-02-20:16:50:35,047 INFO     [lm_evaluation_harness.py:73] Selected Tasks: ['hellaswag']
2024-02-20:16:50:40,698 INFO     [task.py:355] Building contexts for task on rank 0...
2024-02-20:16:50:40,710 INFO     [evaluator.py:319] Running loglikelihood requests
100%|█████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:01<00:00, 23.83it/s]
2024-02-20 16:50:42 __main__     INFO     Evaluation done. Results:
[Evaluation(task='lm-evaluation-harness', dataset=Dataset(type=None, name='hellaswag', config={'model': 'roneneldan/TinyStories-1M', 'model_args': None, 'batch_size': 1, 'batch_sizes': [], 'device': None, 'use_cache': None, 'limit': 10, 'bootstrap_iters': 100000, 'gen_kwargs': None}, split=None), metrics=[Metric(name='acc,none', value=0.3), Metric(name='acc_stderr,none', value=0.15275252316519466), Metric(name='acc_norm,none', value=0.2), Metric(name='acc_norm_stderr,none', value=0.13333333333333333)], samples=None)]
2024-02-20:16:50:42,445 INFO     [cli.py:212] Evaluation done. Results:
[Evaluation(task='lm-evaluation-harness', dataset=Dataset(type=None, name='hellaswag', config={'model': 'roneneldan/TinyStories-1M', 'model_args': None, 'batch_size': 1, 'batch_sizes': [], 'device': None, 'use_cache': None, 'limit': 10, 'bootstrap_iters': 100000, 'gen_kwargs': None}, split=None), metrics=[Metric(name='acc,none', value=0.3), Metric(name='acc_stderr,none', value=0.15275252316519466), Metric(name='acc_norm,none', value=0.2), Metric(name='acc_norm_stderr,none', value=0.13333333333333333)], samples=None)]
2024-02-20 16:50:42 __main__     INFO     Saving the evaluation results to /home/rahul/projects/deepsparse/result.json
2024-02-20:16:50:42,445 INFO     [cli.py:220] Saving the evaluation results to /home/rahul/projects/deepsparse/result.json
```
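As a sanity check on the numbers in the log above, the reported `acc_stderr` values can be recomputed from the accuracies and `--limit 10`. This is a minimal sketch that assumes the harness reports the sample standard error of the mean (with Bessel's correction) over per-document 0/1 scores. That assumption is inferred from the values matching, not stated anywhere in this PR, and `binary_stderr` is a hypothetical helper name.

```python
import math


def binary_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean for n 0/1 outcomes whose mean is `accuracy`.

    Assumption: sample variance with Bessel's correction, i.e.
    var = n / (n - 1) * p * (1 - p), so stderr = sqrt(var / n).
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Values copied from the log output above; `limit` is the --limit argument.
limit = 10
reported = {
    "acc": (0.3, 0.15275252316519466),
    "acc_norm": (0.2, 0.13333333333333333),
}

for name, (value, stderr) in reported.items():
    recomputed = binary_stderr(value, limit)
    print(f"{name}: reported stderr={stderr:.6f}, recomputed={recomputed:.6f}")
```

With only 10 documents the standard error is roughly half the accuracy itself, so a run like this is best read as a smoke test of the evaluation path rather than a meaningful benchmark.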

The results are displayed as an INFO log and correctly saved to result.json.
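To look at the saved file after a run, something like the following could be used. It is only a sketch: the schema of result.json is not shown in this PR, so the code assumes nothing beyond the file being valid JSON and simply pretty-prints it. `load_results` is a hypothetical helper, and the default path assumes the command was run from the directory named in the log.

```python
import json
from pathlib import Path


def load_results(path: str = "result.json"):
    """Load the saved evaluation results (assumed to be a JSON document)."""
    with Path(path).open("r", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    results_path = Path("result.json")
    if results_path.exists():
        # Pretty-print without assuming any particular keys.
        print(json.dumps(load_results(str(results_path)), indent=2))
    else:
        print(f"{results_path} not found; run deepsparse.evaluate first")
```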

Note that this PR also needed #1606.