deepsparse.evaluate ~/TinyStories-1M-ds -i lm-eval-harness -d hellaswag --limit 10
2024-02-23 09:38:21 deepsparse.evaluation.cli INFO Creating deepsparse pipeline to evaluate from model path: /home/rahul/TinyStories-1M-ds
2024-02-23 09:38:21 deepsparse.evaluation.cli INFO Datasets to evaluate on: ['hellaswag']
Batch size: 1
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 10}
/home/rahul/projects/deepsparse/.venv/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx512, binary=avx512)
/home/rahul/projects/deepsparse/.venv/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
2024-02-23:09:38:27,356 WARNING [__init__.py:194] Some tasks could not be loaded due to missing dependencies. Run with `--verbosity DEBUG` for full details.
2024-02-23 09:38:29 deepsparse.evaluation.integrations.lm_evaluation_harness INFO Selected Tasks: ['hellaswag']
2024-02-23:09:38:33,291 INFO [task.py:363] Building contexts for task on rank 0...
2024-02-23:09:38:33,295 INFO [evaluator.py:324] Running loglikelihood requests
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:01<00:00, 26.56it/s]
2024-02-23 09:38:34 deepsparse.evaluation.cli INFO Evaluation done. Results:
[Evaluation(task='lm-evaluation-harness', dataset=Dataset(type=None, name='hellaswag', config={'model': 'DeepSparseLM', 'model_args': None, 'batch_size': 1, 'batch_sizes': [], 'device': None, 'use_cache': None, 'limit': 10, 'bootstrap_iters': 100000, 'gen_kwargs': None}, split=None), metrics=[Metric(name='acc,none', value=0.3), Metric(name='acc_stderr,none', value=0.15275252316519466), Metric(name='acc_norm,none', value=0.2), Metric(name='acc_norm_stderr,none', value=0.13333333333333333)], samples=None)]
2024-02-23 09:38:34 deepsparse.evaluation.cli INFO Saving the evaluation results to /home/rahul/projects/deepsparse/result.json
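As a sanity check on the reported metrics (this assumes the stderr is the plain sample standard error of a mean over 0/1 scores, which is not confirmed from the lm-eval source), the `acc_stderr` and `acc_norm_stderr` values above are exactly what that formula gives for 10 samples:

```python
import math

def mean_stderr(p: float, n: int) -> float:
    # Standard error of the mean of n Bernoulli (0/1) samples with observed
    # mean p: sample stddev / sqrt(n), which simplifies to sqrt(p(1-p)/(n-1)).
    return math.sqrt(p * (1 - p) / (n - 1))

# Reported above: acc = 0.3 and acc_norm = 0.2 over --limit 10 samples.
print(mean_stderr(0.3, 10))  # ~0.15275..., matching acc_stderr
print(mean_stderr(0.2, 10))  # ~0.13333..., matching acc_norm_stderr
```

The match suggests the tiny `--limit 10` run is being scored as expected; the wide error bars are just a consequence of the sample size.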
Upgrading lm-eval to 0.4.1 works as expected.
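For reference, the upgrade is a one-liner (assuming the PyPI package name `lm-eval`, installed into the same virtualenv that DeepSparse runs in):

```shell
# Pin lm-eval-harness to the version that works with deepsparse.evaluate
pip install --upgrade "lm-eval==0.4.1"
```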