[DeepSparse Evaluation API] Perplexity eval support for `openai_humaneval`, `c4`, `wikitext2`

Example use

deepsparse.evaluate hf:mgoin/TinyStories-1M-deepsparse --integration perplexity --dataset wikitext2 --limit 2 --batch_size 2 --max_sequence_length 128

2024-02-06 18:14:27 deepsparse.evaluation.cli INFO     Creating deepsparse pipeline to evaluate from model path: hf:mgoin/TinyStories-1M-deepsparse
2024-02-06 18:14:27 deepsparse.evaluation.cli INFO     Datasets to evaluate on: ['wikitext2']
Batch size: 2
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 2, 'max_sequence_length': 128}
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 149796.57it/s]
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx2, binary=avx2)
2024-02-06 18:14:30 deepsparse.evaluation.integrations.perplexity INFO     Argument `splits` is None. Defaulting to `test` split.
Token indices sequence length is longer than the specified maximum sequence length for this model (287645 > 2048). Running this sequence through the model will result in indexing errors
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.51it/s]
2024-02-06 18:14:38 deepsparse.evaluation.cli INFO     Evaluation done. Results:
[Evaluation(task='perplexity', dataset=Dataset(type=None, name='wikitext2', config=None, split='test'), metrics=[Metric(name='perplexity', value=24642.261152241255)], samples=None)]
2024-02-06 18:14:38 deepsparse.evaluation.cli INFO     Saving the evaluation results to /nm/drive0/damian/deepsparse/result.json
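
For readers unfamiliar with the metric, here is a minimal sketch of how a perplexity value like the one above is conventionally defined: the exponential of the mean negative log-likelihood per token. This is not the code from `deepsparse.evaluation.integrations.perplexity`. The helper names `chunk_tokens` and `perplexity` are hypothetical, and the fixed-window chunking is only an assumption about how an over-long tokenized text (such as the 287645-token sequence in the warning) could be cut down to `--max_sequence_length`.

```python
import math
from typing import List, Sequence


def chunk_tokens(token_ids: Sequence[int], max_sequence_length: int) -> List[List[int]]:
    """Split one long token sequence into consecutive, non-overlapping windows.

    Hypothetical helper: the real integration may chunk differently.
    """
    if max_sequence_length <= 0:
        raise ValueError("max_sequence_length must be positive")
    return [
        list(token_ids[start:start + max_sequence_length])
        for start in range(0, len(token_ids), max_sequence_length)
    ]


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over all scored tokens.

    `token_log_probs` holds the natural-log probability the model assigned
    to each ground-truth token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Ten token ids split into windows of at most four tokens.
    print(chunk_tokens(list(range(10)), max_sequence_length=4))

    # A model that assigns probability 0.25 to every correct token
    # is as uncertain as a uniform choice among four options.
    print(perplexity([math.log(0.25)] * 8))
```

Under this definition, the reported value of roughly 24642 means the model was, on average, about as uncertain as a uniform guess among that many tokens on the evaluated samples.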
neuralmagic/deepsparse #1586