huggingface/lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License
461 stars
53 forks
issues
Newest
Updated tgi_model and added parameters for endpoint_model
#208
shaltielshmid
opened
1 day ago
0
Adds ability to use functions for prompt definition
#207
hynky1999
opened
1 week ago
0
ADD GPT-4 as Judge
#206
philschmid
opened
1 week ago
0
Is there any available example based on lighteval/run_evals_nanotron.py ?
#205
JefferyChen453
closed
6 days ago
1
Shape mismatches in MMLU
#204
marcobellagente93
opened
1 week ago
1
Handling last token in tokenized_continuation
#203
simon900314
opened
1 week ago
0
Different results for gsm8k via lighteval compared to internal pipeline
#202
Mugariya
opened
2 weeks ago
0
[Bugfix] Avoid truncating the outputs based on string lengths
#201
anton-l
opened
2 weeks ago
0
quantized model not loading
#200
rankofootball
opened
3 weeks ago
6
ModuleNotFoundError: No module named 'lighteval'
#199
xinghuang2050
opened
3 weeks ago
2
The helm|piqa task is generative but has generation_size=-1.
#198
yonatano
opened
4 weeks ago
2
add openai dependency
#197
NouamaneTazi
closed
1 month ago
3
Expose `stop_sequence` at command line
#196
lewtun
opened
1 month ago
0
An apparent bug in drop's dealing with multi-span answer
#195
sadra-barikbin
opened
1 month ago
3
Mention HF_TOKEN in readme
#194
Wauplin
closed
1 month ago
0
`LightevalTask.process_results()` is not aligned with `LightevalTask.get_request_type()`
#193
sadra-barikbin
opened
1 month ago
3
`apply_target_perplexity_metric` pops only the first response
#192
sadra-barikbin
opened
1 month ago
0
Issue when saving results with fsspec==2023.12.1
#191
Hamza-Alobeidli
opened
1 month ago
0
Download BERT scorer lazily
#190
sadra-barikbin
opened
1 month ago
1
fix typos in readme
#189
kashif
closed
1 month ago
0
Zero scores on cnn-dm benchmark from HELM
#188
hicleo
closed
1 month ago
3
Fix a few typos and do a tiny refactor
#187
sadra-barikbin
opened
1 month ago
4
Fix AIMO
#186
lewtun
closed
1 month ago
0
Fix _init_max_length in base_model.py
#185
gucci-j
opened
1 month ago
1
Allow to launch TGI models with the Inference Endpoints code
#184
clefourrier
opened
1 month ago
1
Evaluate EncoderDecoderModels
#183
Bachstelze
opened
1 month ago
2
Fix broken link to extended tasks in README
#182
alexrs
closed
1 month ago
0
Add version config option.
#181
PhilipMay
closed
1 month ago
8
Fix citation section in readme
#180
NathanHB
closed
1 month ago
0
Performance compared to lm-evaluation-harness
#179
geoalgo
opened
2 months ago
6
add 'cite as' section in readme
#178
NathanHB
closed
2 months ago
2
Fix a few comment and docstring typos and a typehint
#177
sadra-barikbin
closed
1 month ago
6
Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
#176
gody7334
opened
2 months ago
3
Error: `ModuleNotFoundError: No module named 'openai'`.
#175
PhilipMay
opened
2 months ago
5
Transformers model as Judge
#174
anilaltuner
opened
2 months ago
6
fix llm as judge warnings
#173
NathanHB
opened
2 months ago
4
Version of a task should be configurable.
#172
PhilipMay
closed
1 month ago
6
Fix prompt format german rag community task
#171
jphme
closed
2 months ago
4
Add Sympy equivalence for MATH / GSM8K?
#170
lewtun
opened
2 months ago
1
Data split depending on eval params
#169
clefourrier
opened
2 months ago
0
Fix: MTBench prompt function
#168
clefourrier
closed
2 months ago
0
Remove TGI models
#167
clefourrier
closed
1 month ago
3
`Could not initialize the JudgeOpenAI model` and `openi` import error
#166
lewtun
opened
2 months ago
1
DROP Evaluation with Llama3 (vs. lm-evaluation-harness)
#165
vipulraheja
opened
2 months ago
1
Expose a few model predictions / gold answers in the logs
#164
lewtun
opened
2 months ago
1
[MATH] Fix generation for chat models & fix normalization for predictions
#163
lewtun
closed
1 month ago
7
[MATH] Fix MATH normalization
#162
lewtun
closed
2 months ago
0
Feature: Checkpointing on task level.
#161
PhilipMay
closed
2 months ago
2
Nanotron tests
#160
zzhhjjj
opened
2 months ago
5
MMLU evaluation fails with Mistral
#159
sanchit-gandhi
opened
2 months ago
6