issues
huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License
831 stars
99 forks
IrokoBench (Afric tasks)
#357
hynky1999
closed
1 month ago
0
Translation literals
#356
hynky1999
closed
1 month ago
1
[FT] Single token completion loglikelihood auto-detection
#355
hynky1999
opened
1 month ago
0
Fix Tokenization + misc fixes
#354
hynky1999
closed
1 month ago
0
[BUG] batch_size = auto:1 issue
#353
alozowski
opened
1 month ago
0
Custom task execution
#352
TomasJavurek
closed
1 month ago
1
[BUG] Incorrect link in the docu
#351
TomasJavurek
closed
1 month ago
1
Nathan llm judge quickfix
#350
NathanHB
closed
1 month ago
0
change when the tests are triggered
#349
NathanHB
closed
1 month ago
5
Nathan llm judge quickfix
#348
NathanHB
closed
1 month ago
0
Fix llm as judge dtype
#347
NathanHB
closed
1 month ago
0
Fix 341
#346
clefourrier
closed
1 month ago
3
[BUG] assertion error `assert text[: len(left)] == left` on MATH with Qwen-Math-2.5
#345
d1shs0ap
opened
1 month ago
0
Nanotron greedy_until() fix
#344
vsabolcec
closed
2 weeks ago
0
[BUGFIX] Fixes sampling when num_samples==1
#343
edbeeching
closed
1 month ago
0
[BUG] VLLM backend uses sampling with t=1.0 even when num_samples==1
#342
edbeeching
closed
1 month ago
0
[BUG] MATH eval is calculated with non-greedy sampling
#341
edbeeching
closed
1 month ago
5
Serbian LLM Benchmark Task
#340
DeanChugall
closed
1 month ago
5
Misc-multilingual tasks
#339
hynky1999
closed
1 month ago
0
Multilingual General Knowledge tasks
#338
hynky1999
closed
1 month ago
1
refacto judge and add mixeval
#337
NathanHB
closed
1 month ago
0
Instructions for precommit hook for contributors.
#336
chuandudx
closed
1 month ago
2
Adds config templates
#335
hynky1999
closed
1 month ago
0
Skipping push to hub test
#334
clefourrier
closed
1 month ago
0
Multilingual Reading Comprehension tasks
#333
hynky1999
closed
1 month ago
0
Multilingual Hellaswag tasks
#332
hynky1999
closed
1 month ago
0
Math normalization: do not crash on invalid format
#331
guipenedo
closed
2 months ago
0
Multilingual COPA tasks
#330
hynky1999
closed
1 month ago
0
Multilingual NLI Tasks
#329
hynky1999
closed
1 month ago
0
bump lighteval version
#328
NathanHB
closed
1 month ago
0
readme rewrite
#327
NathanHB
closed
2 months ago
0
[EVAL] Add MixEval
#326
lewtun
closed
1 month ago
1
[EVAL] Add ArenaHardAuto
#325
lewtun
opened
2 months ago
0
[EVAL] Add RewardBench
#324
lewtun
opened
2 months ago
2
Class implementations of faithfulness and extractiveness metrics
#323
chuandudx
closed
1 month ago
2
[FT] Enable batched dataset_filter
#322
chuandudx
opened
2 months ago
0
Add tools to doc and to prompt
#321
srossi93
closed
1 month ago
1
[BUG] AttributeError: 'str' object has no attribute 'category'
#320
Vanessa-Taing
opened
2 months ago
1
Clarify API required for running llm-as-judge models.
#319
chuandudx
closed
1 month ago
3
[FT] LLM-as-judge example that doesn't require OPENAI_KEY or pro subscription of HF
#318
chuandudx
closed
1 month ago
3
[FIX] Fixes vllm backend
#317
NathanHB
closed
2 months ago
0
Fix BLEURT evaluation errors
#316
chuandudx
closed
1 month ago
3
[BUG] Errors when using BLEURT metric
#315
chuandudx
closed
1 month ago
1
[FT] pass trust_remote_code as flag for loading datasets with custom code
#314
chuandudx
opened
2 months ago
1
OALL v2
#313
alielfilali01
closed
2 months ago
0
[FT] Provide an interface for easier edit of parametrizable metrics
#312
clefourrier
opened
2 months ago
1
Allow kwargs for BERTScore compute function and remove unused var
#311
chuandudx
closed
1 month ago
2
[BUG] Errors when using BERTScore for evaluation
#310
chuandudx
closed
1 month ago
4
Fix Metrics import path in community task template file.
#309
chuandudx
closed
1 month ago
7
Selecting tasks using their superset
#308
hynky1999
closed
1 month ago
5