huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

MIT License

831 stars, 99 forks

issues

IrokoBench (African tasks)

#357 hynky1999 closed 1 month ago
0
Translation literals

#356 hynky1999 closed 1 month ago
1
[FT] Single token completion loglikelihood auto-detection

#355 hynky1999 opened 1 month ago
0
Fix Tokenization + misc fixes

#354 hynky1999 closed 1 month ago
0
[BUG] batch_size = auto:1 issue

#353 alozowski opened 1 month ago
0
Custom task execution

#352 TomasJavurek closed 1 month ago
1
[BUG] Incorrect link in the docu

#351 TomasJavurek closed 1 month ago
1
Nathan llm judge quickfix

#350 NathanHB closed 1 month ago
0
change when the tests are triggered

#349 NathanHB closed 1 month ago
5
Nathan llm judge quickfix

#348 NathanHB closed 1 month ago
0
Fix llm as judge dtype

#347 NathanHB closed 1 month ago
0
Fix 341

#346 clefourrier closed 1 month ago
3
[BUG] assertion error `assert text[: len(left)] == left` on MATH with Qwen-Math-2.5

#345 d1shs0ap opened 1 month ago
0
Nanotron greedy_until() fix

#344 vsabolcec closed 2 weeks ago
0
[BUGFIX] Fixes sampling when num_samples==1

#343 edbeeching closed 1 month ago
0
[BUG] VLLM backend uses sampling with t=1.0 even when num_samples==1

#342 edbeeching closed 1 month ago
0
[BUG] MATH eval is calculated with non-greedy sampling

#341 edbeeching closed 1 month ago
5
Serbian LLM Benchmark Task

#340 DeanChugall closed 1 month ago
5
Misc-multilingual tasks

#339 hynky1999 closed 1 month ago
0
Multilingual General Knowledge tasks

#338 hynky1999 closed 1 month ago
1
Refactor judge and add MixEval

#337 NathanHB closed 1 month ago
0
Instructions for precommit hook for contributors.

#336 chuandudx closed 1 month ago
2
Adds config templates

#335 hynky1999 closed 1 month ago
0
Skipping push to hub test

#334 clefourrier closed 1 month ago
0
Multilingual Reading Comprehension tasks

#333 hynky1999 closed 1 month ago
0
Multilingual Hellaswag tasks

#332 hynky1999 closed 1 month ago
0
Math normalization: do not crash on invalid format

#331 guipenedo closed 2 months ago
0
Multilingual COPA tasks

#330 hynky1999 closed 1 month ago
0
Multilingual NLI Tasks

#329 hynky1999 closed 1 month ago
0
bump lighteval version

#328 NathanHB closed 1 month ago
0
readme rewrite

#327 NathanHB closed 2 months ago
0
[EVAL] Add MixEval

#326 lewtun closed 1 month ago
1
[EVAL] Add ArenaHardAuto

#325 lewtun opened 2 months ago
0
[EVAL] Add RewardBench

#324 lewtun opened 2 months ago
2
Class implementations of faithfulness and extractiveness metrics

#323 chuandudx closed 1 month ago
2
[FT] Enable batched dataset_filter

#322 chuandudx opened 2 months ago
0
Add tools to doc and to prompt

#321 srossi93 closed 1 month ago
1
[BUG] AttributeError: 'str' object has no attribute 'category'

#320 Vanessa-Taing opened 2 months ago
1
Clarify API required for running llm-as-judge models.

#319 chuandudx closed 1 month ago
3
[FT] LLM-as-judge example that doesn't require OPENAI_KEY or pro subscription of HF

#318 chuandudx closed 1 month ago
3
[FIX] Fixes vllm backend

#317 NathanHB closed 2 months ago
0
Fix BLEURT evaluation errors

#316 chuandudx closed 1 month ago
3
[BUG] Errors when using BLEURT metric

#315 chuandudx closed 1 month ago
1
[FT] pass trust_remote_code as flag for loading datasets with custom code

#314 chuandudx opened 2 months ago
1
OALL v2

#313 alielfilali01 closed 2 months ago
0
[FT] Provide an interface for easier edit of parametrizable metrics

#312 clefourrier opened 2 months ago
1
Allow kwargs for BERTScore compute function and remove unused var

#311 chuandudx closed 1 month ago
2
[BUG] Errors when using BERTScore for evaluation

#310 chuandudx closed 1 month ago
4
Fix Metrics import path in community task template file.

#309 chuandudx closed 1 month ago
7
Selecting tasks using their superset

#308 hynky1999 closed 1 month ago
5
