huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License
646 stars, 72 forks

`--no_multichoice_continuations_start_space` should also cover start-of-word token #45

Open clefourrier opened 6 months ago

clefourrier commented 6 months ago

If the tokenizer prepends `_` as a start-of-word (sow) token, single-token evals will fail, because the continuation no longer encodes to exactly one token. Reported by @anton-l
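A minimal sketch of the failure mode described above, assuming a toy greedy tokenizer. The vocabulary and the helpers `toy_encode` and `is_single_token` are hypothetical and are not lighteval code; they only illustrate how a prepended `_` can turn a one-token continuation into two tokens.

```python
# Hypothetical toy vocabulary, not taken from any real tokenizer.
# Note that "_A" is absent, so "_" and "A" end up as separate pieces.
VOCAB = {"_", "A", "B", "_the", "_answer", "_is"}


def toy_encode(text, add_sow=True):
    """Greedy longest-match encoding of whitespace-separated words.

    If add_sow is True, "_" is prepended to every word as a
    start-of-word marker before matching against VOCAB.
    """
    pieces = []
    for word in text.split():
        s = "_" + word if add_sow else word
        i = 0
        while i < len(s):
            # Try the longest candidate substring first.
            for j in range(len(s), i, -1):
                if s[i:j] in VOCAB:
                    pieces.append(s[i:j])
                    i = j
                    break
            else:
                raise ValueError(f"cannot encode {s[i:]!r}")
    return pieces


def is_single_token(continuation, add_sow=True):
    """Check the assumption a single-token eval relies on."""
    return len(toy_encode(continuation, add_sow)) == 1


print(toy_encode("A"))                          # with the marker
print(toy_encode("A", add_sow=False))           # without the marker
print(is_single_token("A"))                     # single-token check, marker on
print(is_single_token("A", add_sow=False))      # single-token check, marker off
```

In this toy setup the continuation `A` only passes the single-token check when the start-of-word marker is suppressed, which is the behavior the flag would need to cover.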

clefourrier commented 1 month ago

@anton-l do you remember in which case (model or tokenizer) you encountered this?