huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License
471 stars 55 forks source link

Relax sentencepiece version #74

Closed lewtun closed 4 months ago

lewtun commented 4 months ago

This PR puts a lower bound on the sentencepiece version to be minimally 0.1.99 so lighteval can be more easily pip installed in other environments that also have transformers installed. If this is OK, I will do something similar with other deps like pytest which ideally shouldn't be pinned (at most upper bounded)

clefourrier commented 4 months ago

Don't you mean at most lower bounded?

lewtun commented 4 months ago

Don't you mean at most lower bounded?

Ah for things like pytest where one doesn't expect breaking changes across minor versions, it can be helpful to put an upper bound on the next major version. But for sentencepiece I used just a lower bound since I'm not sure if they use proper semantic versioning.

Btw if we don't use sentencepiece anywhere outside of transformers, a better approach would be to link it to the transformers dependency via transformers[sentencepiece]

Happy to amend this PR like that if you agree!

clefourrier commented 4 months ago

Thanks for the explanation, makes sense! I think @NathanHB should take the decision on this, as it's likely he'll become the lead maintainer of lighteval in the next few months. And I agree on making sentencepiece a dep of transformers, cool idea!

NathanHB commented 4 months ago

Btw if we don't use sentencepiece anywhere outside of transformers, a better approach would be to link it to the transformers dependency via transformers[sentencepiece]

This sounds good ! I see no reason for not doing that :)

clefourrier commented 4 months ago

@NathanHB what about using requirement bounds instead of fixed version? :)

NathanHB commented 4 months ago

I tend to prefer using fixed versions, as I had issues with breaking changes more than once. However, using bounds for minor version seems better for compatibility with other lib. So Lewis proposition is good for me :)

farzadab commented 1 day ago

Maybe a dumb question, but where is sentencepiece even used? I don't see it imported directly anywhere in the code.

clefourrier commented 15 hours ago

If I remember properly, we had to fix the lower bound of the sentencepiece version because it will be needed/used when loading some tokenizers and prior versions had issues with our codebase.