turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Any benefit to choosing something other than wikitext for code models? #323

Closed · irthomasthomas closed this issue 5 months ago

irthomasthomas commented 5 months ago

Sorry to open an issue for this, but it's just a quick question really: when quantizing code models, would it be better to use a code-oriented dataset to test against, assuming it's in the model?

Thanks!

turboderp commented 5 months ago

If you mean for calibration, I'd still use the built-in dataset for that as well. It does contain quite a lot of code, too.
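
For anyone who wants to experiment with a custom calibration set anyway, convert.py accepts a parquet file via its -c option. Below is a minimal sketch of packaging code samples that way; the single "text" column is an assumption worth checking against the repo's docs, and per the advice above the built-in set is the recommended default:

```python
# Sketch: building a custom calibration parquet for convert.py's -c option.
# Assumes a plain "text" column (unverified assumption); the built-in
# calibration set is recommended and already contains plenty of code.
import pandas as pd

code_samples = [
    "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)\n",
    "for (int i = 0; i < n; i++) { total += values[i]; }\n",
]

pd.DataFrame({"text": code_samples}).to_parquet("calibration.parquet")
```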

As for testing, I would test on what's most relevant to how the model is going to be used. Code makes sense for a code model, as long as you're pretty sure it's a representative sample. It's definitely risky to test just on wikitext if you want the model to also handle code well. Etc.
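
As a concrete illustration of testing on representative data, here is a rough perplexity check. It uses a Hugging Face causal LM and a placeholder model name purely as a sketch of the general idea; exllamav2 ships its own evaluation scripts, so don't read this as the library's method:

```python
# Rough perplexity comparison on code vs. prose samples. The model name
# is a placeholder; any causal LM interface with a loss output works.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    # The model's loss is the mean cross-entropy over predicted tokens,
    # so exp(loss) is the perplexity of the sample.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

tokenizer = AutoTokenizer.from_pretrained("my-code-model")  # placeholder
model = AutoModelForCausalLM.from_pretrained("my-code-model")
model.eval()

code_sample = "def add(a, b):\n    return a + b\n"
prose_sample = "The quick brown fox jumps over the lazy dog."

# For a model meant for coding, the code-sample number is the one to
# watch; a wikitext-only test can hide regressions on code.
print("code  ppl:", perplexity(model, tokenizer, code_sample))
print("prose ppl:", perplexity(model, tokenizer, prose_sample))
```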

irthomasthomas commented 5 months ago

> If you mean for calibration, I'd still use the built-in dataset for that as well. It does contain quite a lot of code, too.
>
> As for testing, I would test on what's most relevant to how the model is going to be used. Code makes sense for a code model, as long as you're pretty sure it's a representative sample. It's definitely risky to test just on wikitext if you want the model to also handle code well. Etc.

I see, thank you for the help.