mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0
3.84k stars, 503 forks

Add registry for ICL datasets #1252

Closed (sanjari-orb closed this 3 weeks ago)

sanjari-orb commented 1 month ago

Purpose of PR: Create a registry for ICL eval dataset types. This will allow users to create custom in-context learning datasets and add them to the registry to run custom ICL evaluations during training.
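To illustrate the idea, here is a minimal, self-contained sketch of what a registry for ICL dataset types could look like. It is not the code from this PR and does not use llm-foundry's actual registry API; the names `ICLDataset`, `ICL_DATASET_REGISTRY`, `register_icl_dataset`, `build_icl_dataset`, and `QADataset` are all hypothetical, chosen only to show how a custom dataset class might be registered under a name and then looked up by that name at evaluation time.

```python
from typing import Callable, Dict, List, Type


class ICLDataset:
    """Hypothetical base class: turns raw examples into prompt strings."""

    def __init__(self, examples: List[dict], num_fewshot: int = 0):
        self.examples = examples
        self.num_fewshot = num_fewshot

    def build_prompt(self, index: int) -> str:
        raise NotImplementedError


# Hypothetical registry: maps an ICL task-type name to a dataset class.
ICL_DATASET_REGISTRY: Dict[str, Type[ICLDataset]] = {}


def register_icl_dataset(
    name: str,
) -> Callable[[Type[ICLDataset]], Type[ICLDataset]]:
    """Class decorator that records a dataset class under `name`."""

    def decorator(cls: Type[ICLDataset]) -> Type[ICLDataset]:
        if name in ICL_DATASET_REGISTRY:
            raise ValueError(f"ICL dataset type {name!r} is already registered")
        ICL_DATASET_REGISTRY[name] = cls
        return cls

    return decorator


@register_icl_dataset("question_answering")
class QADataset(ICLDataset):
    """Example of a user-defined dataset type (assumed Q/A record layout)."""

    def build_prompt(self, index: int) -> str:
        # Use the first `num_fewshot` examples other than the target as demonstrations.
        others = [ex for i, ex in enumerate(self.examples) if i != index]
        shots = others[: self.num_fewshot]
        demos = "".join(
            f"Q: {ex['question']}\nA: {ex['answer']}\n\n" for ex in shots
        )
        return demos + f"Q: {self.examples[index]['question']}\nA:"


def build_icl_dataset(name: str, **kwargs) -> ICLDataset:
    """Look up a registered dataset type by name and instantiate it."""
    try:
        cls = ICL_DATASET_REGISTRY[name]
    except KeyError:
        known = sorted(ICL_DATASET_REGISTRY)
        raise KeyError(f"Unknown ICL dataset type {name!r}; known: {known}") from None
    return cls(**kwargs)


# Usage: the evaluation code only needs the type name, e.g. read from a config.
examples = [
    {"question": "2+2?", "answer": "4"},
    {"question": "3+3?", "answer": "6"},
]
ds = build_icl_dataset("question_answering", examples=examples, num_fewshot=1)
prompt = ds.build_prompt(1)
```

The point of the pattern is that the training or evaluation loop never imports `QADataset` directly: a user can define a new class, register it under a new name, and select it by that name without modifying the library.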

dakinggg commented 1 month ago

Hi @sanjari-orb could you please add a PR description describing the change? Thank you!

sanjari-orb commented 1 month ago

Hi @dakinggg, sorry, this was still a draft because I was trying to get it to work. Thanks for the comments; I'll take them into account and update the PR soon.

dakinggg commented 1 month ago

No worries, thanks for the contribution!

sanjari-orb commented 3 weeks ago

Hi @dakinggg could you point me to the steps to run the unit tests locally?

dakinggg commented 3 weeks ago

Please see the Makefile here (https://github.com/mosaicml/llm-foundry/blob/main/Makefile). Sorry there aren't better instructions!

CPU tests

make test

Multi CPU tests

make test-dist

Single GPU tests

make test-gpu

Multi GPU tests

make test-dist-gpu

sanjari-orb commented 3 weeks ago

@dakinggg Is there a linter I can use to fix the code quality checks?

dakinggg commented 3 weeks ago

@sanjari-orb yeah, running pre-commit should do it

sanjari-orb commented 3 weeks ago

@dakinggg I'm not sure why the PR GPU tests failed: https://github.com/mosaicml/llm-foundry/actions/runs/9514022193/job/26225300100?pr=1252. Could you take a look?