instructlab / sdg

Python library for Synthetic Data Generation
https://pypi.org/project/instructlab-sdg/
Apache License 2.0

Download tokenizer artifacts in CI instead of storing them in `tests/testdata/models` #384

Open khaledsulayman opened 1 week ago

khaledsulayman commented 1 week ago

It was raised in discussion on #364 that we shouldn't be storing tokenizer artifacts in our testdata.

Points in favor

Proposed solution

We download the necessary tokenizer artifacts during CI runs, before the tests execute, instead of committing them to `tests/testdata/models`.
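As a rough illustration of the proposal, here is a minimal sketch of a helper that fetches any missing tokenizer files before the tests run. It assumes the artifacts are hosted on the Hugging Face Hub and fetched with `huggingface_hub.hf_hub_download`, which the issue does not state. The names `TOKENIZER_FILES`, `hub_fetch`, and `ensure_tokenizer_artifacts` are hypothetical, and the file list is only a guess at what a given tokenizer needs.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumption: typical tokenizer file names; the real list depends on the model.
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def hub_fetch(repo_id: str, filename: str, dest_dir: Path) -> Path:
    """Download one file from the Hugging Face Hub into dest_dir."""
    # Imported lazily so the helper below can be exercised without the dependency.
    from huggingface_hub import hf_hub_download

    return Path(
        hf_hub_download(repo_id=repo_id, filename=filename, local_dir=dest_dir)
    )


def ensure_tokenizer_artifacts(
    repo_id: str,
    dest_dir: Path,
    filenames: Iterable[str] = TOKENIZER_FILES,
    fetch: Callable[[str, str, Path], Path] = hub_fetch,
) -> List[str]:
    """Fetch any tokenizer files missing from dest_dir.

    Returns the names of the files that were fetched on this call, so a
    warm CI cache results in an empty list and no network traffic.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in filenames:
        if not (dest_dir / name).exists():
            fetch(repo_id, name, dest_dir)
            fetched.append(name)
    return fetched
```

A session-scoped pytest fixture could call `ensure_tokenizer_artifacts` with a directory that the CI job caches, so the download happens at most once per cache key. Passing `fetch` as a parameter keeps the helper itself testable offline.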