instructlab / sdg

Python library for Synthetic Data Generation
Apache License 2.0

Import code from CLI repo #1

Closed by russellb 4 months ago

russellb commented 4 months ago

Seed this repo with the sdg code from the instructlab/instructlab repository. We should do this in a way that retains the history of those files.

To switch to the library as quickly as possible and minimize the time the code exists in two places, we should avoid changing it as much as possible for the first release. We can evolve it after a 0.1.0 release.

The code in question is in the src/instructlab/generator directory. The interface to that code used by the ilab generate handler is a single generate_data() function. We can keep that as-is for the first release and then evolve the code from there.
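As a rough illustration of why a single entry point makes the swap cheap, here is a minimal sketch. Everything in it is hypothetical: the real generate_data() signature is not shown in this issue, and the two stand-in functions and the generate_handler name are invented for the example.

```python
from typing import Any, Callable


# Hypothetical stand-in for the copy of the code that lives in the CLI repo.
def generate_data_from_cli(**options: Any) -> str:
    return "cli:" + ",".join(sorted(options))


# Hypothetical stand-in for the same function once it is served by the library.
def generate_data_from_library(**options: Any) -> str:
    return "library:" + ",".join(sorted(options))


def generate_handler(generate_data: Callable[..., str], **options: Any) -> str:
    """Call the one entry point; the handler knows nothing else about the code."""
    return generate_data(**options)


# Swapping implementations only changes which callable is passed in.
before = generate_handler(generate_data_from_cli, num_samples=5)
after = generate_handler(generate_data_from_library, num_samples=5)
```

Because the handler touches only that one function, moving the code to the library should amount to changing an import, assuming the signature stays the same for the first release.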

russellb commented 4 months ago

This code was imported into the src/sdg directory, including the history of those files. No changes have been made to them since the import.

There is also a Makefile target, make check, that checks whether these files have changed in instructlab/instructlab since we imported them.
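The implementation of that target is not shown in this issue. As a sketch of one way such a drift check could work, the code below compares two directory trees by content hash and lists the files that differ. The changed_files and file_digest names are made up for this example, and comparing two local checkouts is an assumption; the real target may work differently (for example, by inspecting git history).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def changed_files(upstream_dir: Path, imported_dir: Path) -> list[str]:
    """List relative paths whose contents differ between the two trees.

    A file that exists in only one tree also counts as changed.
    """
    upstream = snapshot(upstream_dir)
    imported = snapshot(imported_dir)
    return sorted(
        name
        for name in upstream.keys() | imported.keys()
        if upstream.get(name) != imported.get(name)
    )
```

An empty result would mean the imported copy still matches the upstream directory; any listed path is a file to review before the two copies diverge further.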

If this approach ends up not working out, we can always erase the contents of this repo and start over, but for now, this issue is complete.