During the conversion to the new APIs, we added a "simple" pipeline intended to run in small environments (laptops, etc.). The document chunking behavior was left as a TODO item and is probably a regression. It needs to be revisited before we release a new version of the library.
https://github.com/instructlab/sdg/blob/1f71fb67aa46151bc0362e733b06529a1c609e6d/src/instructlab/sdg/generate_data.py#L293-L301
Follow-up to #46