instructlab / sdg

Python library for Synthetic Data Generation
https://pypi.org/project/instructlab-sdg/
Apache License 2.0

[Epic] Ability to resume/continue an SDG cycle #267

Open ktam3 opened 1 month ago

ktam3 commented 1 month ago

Feature Overview (aka. Goal Summary): When running the SDG cycle ("ilab generate"), execution continues until all the provided documents are processed and the requested number of samples is generated.

This Feature card is to enhance InstructLab:

Goals (aka. expected user outcomes): The observable functionality that the user now has due to receiving this feature. Include the anticipated primary user type/persona and which existing features will be expanded. Complete during New status.

Requirements (aka. Acceptance Criteria): A list of specific needs or objectives the feature must deliver to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.

bbrowning commented 1 month ago

A few high-level items that need some thought and/or discussion:

Some of the above is complex. If it's too complex to start with, what's the happy-path case that we want to ensure works? Is it good enough to assume a user runs a single ilab data generate ..., it gets interrupted, and then they run it again and we automatically continue from where they left off? And if they change any params, docs, config, etc., do they have to manually blow away the entire state we've saved to start from a clean slate? We've already seen that requiring users to manually clean SDG checkpoints between runs can be confusing for them and can lead to errors if they forget. Are we OK continuing down that path, or is this the right time to make this more friendly?
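To make the happy path concrete, here is a minimal Python sketch of one way it could behave. It is an illustration only and does not reflect the actual instructlab-sdg checkpointing code: every name in it (`fingerprint`, `load_completed`, `save_completed`, `run_sdg`, the JSON state file layout) is hypothetical. The assumption is that saved state is tagged with a hash of the run parameters and input documents, so an identical rerun resumes automatically, while any change to params or docs is detected and the stale state is ignored rather than requiring a manual cleanup.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(params: dict, doc_paths: list[str]) -> str:
    """Hash the run parameters and document contents into one hex digest."""
    h = hashlib.sha256()
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    for path in sorted(doc_paths):
        h.update(path.encode("utf-8"))
        h.update(Path(path).read_bytes())
    return h.hexdigest()


def load_completed(state_file: Path, run_id: str) -> set[str]:
    """Return finished item IDs, or an empty set if the saved state
    is missing or was written by a run with different inputs."""
    if not state_file.exists():
        return set()
    state = json.loads(state_file.read_text())
    if state.get("run_id") != run_id:
        return set()  # params or docs changed: start from a clean slate
    return set(state["completed"])


def save_completed(state_file: Path, run_id: str, completed: set[str]) -> None:
    """Write the state to a temp file, then rename it over the old state,
    so an interruption mid-write does not corrupt the checkpoint."""
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps({"run_id": run_id, "completed": sorted(completed)}))
    tmp.replace(state_file)


def run_sdg(items, params, doc_paths, state_file, process):
    """Process each item once, skipping items finished by an earlier,
    interrupted run that used the same params and docs."""
    run_id = fingerprint(params, doc_paths)
    completed = load_completed(state_file, run_id)
    for item in items:
        if item in completed:
            continue
        process(item)  # hypothetical per-item generation step
        completed.add(item)
        save_completed(state_file, run_id, completed)
    return completed
```

With this shape, rerunning the same command after an interruption skips the finished items, and changing any parameter or document yields a different fingerprint, so the old state is simply not reused. Whether stale state should be silently discarded, kept alongside the new state, or reported to the user is one of the open questions above and is not settled by this sketch.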

ktam3 commented 1 month ago

Per discussion with @aakankshaduggal, a higher-priority feature has been introduced: #271

Though this is still in scope for 1.2, Product Management is aware that the team's focus is on completing #271, which may cause this feature to slip. @bbrowning may take point on this instead, with Aakanksha helping if she has additional bandwidth.