instructlab / ui

Place to hack on UI for InstructLab
Apache License 2.0

Support Dynamic taxonomies (QNA document generators) #140

Open · booxter opened this issue 5 months ago

booxter commented 5 months ago

This is a new feature request.

Status Quo

Right now, taxonomies are defined by writing question-and-answer pairs in qna.yaml files. This is manual work that at times requires both domain knowledge and a good eye for typos and other mistakes. For some tasks, it may be better to have a program produce the QNA seed examples, which are then fed into the model for synthetic data generation.

Some examples of tasks that could benefit from a programmatic approach to generating seed samples:

Proposal

In addition to qna.yaml files stored directly on disk, also allow taxonomies to be defined as programs that, when executed, produce a qna.yaml document that complies with the InstructLab taxonomy format.
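As a minimal illustration of the idea, here is a sketch of a generator program that prints a qna.yaml document to stdout. The unit-conversion task, the `render_qna` and `make_seed_examples` helpers, and the field layout (`version: 2`, `created_by`, `task_description`, `seed_examples` with `question`/`answer`) are all assumptions made for illustration; the real taxonomy schema should be checked before relying on these names.

```python
"""Hypothetical QNA generator: prints a qna.yaml document to stdout.

Assumption: a skill-style layout (version, created_by, task_description,
seed_examples with question/answer). Verify against the current
InstructLab taxonomy schema before using these field names.
"""
import json
import sys

# Hypothetical task: unit conversions, where answers are computed rather than typed,
# so typos in the answers cannot creep in.
CONVERSIONS = [
    ("kilometers", "meters", 1000),
    ("hours", "minutes", 60),
    ("kilograms", "grams", 1000),
    ("meters", "centimeters", 100),
    ("days", "hours", 24),
]


def yaml_str(value: str) -> str:
    # JSON string literals are valid YAML double-quoted scalars,
    # so json.dumps gives safe quoting without a YAML dependency.
    return json.dumps(value)


def make_seed_examples(amount: int = 3) -> list[dict[str, str]]:
    """Build one question/answer pair per conversion rule."""
    examples = []
    for unit_from, unit_to, factor in CONVERSIONS:
        examples.append({
            "question": f"How many {unit_to} are in {amount} {unit_from}?",
            "answer": f"{amount} {unit_from} is {amount * factor} {unit_to}.",
        })
    return examples


def render_qna(examples: list[dict[str, str]], created_by: str = "qna-generator") -> str:
    """Serialize the examples into a qna.yaml-style document (assumed layout)."""
    lines = [
        "version: 2",  # assumed schema version
        f"created_by: {yaml_str(created_by)}",
        f"task_description: {yaml_str('Convert quantities between units.')}",
        "seed_examples:",
    ]
    for ex in examples:
        lines.append(f"  - question: {yaml_str(ex['question'])}")
        lines.append(f"    answer: {yaml_str(ex['answer'])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(render_qna(make_seed_examples()))
```

Running the script writes the document to stdout, so the surrounding tooling only has to capture that output and validate it exactly as it would validate a hand-written qna.yaml.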

My attempt at implementing both the CLI and taxonomy parts of this feature (currently closed, but I am happy to revive and rebase it):

(old)

(new drafts: old code rebased against current main)
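On the consuming side, taxonomy tooling would need a rule for deciding whether a leaf is static or generated. The sketch below is a hypothetical illustration, not a description of the linked drafts: it assumes a leaf directory contains either a `qna.yaml` or an executable Python script named `qna_generator.py` (a file name invented here), runs the script with the current interpreter, and treats its stdout as the document.

```python
"""Hypothetical taxonomy loader that accepts static or generated QNA.

Assumption: a leaf directory holds either a static qna.yaml or a
generator script named qna_generator.py (a convention invented here).
"""
import subprocess
import sys
from pathlib import Path

STATIC_NAME = "qna.yaml"
GENERATOR_NAME = "qna_generator.py"  # hypothetical naming convention


def load_qna_text(leaf: Path, timeout: float = 60.0) -> str:
    """Return qna.yaml content for a taxonomy leaf, running a generator if present."""
    static = leaf / STATIC_NAME
    generator = leaf / GENERATOR_NAME
    if static.is_file() and generator.is_file():
        # Refuse ambiguous leaves instead of silently picking one source.
        raise ValueError(f"{leaf}: define either {STATIC_NAME} or {GENERATOR_NAME}, not both")
    if static.is_file():
        return static.read_text(encoding="utf-8")
    if generator.is_file():
        # Run the generator from inside the leaf directory; an absolute path keeps
        # the script reachable after the working directory changes.
        result = subprocess.run(
            [sys.executable, str(generator.resolve())],
            cwd=leaf,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # a failing generator raises CalledProcessError
        )
        return result.stdout
    raise FileNotFoundError(f"{leaf}: no {STATIC_NAME} or {GENERATOR_NAME}")
```

Returning plain text keeps the loader agnostic: both paths feed the same downstream schema validation, and the timeout guards against a generator that never terminates.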

Considerations

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

nathan-weinberg commented 1 month ago

@kelbrown20 @juliadenham @ktam3 I wonder if this is something we want on the roadmap soon (e.g. 0.20.0) or if it is more of a backlog item. Either way, I think it's worth keeping open.