Status: Open. booxter opened this issue 5 months ago.
This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.
@kelbrown20 @juliadenham @ktam3 I wonder if this is something we want to put on the roadmap soon (e.g. 0.20.0) or whether it is more of a backlog item. Either way, I think it's worth keeping open.
This is a new feature request.
Status Quo
Right now, taxonomies are defined by putting answers in qna.yaml files. This is manual work, and at times it requires both domain knowledge AND a good eye to spot typos and other mistakes. For some tasks, it may be beneficial to rely on a program to produce the QNA document, which is then fed into the model for synthetic data generation. Some examples of tasks that could benefit from a programmatic approach to generating seed samples:
Proposal
In addition to qna.yaml files stored directly on disk, also allow taxonomies to be defined as programs that, when executed, produce a qna.yaml document that complies with the InstructLab taxonomy format. My attempt at implementing both the CLI and taxonomy bits for this feature (currently closed, but I am happy to revive and rebase it):
(old)
(new drafts: the old code rebased against current main)
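As a rough illustration of the proposal, here is a minimal Python sketch of such a generator program. The task (multiplying two-digit integers), the function names, and the simplified qna.yaml layout (version, created_by, task_description, and seed_examples with question/answer pairs) are all assumptions for illustration; the real field names should be checked against the current InstructLab taxonomy schema. To avoid depending on a YAML library, it emits each string as a JSON string literal, which YAML 1.2 also accepts as a double-quoted scalar.

```python
"""Hypothetical taxonomy generator that prints a qna.yaml-style document.

Assumption: a simplified skill layout (version, created_by,
task_description, seed_examples with question/answer). Check the field
names against the current InstructLab taxonomy schema before relying on it.
"""
import json
import random
import sys


def quote(text: str) -> str:
    # A JSON string literal is also a valid YAML 1.2 double-quoted scalar,
    # so quotes and newlines are escaped without needing PyYAML.
    return json.dumps(text)


def make_seed_examples(count: int, seed: int = 0) -> list[dict[str, str]]:
    # Seeded RNG so that re-running the program reproduces the same document.
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        examples.append({
            "question": f"What is {a} multiplied by {b}?",
            "answer": f"{a} multiplied by {b} is {a * b}.",
        })
    return examples


def render_qna(created_by: str, task_description: str,
               examples: list[dict[str, str]]) -> str:
    # Build the YAML document line by line in block style.
    lines = [
        "version: 2",
        f"created_by: {quote(created_by)}",
        f"task_description: {quote(task_description)}",
        "seed_examples:",
    ]
    for ex in examples:
        lines.append(f"  - question: {quote(ex['question'])}")
        lines.append(f"    answer: {quote(ex['answer'])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "example-user" is a placeholder author name.
    sys.stdout.write(render_qna("example-user", "Multiply two-digit integers.",
                                make_seed_examples(5)))
```

Because the answers are computed rather than typed, this kind of program sidesteps the typo problem described above; the trade-off is that the program itself now needs review.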
Considerations
qna.yaml may or may not be stored in the taxonomy repo. (I'd prefer not to store it, since it's directly derived from the program.)
Dockerfiles / Containerfiles, plus defining some basic operational interface (e.g. how input is fed into the container command entrypoint, and how the resulting QNA document is returned from the program; one suggestion is to pass both through a volume mount).
Dockerfiles should use "official" / "proven" base images.
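The volume-mount suggestion can be sketched in Python roughly as follows. Everything here is a hypothetical contract rather than an existing InstructLab or CLI interface: the /io mount point, the input.json and qna.yaml file names, passing the I/O directory as the program's single argument, and all helper names. The `-v HOST:CONTAINER` bind-mount flag is standard syntax for both podman and docker; local_command runs the same generator without a container, which is handy for development and testing.

```python
"""Sketch of a volume-mount interface for generator programs.

Hypothetical contract (not an existing InstructLab interface): the
generator receives one argument, a directory DIR; it reads its parameters
from DIR/input.json and writes its result to DIR/qna.yaml.
"""
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable

CONTAINER_IO_DIR = "/io"  # hypothetical mount point inside the container


def container_command(engine: str, image: str, host_io_dir: Path) -> list[str]:
    # `-v HOST:CONTAINER` bind-mounts the scratch dir (podman and docker share
    # this syntax); the trailing argument is passed to the image's entrypoint.
    return [engine, "run", "--rm", "-v", f"{host_io_dir}:{CONTAINER_IO_DIR}",
            image, CONTAINER_IO_DIR]


def local_command(script_args: list[str], host_io_dir: Path) -> list[str]:
    # Runs the generator with the current interpreter, without a container.
    return [sys.executable, *script_args, str(host_io_dir)]


def run_generator(build_command: Callable[[Path], list[str]], params: dict) -> str:
    # A fresh scratch directory per run, removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        io_dir = Path(tmp)
        (io_dir / "input.json").write_text(json.dumps(params))
        # check=True raises CalledProcessError if the generator exits non-zero.
        subprocess.run(build_command(io_dir), check=True)
        output = io_dir / "qna.yaml"
        if not output.is_file():
            raise FileNotFoundError("generator did not write qna.yaml")
        return output.read_text()
```

For example, `run_generator(lambda d: container_command("podman", "registry.example.com/qna-gen:latest", d), {"count": 5})` would return the generated document as a string (the image name is a placeholder). On SELinux hosts the mount may additionally need the `:Z` relabel option. Returning a string rather than writing into the taxonomy tree leaves open whether qna.yaml is committed to the repo or regenerated on demand.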