instructlab / dev-docs

Developer documents for the InstructLab organization
Apache License 2.0

`ilab` engine proposal #9

Closed cdoern closed 7 months ago

cdoern commented 7 months ago

This enhancement introduces a new design for ilab, primarily adding sub-parent commands, new modes of interaction with ilab, and clarity on what the source and sink of data are in the system we are building.

jeremyeder commented 7 months ago

Overall, unless there are conflicting proposals to merge into this one, I think this is a meaningful and logical next step from the MVP capabilities and we should pursue it.

bbrowning commented 7 months ago

Has any thought been given to integrating more deeply with Open Container Initiative (OCI) here? For example, I use Kitops today to store the source taxonomy, generated synthetic data, and generated models I get from ilab directly in an OCI filesystem layout and push/pull to/from OCI registries (quay, docker hub, etc).

There's a lot of overlap between this proposal and what kitops handles today, which would be good to consider, as there are good ideas in both. kit has a concept of a ModelKit, which is really just an OCI config that points to different OCI layers with differing semantic meanings and mediatypes. Under the covers it uses Oras to manage the metadata, source code, datasets, and/or generated models as OCI artifacts. Is there an opportunity to combine forces a bit here for a bigger community push around generated data and models as OCI artifacts?

Some examples:

If we go down the OCI route, then we open up a much wider world of registries we can push models and datasets to, as many enterprises already have an OCI registry for container images and every cloud provider offers their own SaaS version. Additionally, we open up some potential deeper integration into tools like podman and docker as we consider how someone might want to construct a container image consisting of an inference server base container image with their generated model layered on top.

I'm not sure kitops gets everything right, but given that OCI and Oras give us a lot of the proposed functionality as a default part of OCI (filesystem layout, push, pull, list, inspect, tag, etc) it seems reasonable to consider.
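
To make the "config that points to layers with differing mediatypes" idea concrete, here is a minimal sketch of an OCI image manifest built in Python. The top-level manifest fields follow the OCI image manifest layout, but every `application/vnd.example.ilab.*` media type, the config contents, and the placeholder blobs are assumptions made up for illustration; they are not what kitops or ilab actually use.

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor (mediaType, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def build_manifest(config_blob: bytes, layers: list[tuple[str, bytes]]) -> dict:
    """Assemble a manifest whose layers each carry their own media type."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        # Hypothetical config media type, for illustration only.
        "config": descriptor(
            "application/vnd.example.ilab.config.v1+json", config_blob
        ),
        "layers": [descriptor(media_type, blob) for media_type, blob in layers],
    }


# Placeholder bytes stand in for real taxonomy, dataset, and model archives.
config = json.dumps({"description": "example artifact bundle"}).encode()
manifest = build_manifest(
    config,
    [
        ("application/vnd.example.ilab.taxonomy.v1.tar", b"taxonomy files"),
        ("application/vnd.example.ilab.dataset.v1.tar", b"synthetic data"),
        ("application/vnd.example.ilab.model.v1.tar", b"model weights"),
    ],
)
print(json.dumps(manifest, indent=2))
```

A registry client only needs the digests and sizes to push or pull each layer, so the semantic meaning of a layer (taxonomy, data, or model) lives entirely in its media type.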

alimaredia commented 7 months ago

After really thinking through this proposal with @cdoern today, I think the part of this proposal that establishes a hierarchy of commands but does not add any new functionality should be accepted and work should be started as soon as possible.

The new command structure is an upgrade over the existing ilab command, especially for first-time users, and the new command structure gives the flexibility to have ilab be an "engine" if we decide to do so in future discussions.

Backward compatibility of existing commands should be kept for a pre-determined number of milestones or amount of time that is agreed upon in this enhancement.
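
As a rough illustration of how a command hierarchy and backward compatibility could coexist, here is a minimal sketch using Python's standard `argparse`. The group names (`data`, `model`), the commands under them, and the alias table are hypothetical and are not taken from the proposal; the real ilab CLI may be built differently.

```python
import argparse
import sys

# Hypothetical mapping from legacy flat commands to (group, command) pairs.
LEGACY_ALIASES = {
    "generate": ("data", "generate"),
    "train": ("model", "train"),
    "serve": ("model", "serve"),
}


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level parser: ilab <group> <command>."""
    parser = argparse.ArgumentParser(prog="ilab")
    groups = parser.add_subparsers(dest="group", required=True)
    # Collect the hypothetical commands under their parent group.
    commands_by_group = {}
    for group, command in LEGACY_ALIASES.values():
        commands_by_group.setdefault(group, []).append(command)
    for group, commands in commands_by_group.items():
        group_parser = groups.add_parser(group)
        subcommands = group_parser.add_subparsers(dest="command", required=True)
        for command in commands:
            subcommands.add_parser(command)
    return parser


def rewrite_legacy(argv: list[str]) -> list[str]:
    """Translate a legacy flat command into its grouped form, with a warning."""
    if argv and argv[0] in LEGACY_ALIASES:
        group, command = LEGACY_ALIASES[argv[0]]
        print(
            f"warning: 'ilab {argv[0]}' is deprecated; "
            f"use 'ilab {group} {command}'",
            file=sys.stderr,
        )
        return [group, command, *argv[1:]]
    return argv


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse arguments, accepting both the new and the legacy spelling."""
    return build_parser().parse_args(rewrite_legacy(argv))


print(parse(["model", "train"]))  # new grouped form
print(parse(["train"]))           # legacy form, rewritten with a warning
```

Keeping the aliases in a single table means they can be removed in one place once the agreed deprecation window ends.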