Closed nstogner closed 1 year ago
See the following implementations of consolidated repos:
https://github.com/substratusai/base-images https://github.com/substratusai/model-loader-images
If we like this direction, we can add to it, otherwise, we can remove.
@samos123 and I discussed this for a few minutes and reached consensus on a monorepo including base images. Rationale:
The implementation I have in mind:
bases/ubuntu/* # Image: substratusai/os-ubuntu
servers/basaran/* # Image: substratusai/server-basaran
model-loaders/huggingface/* # Image: substratusai/model-loader-huggingface
model-trainers/huggingface/* # Image: substratusai/model-trainer-huggingface
dataset-loaders/squad # from substratusai/dataset-squad
Another server will emerge once we have santacoder and starcoder working and fine-tuned. Those servers need to implement the Language Server Protocol spec to be usable by LSP clients. Likewise, text-generation-inference would make for a worthwhile server effort.
Questions:
I'd rather keep it simple, without nested directories for the container image directories:
substratus-images:
base (helpful for HF loader which doesn't need many dependencies and keeps image small)
base-gpu (maybe this is our only base)
model-loader-huggingface # image substratusai/model-loader-huggingface
model-trainer-huggingface
dataset-loader-k8s-instruct
dataset-loader-squad
Every directory under substratus-images should be a directory with a Dockerfile. By convention, the resulting image will be substratusai/{DIRECTORY_NAME}.
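A minimal sketch of that naming convention, assuming a local checkout of the repo: it scans the top-level directories and maps each one that contains a Dockerfile to substratusai/{DIRECTORY_NAME}. The function name `discover_images` and the `ORG` constant are hypothetical and only serve the illustration.

```python
from pathlib import Path

# Assumption: image names are prefixed with the "substratusai" org, per the convention above.
ORG = "substratusai"


def discover_images(repo_root):
    """Map each top-level directory that contains a Dockerfile to its image name."""
    root = Path(repo_root)
    images = {}
    for entry in sorted(root.iterdir()):
        # Only directories with a Dockerfile count as image directories.
        if entry.is_dir() and (entry / "Dockerfile").is_file():
            images[entry.name] = f"{ORG}/{entry.name}"
    return images
```

Under this sketch a directory named model-loader-huggingface would map to substratusai/model-loader-huggingface, and directories without a Dockerfile would be skipped.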
What do you all think?
☝️ That's perfectly good by me.
I am good with the monorepo, I would name it github.com/substratusai/images.git. I prefer to use a single base image for GPU and non-GPU workloads. This might actually help speed things up b/c of base layer caching.
Started on this instead of updating other repos in place
NOTE: images.git is currently dependent upon https://github.com/substratusai/substratus/pull/109
This is mostly done, except for using the same base image for all images. Let's track image-specific issues in the image repo going forward. Closing the issue here.
Proposal: Consolidate all Substratus image repos for easier maintenance and easier search for users.
NOTE: In this implementation, perhaps the base-images are used for Models, Datasets, and Notebooks, and they contain a notebook.sh script along with Jupyter pre-installed. This removes the need for a separate notebook image repo.
base-images.git:
dataset-images.git:
model-loader-images.git:
model-trainer-images.git:
server-images.git:
Path-based triggers could be used to filter GitHub Actions build jobs. For example, when building the falcon-7b model:
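As a rough illustration of what a path filter decides (this is not a GitHub Actions configuration, just the idea expressed in Python): given the files changed by a commit, pick out the image directories that need a rebuild. The function name `images_to_rebuild` and the directory names used in the usage note are assumptions for the sketch.

```python
from pathlib import PurePosixPath


def images_to_rebuild(changed_files, image_dirs):
    """Return the image directories touched by the given changed file paths."""
    affected = set()
    for changed in changed_files:
        parts = PurePosixPath(changed).parts
        # A change counts only if it sits inside a known image directory;
        # files at the repo root (e.g. README.md) trigger no build.
        if len(parts) > 1 and parts[0] in image_dirs:
            affected.add(parts[0])
    return sorted(affected)
```

For instance, with image directories base and dataset-loader-squad, a commit touching dataset-loader-squad/Dockerfile and README.md would select only dataset-loader-squad for rebuilding.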