mlcommons / GaNDLF

A generalizable application framework for segmentation, regression, and classification using PyTorch
https://gandlf.org
Apache License 2.0

[FEATURE] CD for MLCubes #902

Open sarthakpati opened 2 months ago

sarthakpati commented 2 months ago

Is your feature request related to a problem? Please describe.

Currently, we are pushing the docker images on every merge [ref], but the MLCubes (either the raw or MedPerf variants) need to be built and pushed separately [ref], which is not ideal.

Describe the solution you'd like

It would be great to have CD for the MLCubes (at least the MedPerf-compliant ones for the model and metrics).

Describe alternatives you've considered

N.A.

Additional context

It would be great to have input from the @mlcommons/medperf-write team on this. From my perspective, there are two questions:

  1. How to structure the workflow?
  2. Where should the mlcubes be deployed to?
    • SP: my opinion is to have this go to the GitHub GaNDLF packages as separate tags with appropriate versions (#901)
hasan7n commented 1 month ago

Having a CD for this sounds good! Below is a description of a small issue along with a possible solution. I am talking about the gandlf_metrics mlcube here, but the problem may also exist for the gandlf model mlcubes.

Note that such a docker image will expect a specific structure of the input (predictions and labels). In MedPerf's world, this means that the model MLCube (which is run before the metrics mlcube and will generate predictions) needs to save predictions in a specific way that GaNDLF comprehends.

I think the solution would be to build the gandlf mlcube in such a way that it calls a specific entrypoint.py script before it calls the gandlf_metrics command. The default entrypoint.py script would be empty. Then, a user who wishes to use the gandlf mlcube and first wants to restructure inputs can build a new docker image that inherits from the one built in CD and overwrites the entrypoint.py script. Something like:

FROM mlcommons/gandlf-metrics:1.2.3
COPY ./entrypoint.py /workspace/entrypoint.py
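
For illustration only, the wrapper baked into the image could look roughly like the sketch below; the /workspace path, the wrapper itself, and the way arguments are forwarded to gandlf_metrics are assumptions, not the actual image internals.

# Rough sketch of the proposed "entrypoint.py first, then gandlf_metrics" flow.
# Paths and argument forwarding are placeholders for illustration.
import runpy
import subprocess
import sys
from pathlib import Path

ENTRYPOINT = Path("/workspace/entrypoint.py")  # overwritten by derived images

def main() -> int:
    # Optional input restructuring: the default entrypoint.py does nothing;
    # a derived image can COPY its own script over it (see the Dockerfile above).
    if ENTRYPOINT.exists():
        runpy.run_path(str(ENTRYPOINT), run_name="__main__")
    # Hand the (possibly restructured) inputs to the metrics command,
    # forwarding whatever arguments the mlcube runner passed in.
    return subprocess.call(["gandlf_metrics", *sys.argv[1:]])

if __name__ == "__main__":
    sys.exit(main())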

How does this sound?

sarthakpati commented 1 month ago

Thanks for the detailed response, @hasan7n! Here are my in-line comments:

Note that such a docker image will expect a specific structure of the input (predictions and labels). In MedPerf's world, this means that the model MLCube (which is run before the metrics mlcube and will generate predictions) needs to save predictions in a specific way that GaNDLF comprehends.

This is no longer a requirement, and #900 allows a user to pass the targets and predictions as comma-separated values.
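
As a sketch of what that could look like, a small script could assemble such a file before invoking the metrics suite; the column names, the shared-filename assumption, and the paths below are illustrative only, not the documented GaNDLF schema.

# Hypothetical restructuring step that writes a targets/predictions CSV for the
# metrics suite. All names and paths here are assumptions for illustration.
import csv
from pathlib import Path

def build_metrics_csv(targets_dir: str, predictions_dir: str, out_csv: str) -> None:
    targets = {p.name: p for p in Path(targets_dir).iterdir() if p.is_file()}
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["SubjectID", "Target", "Prediction"])  # assumed headers
        for pred in sorted(Path(predictions_dir).iterdir()):
            if pred.name in targets:  # assumes predictions and labels share filenames
                writer.writerow([pred.name, str(targets[pred.name]), str(pred)])

if __name__ == "__main__":
    build_metrics_csv("/data/labels", "/data/predictions", "/workspace/metrics_input.csv")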

So, the point I want to get at is that if we want to have CD for creating the gandlf metrics mlcube, and if our aim is to reuse this built mlcube, we need to come up with a clean/easy solution for the input restructuring.

I agree, and this is driven in large part by the need to ensure that challenge organizers are given a stable image to work with, one that they do not need to create or maintain.

Then, a user who wishes to use the gandlf mlcube and first wants to restructure inputs can build a new docker image that inherits from the one built in CD and overwrites the entrypoint.py script.

I am not sure if this is needed after #900. Tagging @VukW for more info.