iterative / gto

🏷️ Git Tag Ops. Turn your Git repository into Artifact Registry or Model Registry.
https://dvc.org/doc/gto
Apache License 2.0

Labels versus Stages #191

Closed aguschin closed 2 years ago

aguschin commented 2 years ago

We've already been discussing this in https://github.com/iterative/gto/issues/68. We also recently got a user request to add Labels mechanics in #179.

Feedback

I've asked a couple of questions about the Stages vs Labels approach in multiple places:

  1. In ods.ai Slack I got 5 votes for Labels and 1 vote for Stages.

  2. In MLOps.community I got this feedback

    (screenshot of the MLOps.community responses)

Suggestion

Considering the answers, I think we need to either:

  1. Change Stages to Labels (and assume users will implement Stages mechanics themselves if they want to). Also, we already have labels in artifacts.yaml - how do we reconcile that?
  2. Support both. In this case, git tags for Stages could be denoted as rf:prod (hinting at a dict-like 1:1 relation), while Labels could be denoted as rf#prod.

WDYT @omesser @dmpetrov ?

omesser commented 2 years ago

@aguschin - from the responses I think we can infer that the labels mechanism is largely needed to be "declarative" (reflecting something about the model itself, or even about where it is already deployed), but not necessarily "actionable" - meaning:

  1. stages - I would like the event of a state change to trigger workflows
  2. labels/annotations - I would like changing labels/annotations to ONLY help with searching / filtering, not to trigger anything (sketched below)

For controlling the n:n relation of which model is deployed where (and being more flexible than stages), we need a third and separate concept:

  1. "deployments" or "environments" The reason I'm not connecting this with stages/labels(=annotations) is that it should be clear / intuitive that this is MEANT to trigger workflows / deployments / checks. We probably can't mix between a declarative mechanism like 2 and this - because people want something lightweight to tag stuff to filter on without triggering anything

What do you think?

shortcipher3 commented 2 years ago

Labels make more sense for my use case. We have models that are shared between different types of hardware, and we have a group of beta users who use the models before they go to production. In this case I would like to be able to use gto to trigger a deploy to our beta users' hardware; once QA and the beta test are complete, I would like to use gto to trigger a deploy to production. Using gto show I would like to see the current state of deployment - something like:

| name  | latest | #hw1_prod | #hw1_testers | #hw2_prod | #hw3_testers | #staging |
|-------|--------|-----------|--------------|-----------|--------------|----------|
| model | v1.0.2 | v1.0.1    | v1.0.1       | v1.0.0    | v1.0.1       | v1.0.2   |

shortcipher3 commented 2 years ago

When an engineer gets called on to investigate a system, they should know whether it is hardware 1 or hardware 2, and whether the problem system is production or part of the beta test group. I would like them to be able to quickly check with gto what version of the model the system should have. Sometimes we investigate a problem system that may have failed to update to the proper model; gto could make that issue clear quickly.

I recognize the value in the stages approach, especially if QA is fully automated, the test hardware is only for testing (not a product being used by customers/employees), and only one production environment is supported for a model.

omesser commented 2 years ago

Thanks for the input @shortcipher3! This sounds like a clear #3 from my comment here - the physical environments where the models are deployed to. Note that the stages in today's approach are distinct from deployment environments/targets.

We're trying to figure out whether we want to replace stages with labels and how they should be used correctly; but the deployment environments themselves, in my opinion, are distinct from both either way, and should be managed by the model deployment tool (like mlem) and not necessarily by gto.

My gut feeling tells me that something like labels or annotations should be descriptive - i.e. no side effects - since it's gonna be used by humans and for humans, and it's too "loose" a mechanism to control deployments, but great for tagging stuff for searching/filtering.

Take this scenario for example:

I have 2 models: nn1, nn2, and 3 production setups - setup1, setup2, setup3.

When I move a version of either of the models from development stage to integration stage:

Now, when I move some model-version (for either model) from integration stage to production stage:

Notice there is an assumption here that the affinity between model and setup is "strong" - i.e. not "loose" / controlled by a human deciding to label a model this or that - because the models and the workflows were set up to always deploy specific models to specific environments. Models<>envs are coupled.

In other words - if such a strong affinity between model and setup indeed exists, then stage->setups should be codified in the workflow (the code that is triggered by promoting to stage X) and not by a separate gto / model registry command where a human chooses a setup manually (a rough sketch below).
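For example, the stage -> setup mapping would live in the promotion-triggered script itself; nothing below is a real GTO or MLEM interface, and the specific model-to-setup wiring is made up purely for illustration:

```sh
#!/usr/bin/env sh
# Called by CI when a model version is promoted; the CI system is assumed to pass in the
# model name and target stage parsed from the promotion tag.
MODEL="$1"   # e.g. "nn1" or "nn2"
STAGE="$2"   # e.g. "integration" or "production"

if [ "$STAGE" = "production" ]; then
  case "$MODEL" in
    nn1) ./deploy.sh nn1 setup1 && ./deploy.sh nn1 setup2 ;;  # nn1 is always wired to these setups
    nn2) ./deploy.sh nn2 setup3 ;;                            # nn2 is always wired to this one
  esac
fi
```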

🤔 What do you think? Does that make sense?

shortcipher3 commented 2 years ago

@omesser thanks for the response. I looked at mlem before gto, and it seems like its goals are a better fit for some pieces of what I'm trying to do, but when I looked into the tutorials it didn't seem very useful.

I'm probably missing some things from mlem or possibly there are gaps in mlem that I was hoping gto would address. I see how I can create the deployment environments, but I don't see how to check what is currently deployed to each deployment environment.

It seems like mlem is mostly built around using Python, whereas I was hoping to use a shell script for one of my models. The model must be run on IoT hardware that doesn't have Python installed. How do I do a save() from the CLI without Python? I'm using a web service to train some models and some CLI tools to quantize some models. In these cases I don't have the model trained in Python - do I just need to create a Python script to load the model and run mlem.save()?

I suppose that to get the model back so I can use it outside of Python, I can do an export to get the Python package including the model file - but would this be a pickled model, not the tflite or binary model that I can use with other frameworks more easily?

I really like that I can create links between models in git projects in mlem.

It seems like the notion of a deployment is centered around deploying as a web service. In my use case I have millions of iot devices that I want to deploy to and have the models run on device. Resources on the devices are quite limited so I spend a lot of effort getting the model into a format that will work well on a device, spend time testing, send it out to a test group to observe statistics and get user reports, etc. before deploying to customers in production. I'm not sure how to do a lot of that with mlem.

What I like about gto is that I can tag models as well as arbitrary binaries (such as builds of the software that serves the models). I like that it is purely a CLI tool. I really like that gto isn't forcing me to use my models a particular way and that it isn't geared around deploying as an HTTP endpoint. I like that I can easily get a listing of things that I've tagged in a nice tabular format that includes a list of binaries and a list of stages/labels.

Our Scenario

One of our scenarios is something like this: we have three hardware types, and we have written our software to serve the model. With gto I can tag both the model and the software. Sometimes we update the software and want to send it out to all three types of devices. Other times we make an update that has a feature for one device type; although the code runs on all of the devices, we are only interested in deploying to the device that supports that feature, so it only goes through our QA process for that specific device type. So in this case that strong affinity between hardware and software doesn't exist.

Models get tied to software versions, so although we may have swapped out the model, the driving force for pushing out the new software version was the change in features. So the model version may get updated on one of our hardware types before it gets out to the others.

Not all of our QA is automated, and it takes a lot of time to get something through QA, so although I would love to have everything go through QA, at this point it just isn't feasible.

I hope that gives some more color. Hope I didn't take you on too many tangents, but would love to make mlem work for my use case if it just means understanding the features a little better.

aguschin commented 2 years ago

Thanks for the detailed answer, @shortcipher3! That's very interesting.

I'm probably missing some things from mlem or possibly there are gaps in mlem that I was hoping gto would address. I see how I can create the deployment environments, but I don't see how to check what is currently deployed to each deployment environment.

This is something we're going to add to MLEM this year. The idea is, if you have models deployed to some "setups" (MLEM Environments in this case), you'll have mlem deployment status or something that will poll all known deployments and output them with their status. Taking your example above, I guess the output will look like this:

| name  | latest | #hw1_prod        | #hw1_testers     | #hw2_prod       | #hw3_testers     | #staging                 |
|-------|--------|------------------|------------------|-----------------|------------------|--------------------------|
| model | v1.0.2 | v1.0.1 [running] | v1.0.1 [running] | v1.0.0 [failed] | v1.0.1 [running] | v1.0.2 [request timeout] |

That said, this example currently extends to deploying a model to Heroku, let's say. Not the scenario you describe.

Still, there are MLEM plugins that can actually enable any kind of deployment. Let's suppose your deployment happens this way:

  1. You train a model somewhere (you don't use MLEM to save it)
  2. You export your model to .onnx
  3. You create a git tag with GTO that is supposed to deploy your model to some environment

If we implement a MLEM plugin that could deploy .onnx to your specific env (and serve it with any framework you want, I think) and poll the service to know everything is OK, then it would look like this:

  1. Steps 1 and 2 stay the same.
  2. You import your .onnx model into MLEM.
  3. You deploy that model with MLEM to your specific env manually, or set up CI that reacts to GTO promotions (which implies strong affinity or some user input, though).
  4. Then you run $ mlem deployment status --project https://github.com/user/repo and get that nice table with actual deployment statuses (the whole flow is sketched below).
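Strung together, a rough shell sketch of that flow (command names, flags and the tag format are assumptions taken from this thread, not a documented interface - in particular the .onnx import and the deployment status command don't exist yet):

```sh
# 1-2. Train and export outside of MLEM with your own tooling (placeholders).
python train.py
./quantize_and_export.sh model.onnx

# 3. Import the exported model into MLEM (assumes the future .onnx import support;
#    see https://mlem.ai/doc/command-reference/import for the actual command).
mlem import model.onnx models/rf

# 4. Promote with GTO by pushing a git tag; a CI job reacting to that tag (or a human)
#    performs the actual deploy. The tag name is hypothetical.
TAG="rf#prod"
git tag "$TAG" && git push origin "$TAG"

# 5. Ask MLEM for the actual state of all known deployments (hypothetical command from this thread).
mlem deployment status --project https://github.com/user/repo
```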

Did I get your case right? Does it look like an adequate solution?

TBH, I think this can work, but it still looks like a complex solution if someone just needs to trigger deployments with git tags via the Labels mechanism and doesn't care whether it's actually deployed or not, because they're checking some other source for that information. cc @omesser

What I like about gto is that I can tag models as well as arbitrary binaries (such as builds of the software that serves the models). I like that it is purely a CLI tool. I really like that gto isn't forcing me to use my models a particular way and that it isn't geared around deploying as an HTTP endpoint. I like that I can easily get a listing of things that I've tagged in a nice tabular format that includes a list of binaries and a list of stages/labels.

MLEM can also be a CLI-only tool I think, if we publish it as a .deb package let's say. Will it work for you? What OS is running on your iot devices? What package manager do you use there?

I really like that I can create links between models in git projects in mlem.

Could you please share some feedback for this? How do you intend to use those?

shortcipher3 commented 2 years ago

This is something we're going to add to MLEM this year. The idea is, if you have models deployed to some "setups" (MLEM Environments in this case), you'll have mlem deployment status or something that will poll all known deployments and output them with their status. Taking your example above, I guess the output will look like this:

| name  | latest | #hw1_prod        | #hw1_testers     | #hw2_prod       | #hw3_testers     | #staging                 |
|-------|--------|------------------|------------------|-----------------|------------------|--------------------------|
| model | v1.0.2 | v1.0.1 [running] | v1.0.1 [running] | v1.0.0 [failed] | v1.0.1 [running] | v1.0.2 [request timeout] |

That said, this example currently extends to deploying a model to Heroku, let's say. Not the scenario you describe.

Yeah, in my case I might have a million hardware 1 devices in production, some of which could be unplugged or have internet problems at any given moment, so querying the devices doesn't seem like what I want. If I were to do that, I might instead want something like the percentage of devices reporting a given firmware. Since devices can go offline, it might be nice to look at the last reported firmware, or devices reporting in the last day, etc.

Did I get your case right? Does it look like an adequate solution?

The example you gave looks good, but am I restricted to .onnx or can it be arbitrary files? I am using a lot of tflite and a custom proprietary format for a hardware accelerator.

MLEM can also be a CLI-only tool I think, if we publish it as a .deb package let's say. Will it work for you?

I have the mlem cli tool on my m1 mac, but I don't see how to add a model without using python? Maybe that isn't supported yet and will work with the workflow you described above. In its current state, it either doesn't work for me or I don't know enough to make it work.

What OS is running on your iot devices?

I'm not thinking that I will be using mlem on the devices; it would be a big lift to change our services to use something like mlem. That said, we are using a flavor of Linux related to Fedora, so an rpm would be better than a deb if we were to use mlem on device.

What package manager do you use there?

We are cross-compiling packages and have some custom updates that we provide ourselves; at this point we aren't actually using a package manager, and it would be a big lift to add support for that. We have a custom deployment service that I wasn't looking to replace, but perhaps to trigger when QA has approved a firmware.

Could you please share some feedback for this? How do you intend to use those?

We have a service we've written which takes our models and creates an ML pipeline that includes things like detection, tracking, classification, motion detection, etc. Currently we are committing the models we use to the repo; this makes it hard to trace back when a model was trained, who trained it, what data they used, etc. I'm experimenting with dvc to store a lot of that information in a separate repo for training/evaluation. I would then like a way to traceably copy the model into my software service repo. I'm entertaining using git submodules, but I like the idea of just linking the actual model file rather than a repo that has a bunch of things that aren't useful for the inference side of things.
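To illustrate the kind of link I mean, something like dvc import could pull just the model file into the service repo while recording where it came from (the repo URL and paths below are made up):

```sh
# Made-up repo URL and paths. "dvc import" pulls a single artifact out of the training repo
# into the software service repo and writes a small .dvc file recording the source repo and
# revision, which gives the traceability I'm after.
dvc import https://github.com/example-org/model-training models/detector.tflite \
    -o models/detector.tflite
```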

aguschin commented 2 years ago

The example you gave looks good, but am I restricted to .onnx or can it be arbitrary files? I am using a lot of tflite and a custom proprietary format for a hardware accelerator.

No, it isn't restricted to .onnx, but it will require implementing MLEM plugins to work with those formats.

I have the mlem cli tool on my m1 mac, but I don't see how to add a model without using python?

There is the $ mlem import command: https://mlem.ai/doc/command-reference/import. It doesn't support importing onnx or tflite at the moment, but it will.

UPD: @omesser, maybe we could introduce some "dummy" concept of deployments in MLEM to support the use case where a user doesn't want a real deployment but wants to mark a model as deployed somewhere? It looks like an extra step to me though; using git tags as labels is a plainer approach.

omesser commented 2 years ago

@aguschin Sounds like a dummy deployment implementation in mlem would be a bit artificial - the user will still have to introduce mlem to their workflow, seemingly without a good reason, and we will have a concept in mlem that doesn't make sense to people actually deploying models with mlem 🤔 .

I'm entertaining the idea of having an "environment" concept in GTO (in addition to mlem's - yes, overlapping concepts... 😭). Maybe we can come up with something simple that provides a consistent GTO CLI experience e2e: a simple native gto implementation that could also wrap around mlem environments (as an extension), to support an e2e gto workflow. wdyt?

aguschin commented 2 years ago

I don't like the overlapping concepts either, and I don't like having two mechanics in GTO, because it complicates things, but if we cannot support user scenarios with a single concept, I guess we may need to have both.

A possible scenario, in this case, looks like this for me:

  1. When you start working with GTO, you decide what you want to use - Stages or Labels (I doubt you need both in the majority of situations).
  2. If you choose Labels, then in a naive approach, for each GTO Label you are assumed to have a MLEM env with the same name.
  3. When you create a GTO Label (git tag) and push it to the repo, CI starts.
  4. The naive approach allows you to parse the git tag and reuse it "as is". E.g. if it's "rf:prod:1", then you ask MLEM to deploy the MLEM model at path "rf" to the MLEM env "prod" (rough sketch below).
  5. Another level of indirection (which we let the user handle themselves) is needed when: a. the GTO model named "rf" has a different path in the repo (e.g. "models/random-forest"); b. the GTO Label is called "prod" but the MLEM env is called differently (e.g. "production"), or the GTO label "prod" maps to multiple MLEM envs (e.g. "prod-heroku", "prod-sagemaker").
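A rough sketch of the naive approach from point 4 (everything here is illustrative: the tag format is the example above, and the final deploy command is just a stand-in for whatever MLEM's real CLI or plugin call would be):

```sh
#!/usr/bin/env sh
# In CI the tag would come from the pushed ref; hard-coded here for the sketch.
TAG="rf:prod:1"

MODEL=$(echo "$TAG" | cut -d: -f1)   # "rf"   - reused as the MLEM model path (naive 1:1 mapping)
ENV=$(echo "$TAG" | cut -d: -f2)     # "prod" - reused as the MLEM env name (naive 1:1 mapping)

# The indirection from point 5 would replace the identity mapping above, e.g.
#   MODEL=models/random-forest; ENV=production
# or a loop over several envs such as prod-heroku and prod-sagemaker.

mlem deploy "$MODEL" --env "$ENV"    # stand-in only; MLEM's real deploy CLI may differ
```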

The other question is what we will see in Studio. With this approach, it looks like eventually we would have to have "Stages", "Labels", and "Deployments" columns here?

(screenshot of the Studio model registry view)

aguschin commented 2 years ago

Hi @shortcipher3! We've decided to introduce the "labels" mechanics we've been discussing. I'm updating the README and the code in this PR: #218. If you could check it out and provide your feedback, that would be great!

omesser commented 2 years ago

@shortcipher3 - as you can maybe see from #218, now you would (🤞) be able to use labels to represent environments/deployments without a dedicated environments concept, but you'll be responsible for unlabeling as well to keep the state consistent - e.g. removing a label like #hw1_prod from the old version once you assign it to a new one.

shortcipher3 commented 2 years ago

This is a better fit for my use case, thanks for the updates. I'll work with it and see how I like it.