iterative / example-repos-dev

Source code and generator scripts for example DVC projects
https://dvc.org/doc

example-get-started-experiments: add gitlab workflow #211

Closed dberenbaum closed 1 year ago

dberenbaum commented 1 year ago

There's a working example in https://gitlab.com/iterative.ai/example-get-started-experiments. This doesn't have any of the conditional logic of commit-based vs. manual/workflow dispatch, but otherwise it matches GH as much as I could.

I'm not sure whether it's worth merging now or spending much more time on at the moment because the current level of complexity feels overwhelming and not something I think can be resolved by polishing, since a lot of the complexity is in setting up the CI workflow and cloud runner.

Some examples:

daavoo commented 1 year ago

> I'm not sure whether it's worth merging now or spending much more time on at the moment because the current level of complexity feels overwhelming and not something I think can be resolved by polishing, since a lot of the complexity is in setting up the CI workflow and cloud runner.

So, should we just find a way to push to the GitLab repo after building the example, and consider the GitLab version to only support cloud experiments? It would still be helpful to check the import, parsing, etc. on GitLab.

shcheklein commented 1 year ago

I think these examples are great tbh. CI is expected to take time to set up (one time usually). Any help like this is amazing. It solves 80% of the problem for me if I need to set it up.

shcheklein commented 1 year ago

Do we support BB btw?

daavoo commented 1 year ago

> I think these examples are great tbh. CI is expected to take time to set up (one time usually). Any help like this is amazing. It solves 80% of the problem for me if I need to set it up.

Yes, this is helpful and CI is expected to take some effort, but I think it is also fair to consider whether all the complexity introduced by the cml-runner approach is actually needed, or whether we could have simpler approaches to using a cloud instance inside CI.

daavoo commented 1 year ago

> Do we support BB btw?

The cml-runner doesn't support GPU in Bitbucket, as far as I know.

shcheklein commented 1 year ago

I think there will be a large group of ppl that would prefer CI (maybe not even as a way to run experiments, but for reports, automation, OpenID support, ability to trigger it outside Studio, etc.). That's why I was thinking of renaming it from CML (and the reports column) and making it more like a "set up CI" / "trigger CI" button.

But CI and automation, even the possibility of this, is a powerful feature / ability / differentiator. If we can provide basic examples / show how it works, I think personally it's great.

shcheklein commented 1 year ago

Yep, no BB with GPU, but a basic CI?

Also, btw we need the same "set up CI" in the model registry. It's much needed. W/o similar guidance we are not getting to an e2e workflow + we don't really test it to feel the pain.

daavoo commented 1 year ago

> I think there will be a large group of ppl that would prefer CI (maybe not even as a way to run experiments, but for reports, automation, OpenID support, ability to trigger it outside Studio, etc.). That's why I was thinking of renaming it from CML (and the reports column) and making it more like a "set up CI" / "trigger CI" button.

Yes, I agree, I like CI.

> But CI and automation, even the possibility of this, is a powerful feature / ability / differentiator. If we can provide basic examples / show how it works, I think personally it's great.

But do you think it makes any difference, for people who want the list of things you mention, whether it is based on a self-hosted runner approach or on a CLI command that dispatches the training and gets back the results?

dberenbaum commented 1 year ago

Agreed that I'm fine to merge it since we have it already, and it will be helpful for internal testing.

My point was that after spending time on it, I'm no longer convinced that spending more time on BB or otherwise polishing the current CI workflow is worthwhile. I recall @daavoo saying this when updating the GitHub workflow, and if both of us have that feeling, I think we should reconsider the current approach.

There is such a mix of issues across products, like CML requiring a personal access token and Studio requiring its own token, but neither of those being visible in the GitLab workflow (you store them in the UI as variables and they are implicitly added as environment variables). It would be near impossible for a user to know about those requirements, or several others, to get this working on their own IMO. It also contradicts our focus on onboarding and simplicity.

Personally, I would like to immediately make cloud experiments the default workflow even in its rough alpha state (it feels like a major improvement to me even in its current state), and prioritize a narrowly scoped version of https://github.com/iterative/dvc/issues/9461 so that a much simpler CI workflow could be supported soon (and make clear to users this is the plan and it will make CI way simpler). Until then, it feels like there is little point in having a CI workflow because it can only be used if we set it up for someone.

Edit: forgot to mention that CI-specific issues like the lack of GPU on BB mostly go away if we go this direction, and we can truly provide a copy and paste CI template that is simple and works.

> Also, btw we need the same "set up CI" in the model registry. It's much needed. W/o similar guidance we are not getting to an e2e workflow + we don't really test it to feel the pain.

Great point. Do you want to open an issue in Studio to track it? Feels like a good next step after the REST API is working. We currently link in the docs to https://github.com/iterative/example-gto/blob/main/.github/workflows/gto-act-on-tags.yml, but it would be good to have it be part of this repo and to have some guidance (need to consider whether a wizard would make sense or not) in the model registry UI itself. cc @aguschin

dberenbaum commented 1 year ago

> Can we do the changes so we also push to GitLab?

Added instructions in the README for pushing to GitLab.

daavoo commented 1 year ago

Added in #212