astrojuanlu commented 4 months ago

See #3541.

Description

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

[ ] Read the contributing guidelines
[ ] Signed off each commit with a Developer Certificate of Origin (DCO)
[ ] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
[ ] Updated the documentation to reflect the code changes
[ ] Added a description of this change in the RELEASE.md file
[ ] Added tests to cover my changes
[ ] Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

astrojuanlu commented 4 months ago

Preview: https://kedro--3856.org.readthedocs.build/en/3856/integrations/mlflow.html

astrojuanlu commented 4 months ago

Note to reviewers:

The idea of this page is to complement the kedro-mlflow documentation, and in fact it contains several references to it. What this adds then is:

- References to MLflow in the Kedro docs so users can easily find it
- Short examples on how to create custom integrations using hooks
- Content that fills some gaps in kedro-mlflow

The idea is for this page to serve as a brief collection of MLOps use cases, and to use this as a template for future integrations.

I'm paging @stichbury as well because I was somewhat careless with the prose in certain parts.

I ~~confess~~ disclose that Lilli wrote the first 2 paragraphs.

astrojuanlu commented 4 months ago

> Is it possible to keep kedro-mlflow as its own section? The current structure is by feature, "tracking", "artifact". The rest of our docs usually start from "basic" -> "advanced".

I went back and forth several times on how to structure this page. I like how I ended up making 1 section per use case, but I agree there might be other ways. Awaiting @stichbury's take on this.

stichbury commented 4 months ago

> I went back and forth several times on how to structure this page. I like how I ended up making 1 section per use case, but I agree there might be other ways. Awaiting @stichbury's take on this.

I found the sectioning useful but agree with @noklam that we usually break into simple and advanced usage. Can I suggest we do the same here and have the structure as follows, but will leave you to decide where the simple/advanced sections fall?

Maybe something like this?

- Header
- Prerequisites
- Simple use cases
  - Tracking Kedro pipeline runs in MLflow using Hooks
  - Artifact tracking in MLflow using hooks
- Advanced use cases
  - Complete tracking of Kedro runs in MLflow using `kedro-mlflow`
  - Tracking Kedro in MLflow using the Python API
  - Artifact tracking in MLflow using `kedro-mlflow`
  - Model registry in MLflow using `kedro-mlflow`

However, I'm not that attached to this and if you want to stick with what you have, I'd say that's fine, but omit the basic second level "Use cases" header and promote all the following (currently 3rd level) to 2nd level.

astrojuanlu commented 4 months ago

My biggest gripe with this is that it seems wrong to declare custom hooks as "basic" and using kedro-mlflow, which is objectively fewer lines of code (just a pip install kedro-mlflow away), "advanced". If anything, the former are more "custom", "ad-hoc", or "homegrown", whereas the latter is more "off-the-shelf".

I can totally see how someone starts with the custom hook ("basic"), then makes it more complex because they need more functionality, and in the end it becomes way more difficult than just running `pip install kedro-mlflow` and letting the plugin take care of it for you, assuming the plugin does more or less exactly what you want to do.
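For readers following the thread, here is a minimal sketch of what such a homegrown hook might look like. It is an illustration only, not the code from the docs page under review: the class name `MLflowRunHooks` and the injectable `tracker` argument are hypothetical, and the `ImportError` fallback exists only so the sketch can be exercised without Kedro installed. It relies on Kedro's `hook_impl` marker and on `mlflow.start_run`, `mlflow.log_params` and `mlflow.end_run`.

```python
from typing import Any

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be read and exercised without Kedro installed.
    def hook_impl(func):
        return func


class MLflowRunHooks:
    """Hypothetical homegrown hook: one MLflow run per Kedro pipeline run."""

    def __init__(self, tracker: Any = None) -> None:
        # `tracker` is an illustrative injection point (an assumption of this
        # sketch); by default the real `mlflow` module is imported lazily.
        self._tracker = tracker

    @property
    def tracker(self) -> Any:
        if self._tracker is None:
            import mlflow

            self._tracker = mlflow
        return self._tracker

    @hook_impl
    def before_pipeline_run(self, run_params: dict[str, Any]) -> None:
        # Open an MLflow run named after the pipeline and log a few parameters.
        name = run_params.get("pipeline_name") or "__default__"
        self.tracker.start_run(run_name=name)
        self.tracker.log_params(
            {"pipeline_name": name, "env": str(run_params.get("env"))}
        )

    @hook_impl
    def after_pipeline_run(self) -> None:
        # Close the MLflow run once the pipeline finishes successfully.
        self.tracker.end_run()

    @hook_impl
    def on_pipeline_error(self) -> None:
        # Mark the MLflow run as failed if the pipeline raises.
        self.tracker.end_run(status="FAILED")
```

In a Kedro project this would typically be registered with `HOOKS = (MLflowRunHooks(),)` in `settings.py`. Even this small version already has to deal with run naming and error handling, which is the kind of growth in complexity described above.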

stichbury commented 4 months ago

I can't work on your branch so I've forked and made a PR to commit back to it https://github.com/kedro-org/kedro/pull/3862

Please take a look, merge what you want, and I can review again when you have the entire page in your preferred final state (see comment about sectioning above).

astrojuanlu commented 4 months ago

> I'd say that's fine, but omit the basic second level "Use cases" header and promote all the following (currently 3rd level) to 2nd level.

We agreed to do this 👍🏼 Will make the change today

astrojuanlu commented 4 months ago

Thanks for the review @Galileo-Galilei 🙏🏼 will address your comments ASAP.

astrojuanlu commented 4 months ago

I significantly reworked the order of the sections, but the content is largely the same. I think the flow is much nicer now - wouldn't have reached this stage without @Galileo-Galilei's insightful comments.

Please do have a look again.

astrojuanlu commented 4 months ago

Preview: https://kedro--3856.org.readthedocs.build/en/3856/integrations/mlflow.html

astrojuanlu commented 3 months ago

@Galileo-Galilei in the interest of putting this in front of users already and given that I addressed your initial comments, I'm going ahead with merging this - but if you spot any glaring mistakes or further areas for improvement in a post-merge review, do leave a comment and I'll send another PR with the amendments 🙏🏼

Galileo-Galilei commented 3 months ago

Sorry Juan, the readthedocs preview did not render well on my phone for some unknown reason, so it was hard to review. It does work on the latest branch though, so I had a chance to read it. It's much easier to read now, thanks for the work!

Two minor comments I'll address in a further PR:

- It feels like a bad idea to customize the name of the local backend store, because there is a 99% chance that the user will forget to add it to .gitignore and will leak the data. The default mlruns is already in kedro's template .gitignore.
- If you don't customize the name, you do not need a custom mlflow.yml, which simplifies the setup.
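The .gitignore concern can be checked mechanically. The following is a rough sketch under simplifying assumptions: the helper names `is_ignored` and `check_project` are hypothetical, and only literal entries such as `mlruns`, `mlruns/` or `/mlruns/` are recognised, so this is not a full gitignore pattern matcher.

```python
from pathlib import Path


def is_ignored(folder: str, gitignore_text: str) -> bool:
    """Return True if `folder` appears as a literal entry in the .gitignore text.

    Simplification: wildcards and negations (`!pattern`) are not evaluated.
    """
    target = folder.strip("/")
    for raw in gitignore_text.splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        if entry.strip("/") == target:
            return True
    return False


def check_project(project_root: Path, folder: str = "mlruns") -> bool:
    """Check whether the project's .gitignore lists the MLflow store folder."""
    gitignore = project_root / ".gitignore"
    return gitignore.is_file() and is_ignored(folder, gitignore.read_text())
```

For example, `is_ignored("mlflow_runs", "mlruns/\n")` returns `False`, which is the situation described above: a custom store name that an existing .gitignore does not cover.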

astrojuanlu commented 3 months ago

Thanks!

> The default mlruns is already in kedro's template .gitignore

Oh, I remembered https://github.com/kedro-org/kedro/pull/3765, and that we didn't follow suit with the starters...

About the customisation, I was hesitating about it myself, see https://github.com/kedro-org/kedro/pull/3856#discussion_r1625992865. I'd rather avoid mlrun appearing in a Ctrl+K/F of our docs for the time being, but I see your point around defaults. Maybe we can add mlflow_runs to our .gitignores?

Galileo-Galilei commented 3 months ago

I really think it is a bad idea to create our own standard, for many reasons:

- MLflow tries hard to create a local mlruns folder: if someone uses the debugger, runs the Kedro pipeline before using the UI command, or runs a node interactively (e.g. in a notebook) without having the proper setup, they will likely end up with both an auto-created mlruns folder and your custom mlflow_runs. They will see only partial runs in the UI, and eventually we will end up getting questions in Slack because of this weird setup.
- Some teams have their own template, and for sure mlflow_runs is not inside their .gitignore ;)
- Eventually, some users may seek help outside the Kedro community, and the setup will confuse coworkers / Stack Overflow members...

I don't like the name either, but the point of having a standard is having everyone stick to it, and I think we should keep this well-established one (and I don't think having mlrun in the docs is a big deal, MLflow users usually know about the mlruns folder).

astrojuanlu commented 3 months ago

I see your points, fair enough. @Galileo-Galilei do you want to send the PR yourself?

kedro-org / kedro

Describe integration with MLflow #3856
