Closed stichbury closed 1 year ago
@stichbury We've agreed that we should include something like a "quickstart" or "tutorial" in the Kedro docs and then put a reference to more in-depth documentation (ours) at the end. This way our plugins' development cycles stay uninterrupted and independent of the Kedro docs release lifecycle.
How can we proceed on that?
@marrrcin We are still looking at changes to the information architecture, so this is difficult to pin down at present. In the current table of contents, what would you propose? A section in the Kedro plugins page? Or a new section about plugins with tutorials listed? You probably have some great ideas on how to position these in the current layout, which we can take forward as we think about the new one as part of https://github.com/kedro-org/kedro/issues/1866
There is a section called "Deployment" already, and it's a good fit for our plugins. Actually, some of the parts that are currently included there (e.g. SageMaker) can be replaced with the plugin-based approach.
cc @deepyaman should we raise the priority of this one?
This is in the current sprint w/c 17-04
I've done a little bit of reorganisation on the table of contents in the docs recently, which is unreleased at present, but should go out soon (you can see it in the latest docs). Let's consider how to make some changes to what we have in the set of deployment docs.
I think each "How to deploy a Kedro project to X" page should have a set of subsections, something along the lines of Introduction, Prerequisites, Deployment process, and Summary. Within those sections, subsections are completely freeform, but it would be good to keep a consistent layout at the top level.
Each of the pages should have a note on when it was last tested (and against which version of Kedro + other prerequisite tools), or at least some indicator of how confident we are in the content.
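The proposed layout could be checked automatically across the deployment pages. Below is a minimal sketch, assuming the pages are Markdown files whose top-level sections are written as `## ` headings and that the "last tested" note contains the words "last tested" somewhere on the page. Both conventions, and the name `check_deployment_page`, are hypothetical and not something agreed in this thread.

```python
import re

# Top-level sections proposed above, in the order they were listed.
REQUIRED_SECTIONS = ["Introduction", "Prerequisites", "Deployment process", "Summary"]

# Assumed wording for the "last tested" note; the thread does not fix a format.
LAST_TESTED = re.compile(r"last tested", re.IGNORECASE)


def check_deployment_page(markdown_text):
    """Return a list of problems found in one deployment page (empty if none)."""
    # Collect second-level headings only ("## Title"); deeper headings stay freeform.
    headings = re.findall(r"^##[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE)
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in headings
    ]
    if not LAST_TESTED.search(markdown_text):
        problems.append("missing 'last tested' note")
    return problems
```

Running this over each "How to deploy a Kedro project to X" page would give a quick list of pages that still need restructuring, while leaving the subsections within each top-level section unconstrained.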
Where there are two options (e.g. use what we describe or use the GetInData plugin), we should explain the circumstances that make you prefer one over the other. If there's no difference, I'm not sure, but do we need both?
Turning to the deployment targets, I have these so far:
Deployment target/action | Notes | Technical reviewer input |
---|---|---|
Airflow | Existing Airflow docs for QB-supported Airflow plugin. These docs are well-structured but I can't speak for correctness. GetInData's kedro-airflow-k8s plugin -- documentation suggests not to use for versions of Kedro > 17.0 ?? | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? Do we need both or should we just point through to GetInData docs? |
Argo | Kedro docs are for use without a plugin but also mention/link to an unsupported 3rd party plugin, last updated in summer 2020 | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
AWS Batch | Kedro docs are comprehensive | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
AWS EMR | Written up (as a blog post) but as yet unpublished | This will stay as a blog post for now unless I'm persuaded otherwise, since it's nice to have the technical content. I will make a ticket to expand it and convert to docs though, if this makes sense to reviewers? |
AWS SageMaker | Existing SageMaker docs. GetInData have a kedro-sagemaker plugin | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? Do we need both or should we just point through to GetInData docs? |
Azure | Battle-tested kedro-azureml plugin from GetInData | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
Dask | Existing Dask docs | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
Kubeflow | Existing Kubeflow Workflows docs. kedro-kubeflow plugin from GetInData. | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? Do we need both or should we just point through to GetInData docs? |
Prefect | Existing Prefect docs have not been tested with Prefect 2.0 | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
VertexAI | kedro-vertexai plugin from GetInData. | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
OK, I've got a little table going up in the previous comment, to track our confidence and the completeness of various deployment pages.
Please could I ask for some technical help from the usual suspects: @deepyaman @noklam @merelcht @marrrcin @astrojuanlu to answer the 3 questions noted in the table above.
Feel free to either drop a comment below for anything you want to comment on, or edit the table directly above if you're brave enough/foolish enough to want to wrangle a markdown table.
From your input, I'll build a set of tickets to plan out updates to the deployment content (if not the location in the docs).
Also, another question. Are there any missing targets? We don't have Databricks in this section, for example, but should provide a link to the docs stored elsewhere (and reconsider the distribution of Databricks docs in due course).
My thoughts on the deployment targets listed above (fyi I haven't recently tried any of this so I'm totally guessing if these recommendations still work):
Thanks @merelcht that is amazingly useful.
Given that you're unsure about Airflow, Argo and Batch, I'll ask @deepyaman for a second opinion on those, but TBH, I'm happy to just slate those for an update when there's opportunity (and look at usage logs to see which to prioritise)
My two cents:
`kedro<0.18`, we're planning to upgrade it, but no specific timeline for that yet.
I agree with Merel mostly, I have some minor comments.
Thanks @marrrcin, that's very useful. I'll take your input on Airflow on board, and likewise for Azure. I plan to add some text for that as you suggest.
And to @noklam also, thank you. I have no idea how I missed AWS Steps. I'll add it to my list, and add it to the flowchart.
Also, we don't have any copy about "Which AWS to use?" but that would be very useful. Let me get that on my list too.
I've revamped the quickstart guide for AzureML here: https://kedro-azureml.readthedocs.io/en/0.4.1/source/03_quickstart.html
I'm a bit late to the party, but regarding Prefect, note that the existing docs are written for 1.x, and Prefect 2.0 changed a few things (https://github.com/kedro-org/kedro/issues/2431), so I'd give those an amber rating too.
I will create a pair of tickets for updating the Prefect docs and Airflow/Astronomer docs to the latest versions. And note the version used in the docs so readers are aware.
Following discussion with the GetInData team, we should look to include more official documentation about Kedro plugins for deployment within our official guides.
One option is to add some docs on our side and point through to the docs e.g. https://kedro-azureml.readthedocs.io/en/0.3.6/ for Azure ML (which should probably be the first one, as it's the most battle-tested and feature-complete one).
An alternative is that those plugin docs are brought inside our docs entirely (which has the benefit that the user stays on one location and has one style of docs to read) but also adds to the content load, which is already heavy.
I didn't have a ticket about this so have created one for discussion. Tagging in https://github.com/marrrcin