kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Describe how to use custom code with Kedro #3418

Open astrojuanlu opened 11 months ago

astrojuanlu commented 11 months ago

Description

In general, I struggle to find any documentation on how to place/use/import custom libraries with kedro

https://linen-slack.kedro.org/t/16143547/hey-all-very-new-and-excited-user-of-kedro-here-hopefully-i-#600585f1-60b6-48e3-b0b6-6d1efe01480a

Context

This is natural for people who understand that Kedro projects are mostly Python libraries, but

  1. This was not the case in Kedro<0.19, and
  2. The fact that kedro run injects the project source into the path on startup hides the fact that users can, and should, install their own code.

https://github.com/kedro-org/kedro/blob/44a3d170fc15be2139f8fafcc647431c5f506aeb/kedro/framework/startup.py#L147-L148
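
To illustrate point 2, here is a minimal sketch of the general mechanism, not Kedro's actual implementation (the linked lines in startup.py are the authoritative source). It assumes a hypothetical `src/` layout with a made-up package called `my_demo_project`. It shows that a package that was never installed becomes importable as soon as its parent directory is placed on `sys.path`, which is why the need to install your own code can go unnoticed.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_project(root: Path) -> Path:
    """Create a hypothetical src/ layout containing one package, `my_demo_project`."""
    src = root / "src"
    pkg = src / "my_demo_project"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("def double(x):\n    return 2 * x\n")
    return src


def can_import(name: str) -> bool:
    """Return True if `name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
    except ModuleNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = make_demo_project(Path(tmp))

    # The package exists on disk but is not installed, so the import fails.
    importable_before = can_import("my_demo_project")

    # Assumption: this mimics, in spirit, what happens on project startup.
    sys.path.insert(0, str(src))
    importable_after = can_import("my_demo_project")

    from my_demo_project.helpers import double

    result = double(21)

    # Undo the path change so the demo leaves sys.path as it found it.
    sys.path.remove(str(src))

print(importable_before, importable_after, result)
```

Outside of a process that performs this path injection (a plain Python shell, a test runner, another tool), the same import would fail unless the project has been installed, for example in editable mode.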

stichbury commented 11 months ago

@astrojuanlu If possible, please could you add a bit more detail on where this content should go in the docset and who could contribute it? It looks like something that engineering should write as a first draft. What's your priority here -- high-ish?

astrojuanlu commented 11 months ago

I'd say this is Medium-Low priority, and definitely something that engineering should start.

I'd need a bit more time to propose an outline. There are some loose ends in our user journeys that I think would impact this content:

In addition, it was always weird to have those instructions in the "Package an entire Kedro project" page - kedro package is about distributing the code (wheel + config), docs have nothing to do with it.

Originally posted by @astrojuanlu in https://github.com/kedro-org/kedro/issues/3376#issuecomment-1836354148

We could tackle this task without revisiting these user journeys, but I think we risk introducing some inconsistent information, or making the page too long or complex.

I'll give this some more thought in the near future.

yury-fedotov commented 7 months ago

Starting on this per conversation with @noklam

astrojuanlu commented 4 days ago

We ended up not merging #3852 for a variety of reasons, see https://github.com/kedro-org/kedro/pull/3852#pullrequestreview-2300738784

There's still appetite to give our users more guidance on how to structure their code so that it doesn't end up in a utils/ package, but it's still not clear what our blessed way is.

I also stand by what I said in https://github.com/kedro-org/kedro/issues/3418#issuecomment-1863092016: we'd need to look at our current user journeys. This might require a bit of research. Maybe after we've finished #4116, given that we're considering adding a -e . in the requirements.txt file.

Let's treat this as a docs issue.