kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.49k stars 874 forks source link

Make rich optional / not a core dependency of `kedro` #2928

Closed noklam closed 2 days ago

noklam commented 10 months ago

Description

What do you think about pip uninstall rich to fallback to a plain log? Especially for CI environment there’s not much point to have it. @noklam

Joel Schwarzmann I think thats a good idea you can even do the richrepr__ that only works if it’s avaialble

Documenting some conversation in Slack - we should consider making rich non-core of Kedro. Potentially

While we have fixed many issues in this Tweaking Logging to improve UX, particular we introduce RichHandler. However, there are still some unresolved issues with rich. Some of them requires upstream fix instead of kedro.

List of unresolved issues, mostly IDE(notebook, VSCode or PyCharm) specific

rich is mostly useful for interactive development workflow. In most production environment, teminal logs are not colored and rich automatically fallback.

Context

While Rich logging improve the readability, it has been proved hard to make it works 100% the case, there are edge cases in IDE/deployment environment which is out of our control. In some case, if the logging is too long rich automatically split it into multiple lines and cause issue.

In addition, some user prefer plain logging (it's configurable, but user will still have to install rich)

Pro:

Con:

[!NOTE]
This is not our top priority now, but we would love to have feedback and more discussion before we move to technical design or implementation.

Alternative

astrojuanlu commented 8 months ago

Would be great to move this to a kedro-rich plugin, yes.

More broadly, configuring our logging is consistently a pain. This is hardly Kedro's fault, stdlib logging is famously obtuse, but our heavy handed approach to logging exacerbates this problem. Just from this week, a user is confused about logs or lack of them in Kedro, Docker, Airflow https://linen-slack.kedro.org/t/15975835/version-1-disable-existing-loggers-true-formatters-simple-fo#602b2164-1a4f-4e5c-8741-3c3dc29f7be7

An approach worth exploring is moving our logging to structlog https://www.structlog.org/en/stable/configuration.html (which has progressive enhancement if rich is present) and even consider making the library functions more logger-agnostic by passing it as a parameter in key places.

astrojuanlu commented 8 months ago

rich created some problems in older versions of Databricks https://linen-slack.kedro.org/t/16022133/situation-kedro-0-18-8-python-3-8-16-databricks-9-1-lts-we-a#e098da1b-ef71-4ead-9044-d2e4dfe097d9

Possible solutions:

astrojuanlu commented 7 months ago

@datajoely 's kedro-rich prototype keeps receiving attention https://github.com/datajoely/kedro-rich/issues/11

astrojuanlu commented 7 months ago

Unsure if this is a parent issue, because the linked issues are consequences of having rich as mandatory dependency, not a breakdown of tasks needed to get rid of it. I'm renaming accordingly.

datajoely commented 7 months ago

Yeah aligned on the fact we should look to make the dependency optional

Longer term I'm still keen to advocate for the other tasks still on the roadmap such as progress bars: https://github.com/kedro-org/kedro/milestone/15

datajoely commented 7 months ago

And to do the drop in click replacement (which we could do a try/except import) https://github.com/ewels/rich-click

astrojuanlu commented 2 months ago

Commenting here to clarify my view on this issue since it got mentioned in a meeting.

Initially, my idea was that, for now, we should not change what users get when doing pip install kedro. Doing so could be considered a breaking change from the user perspective IMHO.

However, let's say we want to make the Kedro work even after doing pip uninstall rich. If we don't change the project.dependencies array, this will make pip and other tools complain:

❯ pip show kedro
Name: kedro
Version: 0.19.5
Summary: Kedro helps you build production-ready data and analytics pipelines
Home-page: 
Author: Kedro
Author-email: 
License: Apache Software License (Apache 2.0)
Location: /private/tmp/test-kedro-rich/.venv/lib/python3.11/site-packages
Requires: attrs, build, cachetools, click, cookiecutter, dynaconf, fsspec, gitpython, importlib-metadata, importlib-resources, more-itertools, omegaconf, parse, pluggy, pre-commit-hooks, PyYAML, rich, rope, toml
Required-by: 
❯ pip uninstall rich
Found existing installation: rich 13.7.1
Uninstalling rich-13.7.1:
  Would remove:
    /private/tmp/test-kedro-rich/.venv/lib/python3.11/site-packages/rich-13.7.1.dist-info/*
    /private/tmp/test-kedro-rich/.venv/lib/python3.11/site-packages/rich/*
Proceed (Y/n)? y
  Successfully uninstalled rich-13.7.1
❯ pip check
cookiecutter 2.6.0 requires rich, which is not installed.
kedro 0.19.5 requires rich, which is not installed.

And, perhaps most importantly, even if we make kedro not depend on rich... It still depends transitively on it through cookiecutter.

So, I see 2 main blockers:

  1. As long as kedro -> cookiecutter, then this issue cannot be closed because kedro -> cookiecutter -> rich.
  2. Even if we address (1), it's not clear what the best solution to the rich dependency is.

Thoughts @noklam @SajidAlamQB @lrcouto ?

noklam commented 2 months ago

This is a good point, but I think it's a separate discussion from what we were talking.

  1. Make kedro (kedro run specifically ) doesn't depends on rich at runtime (this PR or the discussion above). i.e. "Can I run kedro without importing rich at all"
  2. Make rich not a core dependency of kedro (including pip install etc). i.e. "Can I run kedro without ever installing rich"

Should we focus on 1. first, which is what this PR mainly addressing? So far I think the bigger problem coming from rich isn't because of rich is big or it comes with lots of dependencies, the main complain is it fails in certain platforms and there is no good way to disable it automatically (and user do not have control over it).

The 2. is still a problem, but in my mind a less urgent one until we have a good replacement of cookiecutter. Alternatively we can immediately pin cookiecutter <2.3.0 which doesn't depends on rich at all. (kedro has a pin of cookiecutter >2.1.1 <3.0 atm).

astrojuanlu commented 2 months ago

Agreed to focus on (1) first 👍🏼

astrojuanlu commented 2 months ago

The PR in question is #3838 by the way

noklam commented 1 month ago

@lrcouto is this completely addressed now or still follow up to do?

lrcouto commented 1 month ago

@lrcouto is this completely addressed now or still follow up to do?

There's still follow-up to do. What I implemented here was a first step, but it still requires the user to manually downgrade their cookiecutter version to get Kedro to work without rich. They'll also have to manually uninstall rich.

To make rich completely optional, we'll also have to deal with our dependency on cookiecutter for project creation.

noklam commented 1 month ago

@lrcouto in that case, would you prefer to reopen this issue or creates new one for the follow up tasks you mentioned?

astrojuanlu commented 1 month ago

I'd say let's reopen this issue to signal it's not fully addressed, and we make it a parent of the new one about cookiecutter.

lrcouto commented 1 month ago

I'd say let's reopen this issue to signal it's not fully addressed, and we make it a parent of the new one about cookiecutter.

Yeah, I agree.

lrcouto commented 2 days ago

We will be closing this issue for now, based on this discussion on Kedro's dependencies. We have decided on a longer term approach to deal with Cookiecutter, and by extension Rich, that will take some time to design and implement. This issue will be addressed once we're ongoing with this process.