kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.47k stars, 875 forks

[DataCatalog]: Autocompletion support for accessing datasets #3914

Open ElenaKhaustova opened 1 month ago

ElenaKhaustova commented 1 month ago

Description

Users struggle to find datasets within the catalog, particularly when dealing with a large number of datasets. They express the need for autocomplete functionality when accessing datasets in the catalog.

We propose implementing autocompletion support for accessing datasets in the catalog, enabling users to receive suggestions for dataset names as they type.

Relates to https://github.com/kedro-org/kedro/issues/1721

Context

astrojuanlu commented 3 weeks ago

Re-stating what I said about dynamic properties (aka "pandas .column access") in #1721:

The problem with dynamic properties is that some dataset names that are valid in YAML would become illegal as attribute names (the same problem as with pandas columns), and they would also pollute the namespace of the DataCatalog (again, the same problem).
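To make both problems concrete, here is a minimal sketch. `AttrCatalog` and `attribute_safe` are hypothetical names made up for illustration; this is not Kedro's DataCatalog API, only an assumption of what attribute-style access could look like.

```python
import keyword


class AttrCatalog:
    """Hypothetical catalog exposing datasets as attributes (not Kedro's API)."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so real
        # methods and attributes always win over dataset names.
        try:
            return self._datasets[name]
        except KeyError:
            raise AttributeError(name) from None

    def load(self, name):
        return self._datasets[name]


def attribute_safe(name):
    """Return True if `name` can be typed after a dot in Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)


catalog = AttrCatalog(
    {
        "companies": "companies data",
        "model-input.table": "model input",  # fine as a YAML key, not an identifier
        "load": "a dataset named load",  # collides with the load() method
    }
)

print(catalog.companies)                    # attribute access works here
print(attribute_safe("model-input.table"))  # this name cannot be an attribute
print(callable(catalog.load))               # the method shadows the dataset
```

The dataset named `load` is still reachable through `catalog.load("load")`, but never through attribute access, which is the namespace pollution described above.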

Is this something that could be addressed with https://github.com/kedro-org/vscode-kedro @noklam ?

noklam commented 3 weeks ago

I think addressing this with the VSCode extension is possible, but we should first exhaust the solutions that work in most environments. I know inheriting from dict is discouraged for good reasons, but it is close to the ideal solution and satisfies all my needs. WDYT?

image

I like:

astrojuanlu commented 3 weeks ago

This looks fantastic, and if it works on IPython I'm sure it will work in other places. Wondering if dicts are special-cased or if it's enough for a class to implement __getitem__ and keys().

According to https://stackoverflow.com/a/38732914, this has been supported since 2014: https://github.com/ipython/ipython/pull/5304

astrojuanlu commented 3 weeks ago

This does not rely on LSP though, it's a special IPython functionality https://github.com/ipython/ipython/blob/1b4607fbee253a718df14419414f624dfde1164e/IPython/core/completer.py#L2488-L2510

noklam commented 3 weeks ago

Quoting from the Slack discussion: I think we have a promising solution now (TypedDict). Is this ready enough to put in a sprint? Do we want to discuss the API? I proposed one and asked in Slack.

image

Note that this is non-breaking: we can add it to the current DataCatalog without introducing a new one. Of course, we also want to make sure we don't add a new API that we are going to deprecate next.

I suggest we list all the requirements first; then we can decide whether dict, TypedDict, UserDict, or something else is better.

Nice find. I think it's also important to test a few different targets (IPython, notebook, VSCode, PyCharm); my gut feeling is that there is no standard protocol and it is up to each IDE to decide. dir is the well-known mechanism for attribute (.) autocompletion; the dictionary bracket ([) autocompletion is more mysterious.
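As a rough illustration of the dir-based mechanism for attribute completion, the sketch below overrides `__dir__` so that `dir()` also reports dataset names. `DirCatalog` is a hypothetical class, not part of Kedro, and whether a given tool consults `dir()` at runtime or relies on static analysis is an assumption that would need testing per target.

```python
class DirCatalog:
    """Hypothetical catalog that advertises dataset names through dir()."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __getattr__(self, name):
        # Fallback lookup for names that are not real attributes.
        try:
            return self._datasets[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # dir(obj) calls obj.__dir__(); add only names usable after a dot.
        names = {n for n in self._datasets if n.isidentifier()}
        return sorted(set(super().__dir__()) | names)


catalog = DirCatalog({"companies": "companies data", "model-input": "model input"})

print("companies" in dir(catalog))
print("model-input" in dir(catalog))
```

Dataset names that are not valid identifiers are left out of `dir()` here, so they would still need bracket-style access.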

astrojuanlu commented 3 weeks ago

If we want IPython and Jupyter autocompletion, there's no need to change the inheritance relationship of the DataCatalog class; it suffices to add an _ipython_key_completions_() method.

https://ipython.readthedocs.io/en/stable/config/integrating.html#tab-completion

See (the code snippet I linked above)
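A minimal sketch of that approach, assuming a catalog-like class that stores its datasets in a private dict. `KeyCatalog` is a hypothetical stand-in, not the real DataCatalog; only the `_ipython_key_completions_` hook name comes from the IPython documentation linked above.

```python
class KeyCatalog:
    """Hypothetical stand-in for a catalog with bracket access."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __getitem__(self, name):
        return self._datasets[name]

    def _ipython_key_completions_(self):
        # IPython calls this hook to get candidate keys when completing
        # inside catalog["<TAB>. No dict inheritance is needed.
        return list(self._datasets)


catalog = KeyCatalog({"companies": "companies data", "model-input.table": "model input"})

print(catalog._ipython_key_completions_())
print(catalog["model-input.table"])
```

Because the keys are plain strings, dataset names that are not valid Python identifiers can be completed as well, which avoids the attribute-access limitation raised earlier in the thread.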

merelcht commented 3 weeks ago

If we can indeed do it this way, I'm all for it. Changing the DataCatalog to inherit from TypedDict is something we could experiment with for the newly designed "DataCatalog2" (for lack of a better name). Right now, I need more clarity on the implications of making DataCatalog inherit from TypedDict, and on whether that also influences mutability etc.
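On the mutability question, the sketch below shows how `typing.TypedDict` behaves at runtime in standard Python. `CatalogKeys` and `OddKeys` are hypothetical names, and this says nothing about the API proposed in Slack, which is not reproduced in this thread.

```python
from typing import TypedDict


class CatalogKeys(TypedDict):
    """Hypothetical key schema; the dataset names are made up."""

    companies: str
    reviews: str


# The functional syntax allows keys that are not valid identifiers.
OddKeys = TypedDict("OddKeys", {"model-input.table": str})

catalog = CatalogKeys(companies="companies data", reviews="reviews data")

# At runtime the result is a plain dict, so it stays fully mutable.
print(type(catalog) is dict)

catalog["companies"] = "replaced"  # allowed at runtime
catalog["unknown"] = "extra"       # a type checker would flag this line,
                                   # but nothing stops it at runtime
print(sorted(catalog))
```

In other words, TypedDict on its own gives static tooling a list of known keys but adds no runtime protection, so any immutability guarantee would have to come from somewhere else.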