astrojuanlu closed this issue 4 days ago.
PyCharm now supports LSP https://blog.jetbrains.com/platform/2023/07/lsp-for-plugin-developers/ via @datajoely
@astrojuanlu excited to see this move ever so slightly forward.
The number one feature for me is the link between our "magic string" input/output/parameter references in node definitions and their YAML counterpart in the catalog. IDE users are used to ⌘ Command + Click symbols and jumping to their definition.
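To illustrate the Python side of that link, here is a minimal sketch (stdlib `ast` only) that finds dataset-name string literals in pipeline code along with their positions, which is what a "jump to definition" feature would start from. It assumes nodes are built with calls to a function named `node`, with inputs/outputs passed as the second and third positional arguments or as `inputs=`/`outputs=` keywords. `DatasetRef` and `find_dataset_refs` are hypothetical names, not part of Kedro.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """A dataset-name string literal found in pipeline code."""
    name: str
    line: int  # 1-based line number, as reported by ast
    col: int   # 0-based column of the opening quote


def _string_literals(value):
    """Yield string constants from a str, list/tuple, or dict (values only) expression."""
    if isinstance(value, ast.Constant) and isinstance(value.value, str):
        yield value
    elif isinstance(value, (ast.List, ast.Tuple)):
        for element in value.elts:
            yield from _string_literals(element)
    elif isinstance(value, ast.Dict):
        # In dict-style inputs the keys are function argument names,
        # so only the values refer to datasets.
        for element in value.values:
            yield from _string_literals(element)


def find_dataset_refs(source: str) -> list[DatasetRef]:
    """Find string literals passed as inputs/outputs to calls named `node`."""
    refs = []
    for call in ast.walk(ast.parse(source)):
        if not isinstance(call, ast.Call):
            continue
        func = call.func
        callee = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if callee != "node":
            continue
        # node(func, inputs, outputs, ...) positionally, or inputs=/outputs= keywords
        exprs = list(call.args[1:3])
        exprs += [kw.value for kw in call.keywords if kw.arg in ("inputs", "outputs")]
        for expr in exprs:
            for const in _string_literals(expr):
                refs.append(DatasetRef(const.value, const.lineno, const.col_offset))
    return refs
```

Because the code is only parsed, never executed, this works on files whose imports are unavailable. Note that `params:` references point to parameter files rather than the catalog, so an editor would route them differently.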
kedro-lsp
The majority of the work is spent building a YAML scanner. If we were to store the file/line-number reference in the live catalog object, we could do exciting things.
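As a rough idea of what such a scanner has to produce, here is a deliberately naive, line-based sketch that maps top-level catalog keys to line numbers. It assumes the usual block-style `catalog.yml` layout and ignores flow mappings, anchors/merge keys, and multi-document files; `scan_top_level_keys` is a hypothetical helper, and a robust version would take positions from the YAML parser itself.

```python
import re

# An unindented mapping key, e.g. `companies:` or `"{name}_data":`, optionally
# followed by a value on the same line. Lines starting with whitespace or `#` never match.
TOP_LEVEL_KEY = re.compile(r"""^(?P<key>"[^"]*"|'[^']*'|[^\s#'"][^:]*?)\s*:(?:\s|$)""")


def scan_top_level_keys(text: str) -> dict[str, int]:
    """Map each top-level key of a block-style YAML document to its 1-based line."""
    positions: dict[str, int] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = TOP_LEVEL_KEY.match(line)
        if match:
            key = match.group("key").strip("'\"")
            positions.setdefault(key, lineno)  # keep the first occurrence
    return positions
```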
I had another user complain about hard navigation/autocomplete/typo detection for dataset names. Creating an IDE plugin as stated in this issue is of course one of the possible solutions, maybe others could be explored.
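For the typo-detection part, a sketch using the standard library's `difflib.get_close_matches` shows the idea: compare names referenced by nodes against catalog names and suggest close matches. Undeclared names are not necessarily errors in Kedro (they fall back to an in-memory dataset), so this only flags them. `check_dataset_names` is a hypothetical helper.

```python
import difflib


def check_dataset_names(referenced, catalog_names):
    """Return {name: suggestions} for referenced names not declared in the catalog."""
    known = set(catalog_names)
    report = {}
    for name in referenced:
        # Parameters live in parameter files, not the catalog, so skip them here.
        if name == "parameters" or name.startswith("params:") or name in known:
            continue
        report[name] = difflib.get_close_matches(name, sorted(known), n=3, cutoff=0.6)
    return report
```

An editor integration would attach each entry of the report to the string's position as a warning with "did you mean ...?" quick fixes.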
Maybe worth exploring kedro-lsp in 0.19.
Nice work from the DVC folks on their VSCode extension
The prerequisite for building this is to store the YAML line number/cursor position of each catalog entry at load time.
For reference, ruamel.yaml provides the cursor information:
In [11]: from ruamel.yaml import YAML
In [12]: yaml = YAML()
In [13]: data = yaml.load("""
...: # testing line and column based on SO
...: # http://stackoverflow.com/questions/13319067/
...: - key1: item 1
...:   key2: item 2
...: - key3: another item 1
...:   key4: another item 2
...: """)
In [14]: data
Out[14]: [{'key1': 'item 1', 'key2': 'item 2'}, {'key3': 'another item 1', 'key4': 'another item 2'}]
In [15]: type(data)
Out[15]: ruamel.yaml.comments.CommentedSeq
In [16]: data[0].lc
Out[16]: LineCol(3, 7)
In [17]: type(data[0])
Out[17]: ruamel.yaml.comments.CommentedMap
However:
I think some of the metadata we'd need is hidden by the OmegaConf.load method.
Indeed, OmegaConf.load I/O is coupled with PyYAML unless an already loaded object is provided directly:
I think this supports the idea of separating the loading from the resolving part as proposed in https://github.com/kedro-org/kedro/issues/2481
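A minimal sketch of that split, under stated assumptions: some loader (hypothetically a position-aware one such as ruamel.yaml) produces a plain mapping plus a separate positions table, and the resolver only ever sees the mapping. The `${name}` syntax and `resolve` function below are a toy stand-in for OmegaConf interpolation, not its real behaviour.

```python
import re
from collections.abc import Mapping

# Toy `${name}` placeholder syntax, standing in for OmegaConf interpolation.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(raw: Mapping, variables: Mapping) -> dict:
    """Resolve placeholders in an already-loaded mapping, without doing any file I/O."""
    def _resolve(value):
        if isinstance(value, str):
            return _PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), value)
        if isinstance(value, Mapping):
            return {key: _resolve(item) for key, item in value.items()}
        if isinstance(value, list):
            return [_resolve(item) for item in value]
        return value

    return _resolve(raw)
```

Because resolution builds new objects and leaves `raw` untouched, the loader's position information (kept alongside `raw`) stays valid for editor features even after the config is resolved.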
Similar case of trying to parse YAML and give good error messages:
The initial version of rattler-build used serde & serde_yaml to parse the recipe. That worked OK, but was limited because we could not get great error messages out of serde_yaml, since it doesn't report the locations (line & column) of the encountered issues.
However, the primary downside for me was the requirement to set up configuration using YAML. I would prefer to keep it within a Python script because of editor completion.
Just saw this issue linked from the slack conversation.
This would solve my number one problem with kedro as a user. I raised this in a user interview some months back, and also in various conversations: the disconnect between string objects in python and the objects which those strings represent within kedro. Refactoring is a huge hassle whenever it involves changing the shape of pipelines as there always end up being orphan datasets.
So this issue has a huge upvote from me.
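The "orphan datasets" problem boils down to a set difference between names used by nodes and names declared in the catalog. A sketch, assuming node inputs/outputs have already been collected as lists of names; `find_orphans` is hypothetical, and it ignores dataset factory patterns, which can match many names.

```python
def find_orphans(node_io, catalog_names):
    """Compare dataset names used by nodes with names declared in the catalog.

    node_io: iterable of (inputs, outputs) pairs, each a list of dataset names.
    Returns (catalog entries no node uses, node datasets missing from the catalog).
    """
    used = set()
    for inputs, outputs in node_io:
        used.update(inputs)
        used.update(outputs)
    # Parameters are not catalog entries, so leave them out of the comparison.
    used = {name for name in used if name != "parameters" and not name.startswith("params:")}
    declared = set(catalog_names)
    return declared - used, used - declared
```

The second set may be intentional (in-memory intermediates), so an editor would surface it as information rather than as an error.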
There is a lot of technical complexity here, e.g. dynamically generated config/catalogs.
Nonetheless, I believe it's a huge improvement, and we don't necessarily need to wait until we have a full solution. I spent quite a bit of time looking at the original kedro-lsp and finally made something that runs on 0.19.x. I'm pretty excited about this.
The tooltip is awesome here! I guess dataset factories would work the same way. I still think it would be nice to have an equivalent of dbt compile, where you could jump to the resolved config.
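To make "jump to the resolved config" concrete for dataset factories, here is a toy resolver that turns a pattern such as `{name}_data` into a regex and fills the placeholders into the matching entry's string values. It is a sketch under simplifying assumptions: Kedro's real matching is more involved (it ranks overlapping patterns by specificity), whereas this just takes the first match, and repeated placeholder names are not supported. Both function names are hypothetical.

```python
import re
from typing import Optional


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a factory pattern like '{name}_data' into an anchored regex."""
    # With a capturing group, re.split alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", pattern)
    body = "".join(
        f"(?P<{part}>.+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return re.compile(f"^{body}$")


def resolve_dataset(name: str, patterns: dict) -> Optional[dict]:
    """Return the config of the first pattern matching `name`, placeholders filled in."""
    for pattern, config in patterns.items():
        match = pattern_to_regex(pattern).match(name)
        if match:
            values = match.groupdict()
            return {
                key: value.format(**values) if isinstance(value, str) else value
                for key, value in config.items()
            }
    return None
```

A tooltip or "compiled catalog" view could show the returned dict next to the dataset name, alongside the line of the pattern that produced it.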
IDE plugins are a very broad topic. I have seen a few things mentioned here and recall some past discussions:
IDE support:
Backward compatibility: I am not sure yet how to make the VS Code plugin backward compatible, or maybe we shouldn't worry about this so soon, because it could be a good reason to drive more 0.19 adoption.
User research: (??) There are many possibilities and unknowns here.
Honestly I think VS Code is the right call for any initial MVP - you can just point to DVC and Databricks making the same call
Yeah, the title of this issue is too broad. Let's start with an LSP for Kedro: basically bringing kedro-lsp back to life, making it work with Kedro 0.19, and adding docs on how to set it up in VS Code. That's already a massive usability boost for folks.
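At the protocol level, "go to definition" is a `textDocument/definition` request answered with a `Location` (a file URI plus a 0-based line/character range), framed by a `Content-Length` header. The sketch below builds such a response by hand to show the shape of the message; in practice a server would use an LSP framework rather than framing messages itself. `definition_response` is hypothetical, it assumes the 1-based line of the catalog key comes from something like a YAML scanner, and it uses `len(key)` for the range end, which matches LSP's default UTF-16 offsets only for ASCII keys.

```python
import json


def definition_response(request_id, catalog_uri, line, key):
    """Build a framed JSON-RPC response for textDocument/definition.

    `line` is the 1-based line of the catalog key; LSP positions are 0-based.
    """
    start = {"line": line - 1, "character": 0}
    end = {"line": line - 1, "character": len(key)}
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"uri": catalog_uri, "range": {"start": start, "end": end}},
    })
    payload = body.encode("utf-8")
    # Content-Length counts bytes of the UTF-8 encoded JSON body.
    header = f"Content-Length: {len(payload)}\r\n\r\n".encode("ascii")
    return header + payload
```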
@noklam shall we open a separate issue for it?
https://blog.jetbrains.com/platform/2023/07/lsp-for-plugin-developers/
I did some research today and found that, in theory, PyCharm supports LSP, but only for paid users, which is disappointing.
Disappointing, but I think it's still a great step, especially if we target an enterprise user persona.
A VS Code plugin already exists, see #3691.
Considering this done for now, if we ever decide to target other editors we can model those after the existing VS Code extension.
We have evidence that users struggle when assigning catalog entries to node functions. For example, #2726:
This is something that an IDE extension could help with. There is already a kedro-lsp extension (see https://github.com/kedro-org/kedro/issues/712#issuecomment-1172216531), but it doesn't appear to be maintained anymore.
Such an extension could help with other things too, to be defined.
Evidence markers