Extension complains about dataset not from `kedro-datasets`

astrojuanlu commented 1 month ago

Spotted this with `kedro_mlflow.io.models.MlflowModelTrackingDataset` from https://github.com/Galileo-Galilei/kedro-mlflow.

Not sure if this should be fixed in the extension or in the schema.

noklam commented 1 month ago

Good question! The schema was originally a local setting that you could point to a file (so you could edit the file to include new datasets). This is no longer possible. Ideally I would like to version it against kedro-datasets, but it's not possible to ship different versions of the schema with the extension.

I would love to access the API of https://github.com/redhat-developer/vscode-yaml so I could potentially support these in a flexible way. Unfortunately it's not well documented, so I didn't pursue it.

The current solution relies on Red Hat's YAML extension. There are two other options:

1. Use the native schema validation support in VS Code, as the GitHub Actions extension does (https://github.com/github/vscode-github-actions/blob/main/language/syntaxes/expressions.tmGrammar.json). That extension seemed to give a much better editing experience when I tried it. This approach requires understanding how the grammar works and writing lots of regexes. Maybe most of them can be copied and pasted, but I'm not sure at this point.
2. LSP: we can explore options where schema validation happens dynamically and the language server responds directly. We may need a "validation" method on the datasets, or some kind of pydantic-like model validation, and rely on its error messages.
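A rough sketch of what the LSP option could look like, assuming validation runs in Python inside the project's environment: resolve the `type` string by importing it (trying the path as written first, then with a `kedro_datasets.` prefix, which is an assumption loosely modelled on how Kedro expands short names), then compare the entry's keys against the class constructor's signature instead of a static schema. `import_dataset_class`, `validate_entry` and `DEFAULT_PREFIXES` are hypothetical names, not part of Kedro or the extension.

```python
import importlib
import inspect
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical search order for `type` strings: first as written (fully
# qualified, e.g. a kedro-mlflow class), then under `kedro_datasets.`
# (short names such as "spark.SparkDataset"). An assumption, not Kedro's code.
DEFAULT_PREFIXES: Tuple[str, ...] = ("", "kedro_datasets.")


def import_dataset_class(
    type_name: str, prefixes: Tuple[str, ...] = DEFAULT_PREFIXES
) -> Optional[type]:
    """Import the class named by `type_name`, or return None if it can't be found."""
    for prefix in prefixes:
        module_path, _, class_name = (prefix + type_name).rpartition(".")
        if not module_path:
            continue  # no dot in the full name, so there is no module to import
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            continue  # package not installed or submodule missing
        cls = getattr(module, class_name, None)
        if inspect.isclass(cls):
            return cls
    return None


def validate_entry(
    name: str,
    entry: Dict[str, Any],
    resolve: Callable[[str], Optional[type]] = import_dataset_class,
) -> List[str]:
    """Check one catalog entry against its dataset class's constructor."""
    type_name = entry.get("type")
    if not isinstance(type_name, str):
        return [f"{name}: missing or non-string 'type'"]
    cls = resolve(type_name)
    if cls is None:
        return [f"{name}: cannot import dataset type '{type_name}'"]

    params = inspect.signature(cls).parameters
    takes_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    problems = []
    # Keys in the YAML that the constructor would not accept.
    for key in entry:
        if key != "type" and key not in params and not takes_kwargs:
            problems.append(f"{name}: unexpected argument '{key}' for {cls.__name__}")
    # Constructor arguments without defaults that the YAML does not provide.
    for pname, p in params.items():
        required = p.default is inspect.Parameter.empty and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
        if required and pname not in entry:
            problems.append(f"{name}: missing required argument '{pname}'")
    return problems
```

Because this inspects the real class, a dataset from kedro-mlflow or any other package would be accepted whenever it is importable, which a schema shipped with the extension cannot do. The trade-off is that the extension then depends on the project's Python environment being available to the language server.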

khub41 commented 1 month ago

Having this problem too when using custom datasets from another package.

[Screenshot, 2024-05-13 15:44:20]

michal-mmm commented 1 month ago

The problem also applies to YAML anchors/aliases:

```yaml
_spark_parquet: &spark_parquet
  type: spark.SparkDataset
  save_args:
    mode: overwrite
```

astrojuanlu commented 1 month ago

Thanks @michal-mmm, I can reproduce.

Entries starting with an underscore aren't treated as datasets in general (I couldn't find this documented, @noklam?)
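Assuming the convention described above (top-level keys starting with `_` are not datasets), a validator could drop those keys before checking anything else. By the time a YAML loader returns the catalog, aliases such as `*spark_parquet` have already been expanded, so the anchor body survives only as a template entry that is not meant to be a complete dataset definition. The sketch below uses a plain dict in place of a parsed `catalog.yml`; `dataset_entries` and the `model_input_table` entry are hypothetical.

```python
from typing import Any, Dict


def dataset_entries(catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Drop top-level keys starting with "_" (anchor templates, not datasets)."""
    return {name: entry for name, entry in catalog.items() if not name.startswith("_")}


# What a YAML loader might return for the snippet above plus one hypothetical
# dataset that reuses the anchor via a merge key (`<<: *spark_parquet`).
spark_parquet = {"type": "spark.SparkDataset", "save_args": {"mode": "overwrite"}}
catalog = {
    "_spark_parquet": spark_parquet,
    "model_input_table": {
        **spark_parquet,
        "filepath": "data/03_primary/model_input_table.parquet",
    },
}

for name, entry in dataset_entries(catalog).items():
    print(name, entry["type"], entry["save_args"])
```

Only `model_input_table` would then be validated, and it already carries the settings inherited from the anchor.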

kedro-org / vscode-kedro #5