kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Update error message when catalog entry is invalid #3944

Closed: ankatiyar closed this pull request 2 weeks ago

ankatiyar commented 3 weeks ago

Description

Fix https://github.com/kedro-org/kedro/issues/3910

Development notes

Add an error message for when a catalog entry is not a dictionary, e.g.:

invalid_entry: whatever

Also add an error message for when a catalog entry is a dictionary but the type: key is missing.
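A minimal sketch of the two checks described in these notes, applied to a single entry after the YAML has already been parsed. The names validate_catalog_entry and CatalogEntryError, and the message wording, are hypothetical illustrations, not Kedro's actual code or error text.

```python
from typing import Any


class CatalogEntryError(ValueError):
    """Hypothetical error type for a catalog entry that cannot define a dataset."""


def validate_catalog_entry(name: str, config: Any) -> dict:
    """Check one parsed catalog.yml entry before building a dataset from it.

    Illustrative sketch only: the function name, error class and messages
    are assumptions, not Kedro's implementation.
    """
    # Case 1: the entry is a scalar (e.g. `invalid_entry: whatever`), not a mapping.
    if not isinstance(config, dict):
        raise CatalogEntryError(
            f"Catalog entry '{name}' is not a valid dataset definition: "
            f"expected a dictionary with a 'type' key, "
            f"got {type(config).__name__} {config!r}."
        )
    # Case 2: the entry is a mapping but has no `type:` key.
    if "type" not in config:
        raise CatalogEntryError(
            f"Catalog entry '{name}' is missing the required 'type' key."
        )
    return config


if __name__ == "__main__":
    try:
        validate_catalog_entry("invalid_entry", "whatever")
    except CatalogEntryError as err:
        print(err)
```

The non-dictionary check runs first on purpose: evaluating "type" in config on a string performs a substring test instead of failing, so a scalar such as whatever would otherwise fall through to the misleading "missing type" message.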

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

ankatiyar commented 2 weeks ago

Thank you, @ankatiyar! The PR looks great. I have one question: is it okay that we get a different error if we put just some_value into the catalog, without a : after it?

After some offline discussion with @DimedS on this: consider a catalog entry that is just one word, like this:

some_value

This is invalid YAML syntax, and it errors out much earlier, when OmegaConfigLoader tries to load the catalog:

 File "/Users/ankita_katiyar/kedro/kedro/kedro/config/omegaconf_config.py", line 319, in load_and_merge_dir_config
    raise ParserError(
yaml.parser.ParserError: Invalid YAML or JSON file /Users/ankita_katiyar/kedro_projects/demo/conf/base/catalog.yml, unable to read line 73, position 0.

This is not Kedro-specific; as far as I know, it simply isn't allowed in YAML. So to me these seem like two different issues. This PR addresses the case where a catalog entry in catalog.yml is syntactically valid YAML but "wrong" according to Kedro's rules (e.g. not a valid dataset definition).
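The distinction drawn in this reply, between text that is not valid YAML at all and YAML that parses but breaks Kedro's catalog rules, can be sketched as two stages. This assumes PyYAML is installed (OmegaConf depends on it); yaml.safe_load stands in for Kedro's real loading path, and check_catalog_text is a hypothetical helper, not part of Kedro.

```python
from typing import Any, List

import yaml  # PyYAML; assumed to be installed


def check_catalog_text(text: str) -> List[str]:
    """Return problems found in catalog.yml-style text (hypothetical helper).

    Stage 1 is YAML syntax; stage 2 applies Kedro-style shape rules to the
    parsed mapping. Messages are illustrative, not Kedro's.
    """
    try:
        parsed: Any = yaml.safe_load(text)
    except yaml.YAMLError as err:
        # Not valid YAML, so catalog-level validation never runs.
        return [f"YAML syntax error: {type(err).__name__}"]

    if not isinstance(parsed, dict):
        return ["catalog must be a mapping of dataset names to definitions"]

    problems = []
    for name, config in parsed.items():
        if not isinstance(config, dict):
            problems.append(f"{name}: entry is not a dictionary")
        elif "type" not in config:
            problems.append(f"{name}: 'type' key is missing")
    return problems


if __name__ == "__main__":
    # Valid YAML, invalid catalog entry: reaches stage 2.
    print(check_catalog_text("invalid_entry: whatever\n"))
    # A bare word after other entries: fails in stage 1.
    print(check_catalog_text("cars:\n  type: pandas.CSVDataset\nsome_value\n"))
```

With invalid_entry: whatever the YAML stage succeeds and only the second stage reports a problem, whereas the bare some_value line never gets past parsing, which mirrors why the two cases produce different errors.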