kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Clearer underlying dataset issues #3971

Open datajoely opened 3 days ago

datajoely commented 3 days ago

Description

A user reported that Kedro was unable to read the CSV. They got the following logs in AWS: [image]

The "No columns to parse from file" error is raised by the underlying pandas implementation in this file.

It would be helpful if Kedro could bubble up that the error is thrown in pandas.io.parsers.python_parser, so that it is clear where the issue lies. The error above mentions kedro.io.core.DatasetError. Is it not possible to do the same?

astrojuanlu commented 22 hours ago

It is unclear why those logs don't show tracebacks.

Anyway, the current implementation of AbstractDataset is responsible for that DatasetError:

https://github.com/kedro-org/kedro/blob/adfc593bcd2f1b74676e7ab7c1a3b9c168b7257f/kedro/io/core.py#L192-L202
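For readers without the link open, the wrapping pattern being discussed looks roughly like the sketch below. This is a simplified illustration and not a copy of Kedro's code: the `DatasetError` class, `_load`, `load`, and the message wording here are stand-ins written for this example, and the `ValueError` stands in for whatever the underlying library raises.

```python
class DatasetError(Exception):
    """Stand-in for kedro.io.core.DatasetError (illustration only)."""


def _load():
    # Stand-in for the underlying library call, e.g. a pandas CSV parse.
    raise ValueError("No columns to parse from file")


def load(name="my_dataset"):
    """Call the underlying loader and wrap any failure in DatasetError."""
    try:
        return _load()
    except DatasetError:
        # Already the wrapper type, so pass it through unchanged.
        raise
    except Exception as exc:
        # Only the original message is copied into the new text, but
        # "from exc" keeps the original exception attached as __cause__.
        message = f"Failed while loading data from data set {name}.\n{exc}"
        raise DatasetError(message) from exc
```

With this shape, the log line shows the wrapper type and the original message, while the original exception type and traceback are only visible if the logger prints the chained exception.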

datajoely commented 10 hours ago

They must be in the exc object somewhere, I refuse to believe otherwise
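A minimal sketch of where that information would live, assuming the wrapper is raised with `raise ... from exc`: the original exception is reachable through `__cause__`, and its `__traceback__` can be walked to the deepest frame to find the module that raised it. All names below (`DatasetError`, `parse_csv`, `load`, `origin`) are hypothetical stand-ins for this example, not Kedro or pandas APIs.

```python
class DatasetError(Exception):
    """Stand-in for kedro.io.core.DatasetError (illustration only)."""


def parse_csv():
    # Stand-in for the library function that actually fails.
    raise ValueError("No columns to parse from file")


def load():
    try:
        return parse_csv()
    except Exception as exc:
        raise DatasetError(f"Failed while loading data.\n{exc}") from exc


def origin(err):
    """Return (module name, function name) of the frame that raised the root cause."""
    cause = err.__cause__ or err
    tb = cause.__traceback__
    if tb is None:
        return None, None
    # Walk to the innermost traceback entry, i.e. where the error was raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    return frame.f_globals.get("__name__"), frame.f_code.co_name


if __name__ == "__main__":
    try:
        load()
    except DatasetError as err:
        module, func = origin(err)
        print(f"{type(err.__cause__).__name__} raised in {module}.{func}")
```

For a real pandas failure, the module name found this way would be whichever pandas module raised the error, so a message built from it could point the user at the right library.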