Closed ElenaKhaustova closed 2 weeks ago
I wonder if we could auto-fix this with a big warning
I have a couple questions on this one:
- As far as I understand, projects created with our starters have a `kedro_init_version` that limits what version of Kedro can be used. If, say, someone created a project with Kedro 0.18 (uppercase `S` datasets from `kedro.extras`) and then tried to use Kedro 0.19 (no `kedro.extras` at all, need to install `kedro-datasets`), they would get an error, right?
- I also reckon though that the versioning of `kedro-datasets` is not limited by `kedro_init_version`.
- In other words, how does this problem manifest itself nowadays? What sequence of steps gets us to here?
- It is well known that upgrading a Kedro version is hard in general (but I could not locate an issue for it). By looking at this problem from that angle, and considering that clearly it arises from people not reading our existing migration guides, can we provide linters ("`kedro-lint`") or semi-automatic migration utils ("`kedro-modernize`") to help with this task, rather than limiting ourselves to improving the traceback?
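To make the semi-automatic migration idea a bit more concrete, here is a minimal sketch of what such a utility could do to a catalog file. Everything in it is hypothetical: `modernize_catalog`, `KNOWN_PREFIXES` and the regex are illustrative names, not part of Kedro or of any existing tool. It assumes that only `type:` entries under an allowlist of `kedro-datasets` module prefixes should be rewritten, so custom datasets that legitimately end in `DataSet` are left alone.

```python
import re

# Hypothetical allowlist of kedro-datasets module prefixes (an assumption,
# not an exhaustive list); custom datasets outside it are never rewritten.
KNOWN_PREFIXES = ("pandas.", "spark.", "pickle.", "json.", "yaml.", "text.")

# Matches lines such as "  type: pandas.CSVDataSet" (optional trailing comment).
TYPE_LINE = re.compile(r"^(\s*type:\s*)([\w.]+)DataSet(\s*(?:#.*)?)$")


def modernize_catalog(text: str) -> tuple[str, list[str]]:
    """Rewrite `type: x.FooDataSet` to `type: x.FooDataset` for known prefixes.

    Returns the rewritten text and a list describing each change.
    """
    changed = []
    out = []
    for line in text.splitlines():
        match = TYPE_LINE.match(line)
        if match and match.group(2).startswith(KNOWN_PREFIXES):
            old = f"{match.group(2)}DataSet"
            new = f"{match.group(2)}Dataset"
            line = f"{match.group(1)}{new}{match.group(3)}"
            changed.append(f"{old} -> {new}")
        out.append(line)
    return "\n".join(out), changed


if __name__ == "__main__":
    catalog = (
        "companies:\n"
        "  type: pandas.CSVDataSet\n"
        "  filepath: data/companies.csv\n"
        "custom:\n"
        "  type: my_project.datasets.MyDataSet\n"
    )
    new_text, changes = modernize_catalog(catalog)
    print(new_text)
    print(changes)
```

Working on the raw text rather than on parsed YAML keeps comments and key order intact, at the cost of missing `type` values written in unusual ways (for example, quoted strings).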
So far, we know that this still happens when users have a Kedro project created for an older version and then upgrade Kedro to a newer one. Another reason mentioned by interviewees is that our old blog posts have examples with the old naming, which is fair because it was relevant at the time. But some users still follow those examples and get confused.
I've also requested some extra details from the user side to better answer your questions.
@astrojuanlu the blog post mentioned above: https://kedro.org/blog/add-kedro-to-your-data-science-notebook
Very good point about old training material using the old names, didn't think about that... This might be a problem that will need some time to go away then, and we might indeed need to take some action on our side.
Looking at the error:
DatasetError: An exception occurred when parsing config for dataset 'companies':
Class 'pandas.CSVDataSet' not found, is this a typo?
I would still argue that the error isn't confusing, it states exactly what the problem is: spelling `DataSet` with a capital `S` instead of lower case `s`, which is indeed a typo. Now the question is whether we can add some additional clarification so that people check that lower/upper-case spelling. At the same time, it will be tricky to do specific matching for `DataSet` endings, because the user could have custom datasets that have that spelling and work fine.
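One possible way around the custom-dataset concern is to emit a hint only when the requested class cannot be found and the same name with a lowercase `s` can. The sketch below illustrates that rule in isolation. `spelling_hint` and the `available` set are hypothetical; how Kedro actually resolves class names is not shown here, and a plain set of names stands in for it as an assumption.

```python
from typing import Optional, Set


def spelling_hint(class_name: str, available: Set[str]) -> Optional[str]:
    """Return a hint only if the name is missing and its 'Dataset' variant exists.

    `available` is a stand-in (assumption) for whatever lookup the real
    class resolution performs.
    """
    if class_name in available:
        # The class resolves fine, e.g. a custom dataset spelled 'DataSet'.
        return None
    if class_name.endswith("DataSet"):
        candidate = class_name[: -len("DataSet")] + "Dataset"
        if candidate in available:
            return (
                f"Did you mean '{candidate}'? Dataset classes ending in "
                "'DataSet' were renamed to 'Dataset' in Kedro 0.19."
            )
    return None


if __name__ == "__main__":
    known = {"pandas.CSVDataset", "my_project.datasets.MyDataSet"}
    print(spelling_hint("pandas.CSVDataSet", known))
    print(spelling_hint("my_project.datasets.MyDataSet", known))
```

Because the hint depends on the lowercase variant actually existing, a working custom `MyDataSet` never triggers it, and a genuinely unknown class still gets the plain "not found" message.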
Description
There is confusion between the `DataSet` and `Dataset` terminology, and the error message is not informative when the old naming is used. The datasets were renamed in 0.19, but people miss that fact when switching to the new version.
Relates to https://github.com/kedro-org/kedro/issues/2401
Context
Example of the current error message: