Closed ElenaKhaustova closed 5 days ago
Is better documentation enough to address this though?
For example, this was the first comment a user made when joining our Slack:
Hey folks! Just started using kedro. Is there any kedro command to import datasets from a path into my data directory in the project?
(https://linen-slack.kedro.org/t/9703502/hey-folks-just-started-using-kedro-is-there-any-kedro-comman#296704bb-7be1-419c-94b2-2429086acbea, cc @juanmarin00)
In the same way we have kedro pipeline create, we could have kedro dataset import /tmp/my_data.csv or something like that, populating the catalog for you.
It's also unclear whether this is related to the DataCatalog API itself or is more of a Kedro DX issue in general.
I'd be curious to know what's really meant by "configuring" datasets. We have a huge amount of docs with YAML examples: https://docs.kedro.org/en/stable/data/data_catalog_yaml_examples.html, but if that's not what users are looking for, then what is it they'd like to see?
@astrojuanlu, @merelcht, what we got from the interviews is that less experienced users are missing the connection between DataCatalog, Dataset and the actual Python package wrapped by the specific dataset implementation, e.g. pandas. When users want to add a dataset configuration to catalog.yml, it's not obvious to some of them that the set of configuration parameters (filepath, load_args, etc.) is defined by the dataset implementation, while the contents of, for example, load_args are defined by the underlying library, such as pandas.
We can add a small example to the docs to clarify the dependency DataCatalog -> Dataset -> underlying library.
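To make that chain concrete, here is a minimal sketch using only the Python standard library. ToyCatalog and ToyCSVDataset are hypothetical stand-ins, not Kedro classes, and csv.DictReader plays the role of the underlying library. In a real project the analogous chain would presumably be a catalog.yml entry -> pandas.CSVDataset -> pandas.read_csv.

```python
import csv
import os
import tempfile


class ToyCSVDataset:
    """Hypothetical stand-in for a dataset class; not part of Kedro."""

    def __init__(self, filepath, load_args=None):
        # The parameter names (filepath, load_args) are chosen by the
        # dataset implementation.
        self.filepath = filepath
        self.load_args = load_args or {}

    def load(self):
        with open(self.filepath, newline="") as f:
            # The keys inside load_args are not interpreted here: they are
            # forwarded as-is to the underlying library (csv.DictReader).
            return list(csv.DictReader(f, **self.load_args))


class ToyCatalog:
    """Hypothetical stand-in for a catalog: maps names to dataset objects."""

    def __init__(self, config):
        self._datasets = {
            name: ToyCSVDataset(**entry) for name, entry in config.items()
        }

    def load(self, name):
        return self._datasets[name].load()


# Write a small semicolon-separated file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "cars.csv")
with open(csv_path, "w", newline="") as f:
    f.write("make;year\nvolvo;2019\n")

# Dict equivalent of a catalog entry.
config = {
    "cars": {
        "filepath": csv_path,             # defined by the dataset class
        "load_args": {"delimiter": ";"},  # defined by the underlying library
    }
}

catalog = ToyCatalog(config)
rows = catalog.load("cars")
print(rows)
```

The point of the sketch is where to look things up: the top-level keys of an entry are documented with the dataset class, while whatever goes inside load_args is documented by the library that the dataset calls.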
Description
Users struggle to understand how to configure datasets properly, resulting in frustration. They are unaware of the Kedro-Datasets component, and from the Kedro documentation alone they struggle to work out how to set up the parameters for datasets. We propose adding a configuration example with a reference to the Kedro-Datasets documentation (https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1), specifically showing how to set up Kedro- and dataset-related parameters.
Documentation page (if applicable)
https://docs.kedro.org/en/stable/data/data_catalog.html
Context
"They tend not to know the underlying library connected to the datasets. They need to be redirected to the right place in the documentation (e.g. pandas.CSVDataset API doc)" (C)