merelcht closed this issue 1 year ago.
I think the handling of the metadata attribute should happen on the individual datasets. Any processing of the information in the metadata will be done by external plugins, not by the Kedro framework itself.
The only complication here is that layers are processed inside the DataCatalog. They used to be handled on the individual datasets as well, but this was moved to the catalog later on: https://github.com/quantumblacklabs/private-kedro/pull/548/files
If layer needs to go under metadata like this:
my_dataset:
  ...
  metadata:
    kedro-viz:
      layer: raw
we'll need to ensure it is still processed correctly, and also make sure the implementation is backwards compatible with layer being defined outside the metadata key.
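A minimal sketch of the backwards-compatible lookup this would require, assuming each catalog entry is available as a plain dict parsed from catalog.yml. The helper name resolve_layer is hypothetical, not an existing Kedro API: the nested metadata -> kedro-viz -> layer value wins, and a top-level layer is still honoured but triggers a DeprecationWarning.

```python
import warnings
from typing import Any, Dict, Optional


def resolve_layer(dataset_config: Dict[str, Any]) -> Optional[str]:
    """Return the layer for one catalog entry (hypothetical helper).

    Precedence: metadata -> kedro-viz -> layer first, then the legacy
    top-level ``layer`` key, which emits a DeprecationWarning.
    """
    # New style: the layer lives under the plugin's own namespace in metadata.
    metadata = dataset_config.get("metadata") or {}
    viz_config = metadata.get("kedro-viz") or {}
    nested_layer = viz_config.get("layer")
    if nested_layer is not None:
        return nested_layer

    # Legacy style: top-level key, still accepted during the deprecation period.
    legacy_layer = dataset_config.get("layer")
    if legacy_layer is not None:
        warnings.warn(
            "Defining 'layer' as a top-level dataset key is deprecated; "
            "move it under 'metadata: kedro-viz: layer'.",
            DeprecationWarning,
            stacklevel=2,
        )
    return legacy_layer
```

With this precedence, a dataset that defines both forms silently uses the nested value; whether that case should also warn is an open choice.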
After discussion with @merelcht and @AntonyMilneQB, the implementation should be as follows:
- metadata, being a top-level key, will be introduced as an attribute on the individual datasets.
- The layer attribute will move to within metadata -> kedro-viz and will no longer be handled by Kedro's DataCatalog, reversing the decision made as part of https://github.com/quantumblacklabs/private-kedro/pull/548. As a result, the logic for resolving it will need to move to kedro-viz (as kedro-viz is currently the consumer of layer, not Kedro itself).
- Removing layer as a top-level key will be a breaking change, forcing the implementation to happen in two stages: the first handling the nested layer on Viz and adding deprecation warnings to the top-level layer (non-breaking), and the second removing the top-level layer entirely.
These require further subtasks that I will define in separate issues, but I would like to get @idanov's and @rashidakanchwala's opinions here first.
- … layer from metadata->kedro-viz->layer before catalog.layer & move over this check
- … layer attribute on Kedro framework
- … layer is defined, or upon both definition and access?
- … layer attribute (breaking) and the logic introduced to Viz in step 2
Questions to consider:
- Why was layer moved to the DataCatalog originally (in the ticket above)? Is there something that hasn't been addressed?
- Are there any oversights on moving the handling of the layer attribute to Viz?
- Are there any other users of the top-level layer attribute?
Alternatively, we could just keep layer where it is and not process it if it is defined within the metadata. This doesn't address the DataCatalog processing an attribute that isn't used within Kedro, but it allows the metadata attribute to be introduced without any additional implementation on the Kedro framework.
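Under either option, the dataset side can stay small. The sketch below assumes a dataset simply accepts a metadata dict and stores it untouched, leaving all interpretation to plugins such as kedro-viz. SketchCSVDataSet and create_dataset are hypothetical names for illustration only, not Kedro's AbstractDataSet API or its catalog factory.

```python
from typing import Any, Dict, Optional


class SketchCSVDataSet:
    """Hypothetical dataset illustrating a pass-through metadata attribute."""

    def __init__(self, filepath: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        self._filepath = filepath
        # Stored as-is: the framework never reads inside it; plugins do.
        self.metadata = metadata


def create_dataset(config: Dict[str, Any]) -> SketchCSVDataSet:
    """Hypothetical factory: drops 'type' for brevity and passes every
    other key, including metadata, to the constructor unchanged."""
    kwargs = {key: value for key, value in config.items() if key != "type"}
    return SketchCSVDataSet(**kwargs)
```

For example, create_dataset({"type": "pandas.CSVDataSet", "filepath": "data/01_raw/companies.csv", "metadata": {"kedro-viz": {"layer": "raw"}}}) returns an object whose metadata attribute is exactly the dict from the config.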
Questions to consider:
- Why was layer moved to the DataCatalog originally (in the ticket above)? Is there something that hasn't been addressed?
  This is just my assumption, but layers have been on Kedro-viz since 2020, and before that no other extra information was passed in the catalog from the Kedro project to Kedro-viz, which is probably why there was no need to nest it. Not a great idea, for sure.
- Are there any oversights on moving the handling of the layer attribute to Viz?
  Let me get back to you on this.
- Are there any other users of the top-level layer attribute?
  I doubt it; it seems very specific to Kedro-viz.
Currently the layer logic resides in the Kedro framework, in DataCatalog (https://github.com/kedro-org/kedro/blob/main/kedro/io/data_catalog.py#L270-L275). Kedro essentially sends all the layers to kedro-viz as a dict, in the format below:
dict_items([('raw', {'companies', 'shuttles', 'reviews'}), ('intermediate', {'ingestion.int_typed_companies', 'ingestion.int_typed_shuttles@pandas2', 'ingestion.int_typed_reviews', 'ingestion.int_typed_shuttles@pandas1'}), ('primary', {'prm_spine_table', 'prm_shuttle_company_reviews'}), ....])
On kedro-viz we simply read the above and match each dataset name to the layer key it belongs to (https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/data_access/repositories/catalog.py#L40-L47).
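The current round trip can be sketched roughly as follows, assuming the catalog config is a plain dict of dataset name to entry. build_layers and lookup_layer are hypothetical names; they mirror the behaviour described in this thread rather than reproducing the exact code at the linked lines.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Set


def build_layers(catalog_config: Dict[str, Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Kedro side (sketch): group dataset names by their top-level ``layer`` key."""
    layers: Dict[str, Set[str]] = defaultdict(set)
    for ds_name, ds_config in catalog_config.items():
        ds_layer = ds_config.get("layer")
        if ds_layer is not None:
            layers[ds_layer].add(ds_name)
    return dict(layers)


def lookup_layer(layers: Dict[str, Set[str]], dataset_name: str) -> Optional[str]:
    """Kedro-viz side (sketch): find which layer key a dataset name belongs to."""
    for layer, dataset_names in layers.items():
        if dataset_name in dataset_names:
            return layer
    return None
```

Calling .items() on the result of build_layers gives a dict_items view shaped like the output above; datasets without a layer are simply absent, so lookup_layer returns None for them.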
I suppose this will now change so that Kedro only sends a dict called metadata to Kedro-viz, and Kedro-viz extracts all the layer information and maps it correctly.
This is fine, but I am not sure how to make it backwards compatible in a clean way. @merelcht @AntonyMilneQB, any thoughts?
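One possible way to keep Kedro-viz backwards compatible, sketched under the assumption that newer Kedro versions expose each dataset's metadata dict while older ones only provide the grouped layers dict shown above. layers_by_dataset and its parameters are hypothetical; this is not an existing Kedro-viz function.

```python
from typing import Any, Dict, Mapping, Optional, Set


def layers_by_dataset(
    metadata_by_dataset: Mapping[str, Optional[Mapping[str, Any]]],
    legacy_layers: Optional[Mapping[str, Set[str]]] = None,
) -> Dict[str, str]:
    """Map dataset name -> layer (hypothetical helper).

    Legacy top-level layers are applied first, then overridden by any
    metadata -> kedro-viz -> layer value, so the new syntax wins.
    """
    result: Dict[str, str] = {}
    # Older Kedro versions: only the grouped {layer: {dataset names}} dict exists.
    for layer, dataset_names in (legacy_layers or {}).items():
        for name in dataset_names:
            result[name] = layer
    # Newer Kedro versions: each dataset carries its own metadata dict.
    for name, metadata in metadata_by_dataset.items():
        layer = ((metadata or {}).get("kedro-viz") or {}).get("layer")
        if layer is not None:
            result[name] = layer
    return result
```

Applying the legacy dict first and the nested values second is only one possible precedence rule, chosen so that a dataset already migrated to the new syntax always wins.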
Description
Implement the feature to allow users to add new attributes to datasets. Use the syntax decided in #2439.
Context
Sub-task of https://github.com/kedro-org/kedro/issues/1076
To be done after: Decide on syntax to allow adding new attributes #2439
Open question
Where should the metadata (or other name) attribute go?
- AbstractDataSet
- DataCatalog, like layers?