Open SajidAlamQB opened 2 weeks ago
Hey @SajidAlamQB, I am happy to work on this. Can you provide some information on what kind of metadata would need to be retrieved exactly? Is there a list?
Hi @MinuraPunchihewa,
Thank you for your response and for offering to work on this!
I think some of the metadata we're Interested in is:
@astrojuanlu did you have any thoughts on what else?
paging @bielstela since you opened https://github.com/kedro-org/kedro-viz/issues/1714 :)
before proceeding with get_info
though (which was a strawman proposal), given that there's a AbstractDataset._describe
, should we actually have a describe
for this purpose, and reuse what's already being returned? Looking at a random dataset, we have a bunch of properties there already
@astrojuanlu This makes sense to me. I can reuse this and add any extra metadata that would be required. Is there anything outside of the list that @SajidAlamQB has given above?
Would love to know @merelcht and @rashidakanchwala opinion before proceeding
(Specifically, whether we should add a dedicated public get_info
or if it should be describe
and wrap _describe
)
I missed this but I love this! It would really help Kedro-viz; though we need to decide what it means for each dataset. For instance, for SQLDatasets, could it get_info provide SQL Query etc.
We discussed this in backlog grooming and we acknowledge that the original problem (that Viz needs private data from datasets https://github.com/kedro-org/kedro-viz/issues/1893#issuecomment-2432719034) exists and we want to address it.
There was past discussion about _describe
when we implemented preview
and it was discarded https://github.com/kedro-org/kedro-plugins/pull/504#issuecomment-1924190370
We didn't feel like adding more Viz-specific public methods, although an exception was made for said preview
method since it wasn't part of AbstractDataset
https://github.com/kedro-org/kedro-plugins/pull/504#issuecomment-1929357128
This problem was already anticipated when implementing the preview functionality https://github.com/kedro-org/kedro-viz/issues/662#issuecomment-1646003634 and re-discovered by a user some months later https://github.com/kedro-org/kedro-viz/issues/1714
Renaming this issue so it's more clear that this is an open problem.
Description
Related to: https://github.com/kedro-org/kedro-viz/issues/1893#issuecomment-2456833233
When trying to retrieve metadata (like file size) from datasets in Kedro Viz we are running into issues, specifically within our
DatasetStatsHook
. Currently, to obtain this metadata, we need to access private attributes (e.g.,_filepath
,_fs
,_protocol
), which is causing inconsistency and compatibility problems with datasets that do not expose these attributes.Context
By having a standardised, public way to retrieve dataset metadata we can improve best practices. This benefits users who want to collect dataset metadata or build plugins, as it provides a consistent and maintainable approach.
Possible Implementation
Perhaps adding an optional, public method (e.g.,
get_info
) to datasets that can provide metadata like file size, etc.