kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

Update documentation and schema validation for Custom Dataset Preview #1884

Open ravi-kumar-pilla opened 2 months ago


Description

We have introduced custom dataset preview, which lets users implement a preview method on their own datasets. However, the documentation needs to be updated with details of the expected dict schema, and we need to add validation for the NewTypes (e.g., TablePreview).
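For illustration, here is a minimal sketch of what a dataset with a preview method might look like. It uses only the standard library so it can run on its own. The class name `CsvTextDataset` is hypothetical, `TablePreview` is redefined locally to mirror the NewType listed in this issue, and the sketch assumes `preview_args` from the catalog (such as `nrows`) are forwarded to `preview()` as keyword arguments. A real dataset would subclass Kedro's `AbstractDataset` and import the type rather than redefine it.

```python
import csv
import io
from typing import NewType

# Redefined locally so the sketch is self-contained (assumption: a real
# project would import this type instead of declaring it again).
TablePreview = NewType("TablePreview", dict)


class CsvTextDataset:
    """Hypothetical dataset holding CSV text; not part of Kedro or Kedro-Viz."""

    def __init__(self, text: str) -> None:
        self._text = text

    def preview(self, nrows: int = 5) -> TablePreview:
        """Return the first `nrows` rows in the index/columns/data layout."""
        rows = list(csv.reader(io.StringIO(self._text)))
        if not rows:
            # No header row at all: return an empty but well-formed preview.
            return TablePreview({"index": [], "columns": [], "data": []})
        header, body = rows[0], rows[1 : nrows + 1]
        return TablePreview(
            {
                "index": list(range(len(body))),  # 0-indexed row numbers
                "columns": header,                # column names as strings
                "data": body,                     # one inner list per row
            }
        )


dataset = CsvTextDataset("id,company_location\n35029,Niue\n30292,Anguilla\n")
print(dataset.preview(nrows=1))
```

Note that `csv.reader` yields every value as a string, so a dataset that needs typed values would have to convert them before returning the preview.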

Context

https://github.com/kedro-org/kedro-viz/issues/1847#issuecomment-2086213105

Possible Implementation

  1. The return type of the preview function should match one of the following types:

    TablePreview = NewType("TablePreview", dict)
    ImagePreview = NewType("ImagePreview", bytes)
    PlotlyPreview = NewType("PlotlyPreview", dict)
    JSONPreview = NewType("JSONPreview", dict)

  1. Kedro-Viz expects the returned dict to follow the schema below:

    TablePreview :

    preview={
        'index': number[], 
        'columns': string[], 
        'data': any[][]  // List[List[Any]] 
    }

    index - An array of 0-indexed integers representing nrows
    columns - An array of strings representing the names of ncolumns
    data - A 2D array representing the data for the TablePreview

    Example -

    Catalog -

    companies:
      type: pandas.CSVDataset
      filepath: ${_base_location}/01_raw/companies.csv
      metadata:
        kedro-viz:
          layer: raw
          preview_args:
            nrows: 5

    TablePreview value returned from preview() function -

    preview={
        'index': [0, 1, 2, 3, 4],
        'columns': ['id', 'company_rating', 'company_location', 'total_fleet_count', 'iata_approved'],
        'data': [
            [35029, '100%', 'Niue', 4.0, 'f'],
            [30292, '67%', 'Anguilla', 6.0, 'f'],
            [19032, '67%', 'Russian Federation', 4.0, 'f'],
            [8238, '91%', 'Barbados', 15.0, 't'],
            [30342, nan, 'Sao Tome and Principe', 2.0, 't']
        ]
    }
  2. We should enforce the schema for the newly introduced NewTypes to avoid a blank UI, as mentioned here.
  3. We should document the expected {key:value} pairs.
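As a rough sketch of the validation proposed above, the function below checks a TablePreview value against the index/columns/data schema described in this issue. The function name `table_preview_errors` and its error messages are hypothetical, not part of Kedro-Viz, and the row-count and row-width checks are an assumption about what a stricter schema might require.

```python
from typing import Any


def table_preview_errors(preview: Any) -> list[str]:
    """Return schema problems found in `preview`; an empty list means none."""
    if not isinstance(preview, dict):
        return ["preview must be a dict"]

    errors = []
    # All three keys must be present and hold lists.
    for key in ("index", "columns", "data"):
        if not isinstance(preview.get(key), list):
            errors.append(f"'{key}' must be a list")
    if errors:
        # Skip the deeper checks when the top-level shape is already wrong.
        return errors

    if not all(isinstance(name, str) for name in preview["columns"]):
        errors.append("'columns' must contain only strings")
    # Assumed constraint: one data row per index entry.
    if len(preview["data"]) != len(preview["index"]):
        errors.append("'data' must have one row per 'index' entry")
    # Assumed constraint: every row has one value per column.
    for position, row in enumerate(preview["data"]):
        if not isinstance(row, list) or len(row) != len(preview["columns"]):
            errors.append(f"row {position} must be a list with one value per column")
    return errors


print(table_preview_errors({"index": [0], "columns": ["id"], "data": [[35029]]}))
print(table_preview_errors({"index": [0]}))
```

Returning a list of messages instead of raising on the first failure would let the UI or logs show every problem at once; whether Kedro-Viz should raise, warn, or fall back to an empty preview is left open here.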

NOTE: This ticket needs to be updated with the schema details of the other return types (ImagePreview, PlotlyPreview, JSONPreview) for reference.

Possible Alternatives

Checklist