kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
681 stars 113 forks source link

Preview SQL code in Kedro-Viz for SQL Datasets #1521

Open delvillarjavier opened 1 year ago

delvillarjavier commented 1 year ago

Description

Currently Kedro-Viz only previews data from CSVs and Excel files, I would like to view the SQL code that produces that table, not the resulting table.

Context

I use Kedro-Viz very often to review projects status, it would be nice if I can see all the datasources and the SQL code that produces them in one place.

Possible Implementation

Add a toggle in SQL based datasets image

yetudada commented 1 year ago

This makes me wonder, do you want to also see the code for other datasets listed in catalog.yml too?

delvillarjavier commented 1 year ago

Hello! I was thinking about that, I just want some idea of the data I would get or where is it, in the case there is no "File Path" attribute. :)

MehdiNV commented 1 year ago

Update: Leaving an update here on progress, and potential next steps! As useful context for anyone re-visiting.

We decided to split this issue into two parts:

  1. Implement a 'Preview' for SQLQueryDataset-based data node
  2. Extend the Preview mechanism for other 7 remaining datasets (all who don't have a file, and hence would benefit)

Reason for this is that we'd need some more time on finalising the design for those other 7 datasets - the SQL-dataset is still technically 'Code', and hence would fall in-line with a 'Show Code' / Code preview window. The other 7 in contrast may only have a URL (e.g. API) or source database, and our plan is to re-think where exactly they should go in the metadata window instead of adding it now and needing to refactor it later.

Current PR implements the 1st task fully in the meantime! https://github.com/kedro-org/kedro-viz/pull/1564

astrojuanlu commented 1 year ago

One question (maybe a bit too late, apologies): wouldn't it be easier to display the whole YAML definition of the dataset? So, instead of just the SQL query, the whole thing:

shuttle_id_dataset:
  type: pandas.SQLQueryDataset
  sql: "select shuttle, shuttle_id from spaceflights.shuttles;"
  credentials: db_credentials

that way (1) this works for every dataset, not only SQL ones, and (2) possibly addresses the coupling @merelcht flagged in https://github.com/kedro-org/kedro-viz/pull/1564#issuecomment-1771260062

MehdiNV commented 1 year ago

Update, had a discussion on feature:

Former approach of picking specific metadata's to display, like SQL, increase our dependency on kedro-framework; if the dataset name or any of its attributes changes, then our code would break as well.

There's two workarounds to this: 1) Display everything; we're already recieving a object that contains all of the metadata attributes - we'd barely increase coupling / dependency if we just display everything instead of picking and choosing subsets of it (changes to attribute names or dataset type wouldn't matter - we just care about displaying what we have present)

2) Add a _describe method on the kedro-datasets end, which could be called by kedro-viz. This method would handle the metadata value display, and would be the most optimal solution (e.g. framework changes would cause that method to fail, letting them know to change it; if the logic lived in kedro-viz, framework could push breaking changes because they'd have little visibility on how it'd affect viz / how viz uses that knowledge).

Ultimate decision was to implement decision 2, since its the most optimal and would have to be done anyway in the future.

Current status of issue is pending / TODO, as we need confirmation when this implementation can be done / if it fits with the next kedro-framework sprint/s. Its PR will be closed for the time being.