kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

Query endpoint for `SnowparkTableDataset` #721

Status: Open. ElenaKhaustova opened this issue 3 weeks ago.

ElenaKhaustova commented 3 weeks ago

Description

The `SnowparkTableDataset` configuration has no query endpoint, so database-level SQL queries cannot be run from the Data Catalog. Users therefore have to work at the database level: first run a query in the database (for example, to filter the data), and only then run the Kedro pipeline. Users expect this dataset to behave like `SQLQueryDataset` and `GBQQueryDataset`, which do provide a query endpoint.

https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-3.0.1/api/kedro_datasets.snowflake.SnowparkTableDataset.html

We propose to:

  1. Explore the feasibility of adding a query endpoint in dataset configuration.
  2. Enhance the documentation with tutorials and working examples showing how to run SQL queries with Ibis instead in such cases: https://kedro.org/blog/sql-data-processing-in-kedro-ml-pipelines.
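A minimal sketch of what proposal 1 could look like, assuming a hypothetical `query` argument that is mutually exclusive with `table_name`. `SnowparkQueryDatasetSketch` and `SessionLike` are illustrative names, not part of kedro-datasets. To keep the sketch self-contained it does not subclass Kedro's `AbstractDataset`, and the Snowpark session is passed in directly (the real dataset obtains its session differently), so the logic can be exercised without a Snowflake connection. The only Snowpark calls assumed are `Session.sql()`, `Session.table()` and `DataFrameWriter.save_as_table()`.

```python
from typing import Any, Optional, Protocol


class SessionLike(Protocol):
    """Subset of a Snowpark Session used by this sketch (assumption)."""

    def sql(self, query: str) -> Any: ...

    def table(self, name: str) -> Any: ...


class SnowparkQueryDatasetSketch:
    """Hypothetical dataset that loads either a whole table or a SQL query result.

    Not part of kedro-datasets; names and arguments are illustrative only.
    """

    def __init__(
        self,
        session: SessionLike,
        table_name: Optional[str] = None,
        query: Optional[str] = None,
        database: Optional[str] = None,
        schema: Optional[str] = None,
    ) -> None:
        # Exactly one data source must be configured: a table or a query.
        if (table_name is None) == (query is None):
            raise ValueError("Provide exactly one of 'table_name' or 'query'.")
        self._session = session
        self._table_name = table_name
        self._query = query
        self._database = database
        self._schema = schema

    def _qualified_table(self) -> str:
        # Build "database.schema.table", skipping parts that were not supplied.
        parts = [p for p in (self._database, self._schema, self._table_name) if p]
        return ".".join(parts)

    def load(self) -> Any:
        if self._query is not None:
            # The query runs inside Snowflake; Session.sql() returns a lazy DataFrame.
            return self._session.sql(self._query)
        return self._session.table(self._qualified_table())

    def save(self, data: Any) -> None:
        # A query result has no target table, so a query-backed dataset is read-only.
        if self._query is not None:
            raise NotImplementedError("Saving is not supported for a query-backed dataset.")
        data.write.save_as_table(self._qualified_table(), mode="overwrite")
```

In a catalog entry, this design would let users write something like a `query:` key in place of `table_name:` and have the filtering pushed down to Snowflake at load time, which is the behaviour users expect from `SQLQueryDataset` and `GBQQueryDataset`. Making the two arguments mutually exclusive avoids ambiguity about which source is loaded and makes the read-only nature of query-backed entries explicit.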

Context

(Screenshot taken 2024-06-06 at 15:00:21; image not available.)

merelcht commented 3 weeks ago

This seems very specific to `SnowparkTableDataset`, so I personally wouldn't tackle it as part of the other catalog work. I'll move it to the kedro-plugins repo under the individual dataset improvements milestone.