kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
10.03k stars, 906 forks

Can we remove `LambdaDataset`? #4292

Closed: merelcht closed this issue 2 weeks ago

merelcht commented 3 weeks ago

Description

I'm wondering if anyone is using the LambdaDataset at all.

Context

While going through the codebase to see what parts need work before Kedro 1.0.0, I came across LambdaDataset. Other than linting/typing changes, it hasn't been touched since 2019.

Galileo-Galilei commented 3 weeks ago

Just my two cents: I have never seen anyone use it. I think it will become even less useful if we simplify the creation of a custom dataset (including reducing the number of abstract methods like _describe, mutualizing some part of the fsspec code in the abstract class...)
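To make the comparison concrete, here is a rough sketch of the two approaches. It deliberately does not import Kedro: `SketchAbstractDataset`, `SketchLambdaDataset`, and `InMemoryTextDataset` are hypothetical stand-ins invented for illustration, not Kedro's actual classes or signatures. The assumption is a base class that only requires `load` and `save` and supplies a default `_describe`.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class SketchAbstractDataset(ABC):
    """Hypothetical stand-in for a dataset base class (not Kedro's real one)."""

    @abstractmethod
    def load(self) -> Any:
        """Return the stored data."""

    @abstractmethod
    def save(self, data: Any) -> None:
        """Persist the given data."""

    def _describe(self) -> dict:
        # Assumed default, so subclasses are not forced to implement it.
        return {"type": type(self).__name__}


class SketchLambdaDataset(SketchAbstractDataset):
    """Wraps plain callables, mimicking the idea behind LambdaDataset."""

    def __init__(self, load: Callable[[], Any], save: Callable[[Any], None]):
        self._load_fn = load
        self._save_fn = save

    def load(self) -> Any:
        return self._load_fn()

    def save(self, data: Any) -> None:
        self._save_fn(data)


class InMemoryTextDataset(SketchAbstractDataset):
    """A small custom dataset: two methods, no callables to pass around."""

    def __init__(self) -> None:
        self._text = None

    def load(self) -> str:
        if self._text is None:
            raise ValueError("Nothing has been saved yet")
        return self._text

    def save(self, data: str) -> None:
        self._text = data


# Usage: both variants expose the same load/save interface.
store = {}
lambda_ds = SketchLambdaDataset(
    load=lambda: store["value"],
    save=lambda data: store.update(value=data),
)
lambda_ds.save([1, 2, 3])

custom_ds = InMemoryTextDataset()
custom_ds.save("hello")
```

Under these assumptions the subclass is barely longer than the callable wrapper, which is the sense in which a simpler base class would make a lambda-style dataset less useful.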

astrojuanlu commented 3 weeks ago

We are giving the community a few days to weigh in ahead of deprecating LambdaDataset, pending removal in Kedro 0.20.

Note that this section https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#output-to-a-file needs to be deleted or completely rewritten.