kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Added 'custom_args' attribute to AbstractDataset class #3761

Closed noamgoldberg closed 2 months ago

noamgoldberg commented 3 months ago

Description

The Inspiration

Personally, this is my most frequent (and favorite) application of "dataset-specific" args. Unfortunately, I find myself writing a custom class for each type of artifact (e.g. CSV, plotly, pickle), and then writing them all again each time I start a new Kedro project.

Broader Usage

The above is a very specific use of the proposed 'custom_args' feature, though I believe many developers would find it useful to have access to custom args without having to rewrite numerous custom classes. I know it was a popular feature among my former team members (for the dynamic saving method detailed above)!

Development notes

Given the minor extent of the change, I don't believe this merits an independent test. If I were to test it, however, I would instantiate an AbstractDataset and a child of AbstractDataset (e.g. CSVDataSet), and check that the configured custom_args are accessible on both.
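The test described above could look roughly like the sketch below. The PR's diff is not shown in this thread, so everything here is an assumption: `SketchAbstractDataset` and `SketchCSVDataset` are hypothetical stand-ins rather than Kedro classes, and the `custom_args` constructor parameter and its empty-dict default are guesses at the proposed behaviour.

```python
from typing import Any, Optional


class SketchAbstractDataset:
    """Hypothetical stand-in for kedro.io.AbstractDataset (not the real class)."""

    def __init__(self, custom_args: Optional[dict[str, Any]] = None) -> None:
        # Assumed behaviour: store a copy of the user-supplied args,
        # falling back to an empty dict when none are given.
        self.custom_args: dict[str, Any] = dict(custom_args or {})


class SketchCSVDataset(SketchAbstractDataset):
    """Hypothetical child dataset, standing in for something like CSVDataSet."""

    def __init__(
        self, filepath: str, custom_args: Optional[dict[str, Any]] = None
    ) -> None:
        super().__init__(custom_args=custom_args)
        self.filepath = filepath


def test_custom_args_accessible() -> None:
    base = SketchAbstractDataset(custom_args={"owner": "team-a"})
    child = SketchCSVDataset("data/example.csv", custom_args={"owner": "team-b"})
    unset = SketchCSVDataset("data/other.csv")

    # The configured args should be readable on the base and on the child.
    assert base.custom_args == {"owner": "team-a"}
    assert child.custom_args == {"owner": "team-b"}
    # Omitting the argument should leave an empty mapping, not None.
    assert unset.custom_args == {}


test_custom_args_accessible()
```

Against the real classes, the same assertions would need a concrete subclass, since an abstract base cannot normally be instantiated directly.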

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

astrojuanlu commented 3 months ago

Hi @noamgoldberg, thanks for your PR!

Before proceeding, could you have a look at the metadata key and see if it would suit your needs? It's not part of the AbstractDataset, but all derived datasets have it.

astrojuanlu commented 2 months ago

Hi @noamgoldberg, I echo what I said about metadata in https://github.com/kedro-org/kedro/pull/3737#issuecomment-2095343920

About custom arguments, the preferred route would be to either use metadata or define your own dataset.
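As a rough illustration of combining the two routes, the sketch below defines a small custom dataset that accepts a free-form `metadata` dictionary and lets project code read from it. It deliberately does not import Kedro: `TaggedCSVDataset` is a hypothetical, plain-Python class, and a real implementation would subclass `kedro.io.AbstractDataset` and follow its load/save contract. The `owner` key is likewise only an example.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Optional


class TaggedCSVDataset:
    """Hypothetical custom dataset (plain Python, not a Kedro class)."""

    def __init__(
        self, filepath: str, metadata: Optional[dict[str, Any]] = None
    ) -> None:
        self._filepath = Path(filepath)
        # Free-form, user-defined information; the dataset itself never
        # interprets it, so any keys are up to the project.
        self.metadata: dict[str, Any] = metadata or {}

    def save(self, rows: list[dict[str, Any]]) -> None:
        # Column names are taken from the first row (empty if no rows).
        fieldnames = list(rows[0]) if rows else []
        with self._filepath.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

    def load(self) -> list[dict[str, str]]:
        # csv.DictReader returns every value as a string.
        with self._filepath.open(newline="") as handle:
            return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp_dir:
    dataset = TaggedCSVDataset(
        str(Path(tmp_dir) / "example.csv"),
        metadata={"owner": "team-a"},  # example key, chosen for illustration
    )
    dataset.save([{"id": "1", "name": "alpha"}])
    loaded = dataset.load()
    owner = dataset.metadata.get("owner")
```

Keeping the extra information under a single `metadata` key means the dataset's own constructor arguments stay unchanged, which seems to be the point of the suggestion above.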

I appreciate your pull request but I am closing it for now 🙏🏼 If you have further ideas on how to improve Kedro, please open a new Discussion in the "Discussions" tab and let's take it from there.

Thanks again!