kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
10.03k stars 906 forks source link

Dataset `load` / `save` method cannot take arguments #4346

Open noklam opened 5 days ago

noklam commented 5 days ago

Description

from kedro.io.core import AbstractDataset

class MyDataset(AbstractDataset):
    def save(self): ...
    def _describe(self): ...
    def load(self, extra_arg): print(extra_arg)

my_dataset = MyDataset()
my_dataset.load(123)
{
    "name": "TypeError",
    "message": "MyDataset.load() takes 1 positional argument but 2 were given",
    "stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 10
      6     def load(self, extra_arg): print(extra_arg)
      9 my_dataset = MyDataset()
---> 10 my_dataset.load(123)

TypeError: MyDataset.load() takes 1 positional argument but 2 were given"
}

Context

Usually dataset use class attribute, i.e. self.load_args for passing argument to load method. This is fine when using the full Kedro framework with YAML, but it's not convenient when developing interactively. For example I want to do dataset.save(filepath=xyz) to override the settings.

Expected Result

Whether or not the load/save argument should be available for interactive use is a separate topic, if I create the method that takes an argument, it should work. In addition, the stacktrace does not give any useful information which is confusing.

Your Environment

datajoely commented 5 days ago

Yeah this is a no brainer

deepyaman commented 5 days ago

This seems more like a feature request than a bug? I don't think this has ever been supported.

Yeah this is a no brainer

Not sure I understand why. By passing args that are a property of the dataset to the save or load method, you're breaking the abstraction Kedro provides.

datajoely commented 2 days ago

Agree on the labelling, but aren't you extending rather than breaking this?

noklam commented 2 days ago

This seems more like a feature request than a bug? I don't think this has ever been supported.

I haven't tested, but I think this is always supported. The way that framework invoke dataset is usually via something similar:

result = dataset.load()

So framework is not aware of any extra argument load can take, but it does not prevent load method takes extra argument in interactive flow. It is confusing because the error message does not give me any help. I am able to get around with it by actually using _load instead of load.