kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Prioritise user created catch-all dataset factory pattern over `{default}` from runners #3720

Closed ankatiyar closed 2 months ago

ankatiyar commented 4 months ago

Description

From a Slack conversation with @noklam: https://linen-slack.kedro.org/t/16713675/i-am-trying-to-create-a-catch-all-patterns-using-the-dataset#65f8bddf-cb66-4b35-8f5d-c0850669b907

A catch-all factory pattern that a user adds to their catalog does not always override the default dataset set by the runner. This pattern works as expected ✅:

"{catch_all}":
  type: pickle.PickleDataset
  filepath: data/06_models/{catch_all}.pickle

However, this pattern doesn't ❌:

"{nok}":
  type: pickle.PickleDataset
  filepath: data/06_models/{nok}.pickle

Context

This behaviour was introduced in https://github.com/kedro-org/kedro/pull/3332; it was intentional and is documented. However, it is a little strange: for a user-defined catch-all pattern to be used instead of the runner's default dataset pattern, `{default}`, it has to rank higher than `{default}` in priority, and that ranking is alphabetical. This limits the names available for catch-all patterns (they have to begin with a/b/c etc.).
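To illustrate why the name matters, here is a minimal sketch of the alphabetical tie-break described above. It is a simplified model, not Kedro's actual resolution code: the function `winning_catch_all` and its parameters are hypothetical, and it assumes only that catch-all patterns of equal priority are ordered alphabetically, with the first one winning.

```python
def winning_catch_all(user_patterns, runner_default="{default}"):
    """Return the catch-all pattern that would be picked first.

    Hypothetical helper: models only the alphabetical ordering between
    catch-all patterns that the issue describes.
    """
    candidates = list(user_patterns) + [runner_default]
    # Plain string sort: "{catch_all}" < "{default}" < "{nok}"
    return sorted(candidates)[0]


# The user's pattern sorts before "{default}", so it is used.
print(winning_catch_all(["{catch_all}"]))

# "{nok}" sorts after "{default}", so the runner's default is used instead.
print(winning_catch_all(["{nok}"]))
```

Under this model, any user pattern whose name sorts after `default` loses to the runner's pattern, which matches the two catalog entries shown in the description.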

Expected Result