kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.46k stars 875 forks

Conduct market research on versioning #3933

Open astrojuanlu opened 3 weeks ago

astrojuanlu commented 3 weeks ago

In https://github.com/kedro-org/kedro/milestone/63 there are several linked issues related to Kedro's Dataset Versioning.

Before we start working on it, we want to do some market research on other tools and formats that support versioning. At a minimum, it should include

The objectives are

The end goal is to inform decision making around Kedro Dataset Versioning.

noklam commented 2 weeks ago

We should also review https://github.com/kedro-org/kedro/pull/1871

astrojuanlu commented 2 weeks ago

Note that the goal here is not to assess Kedro's current versioning capabilities, but rather to provide an outward-looking perspective on what other systems are doing. Ideally, that should inform the next steps in https://github.com/kedro-org/kedro/milestone/63