kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.88k stars 894 forks

Tutorial data: Sanitise the spaceflights data to remove real countries and replace with fictional locations #2008

Closed stichbury closed 2 months ago

stichbury commented 1 year ago

It would make sense to change the spaceflights data slightly to keep the example completely fictional. Let's revise the locations and any other information that looks to be "genuine" since it isn't (it's completely fabricated) and shouldn't appear to be realistic in any way.
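
To illustrate the kind of change proposed here, below is a minimal sketch that uses only the Python standard library. It is not the actual Kedro starter code. The column name `company_location`, the sample rows, the helper `sanitise_locations`, and the fictional place names are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical mapping from real country names to invented locations.
FICTIONAL_LOCATIONS = {
    "France": "Zentora",
    "Japan": "Meridoth",
}


def sanitise_locations(rows, column, mapping):
    """Yield copies of rows with the values in `column` replaced via `mapping`."""
    for row in rows:
        cleaned = dict(row)
        # Values without a defined replacement are left unchanged.
        cleaned[column] = mapping.get(row[column], row[column])
        yield cleaned


# Tiny in-memory sample standing in for one of the tutorial CSV files (assumed layout).
sample_csv = "id,company_location\n1,France\n2,Japan\n3,Atlantis\n"
reader = csv.DictReader(io.StringIO(sample_csv))
sanitised = list(sanitise_locations(reader, "company_location", FICTIONAL_LOCATIONS))
```

A dictionary lookup with a fallback keeps the replacement explicit and reviewable: anyone checking the change can see exactly which real names map to which fictional ones, and unmapped values pass through untouched.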

stichbury commented 1 year ago

We could perhaps also reduce the size of the data. As @noklam mentions:

As a starter that gets used in demos and testing, it takes a considerable amount of time to run the pipeline. For example, in the Kedro bootcamp I saw that demoing catalog.load("shuttles") takes around 15-20 seconds, which is a bit awkward for demo purposes.

merelcht commented 12 months ago

We could perhaps also reduce the size of the data. As @noklam mentions:

As a starter that gets used in demos and testing, it takes a considerable amount of time to run the pipeline. For example, in the Kedro bootcamp I saw that demoing catalog.load("shuttles") takes around 15-20 seconds, which is a bit awkward for demo purposes.

I've created a separate ticket for this: https://github.com/kedro-org/kedro/issues/3109

ggermade commented 12 months ago

Working on this :) !

Gundalai-Batkhuu commented 11 months ago

Hi. Can I be assigned to this, please? I'm a CS student and this is my first open-source contribution.

stichbury commented 11 months ago

Welcome @Gundalai-Batkhuu! You'd be very welcome, but before you dig in, let's see how @ggermade is getting on as they have posted above that they're working on this.

ggermade commented 11 months ago

Hello! I'm actually working on this already. I have a branch with changes but have not opened a PR yet, as I'm missing a crucial step: finding out whether any documentation references the country names being changed, so I can update it too.

Gundalai-Batkhuu commented 11 months ago

No worries. I'll have a look at another issue.

stichbury commented 11 months ago

Hi @ggermade, just wanted to check how this is going for you. Is there anything we can help with?

stichbury commented 11 months ago

Hi @ggermade This is the final call for contributions for October 2023. Please let us have any PRs you have in flight for Hacktoberfest before the end of the day!

astrojuanlu commented 2 months ago

This is an old issue and, apart from some interest during Hacktoberfest, not much has happened. I'm closing for now.