kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Make A Kedro Project Day #2356

Closed ankatiyar closed 1 year ago

ankatiyar commented 1 year ago

Introduction

Tentative date: 9th March, 2023

The purpose of this day is for the team to get together and create our own Kedro projects, so that we become familiar with our users' experiences and pain points.

The plan

The idea is for us as a team to try out all the different components of Kedro, from instantiating a project through to deployment. Along the way, we will note any bugs or pain points we face and document the manual testing steps.

@AhdraMeraliQB and I have made a list of some components to test out with our projects. The list is not exhaustive, so please feel free to add to it:

Instantiating a project

Project development with the interactive workflow

Scaling project development

Running

Setting up DevOps

Deployment

Extras for Advanced Users

Resources

Check out Kaggle Competitions for project ideas. Here are some examples:

Some places to search for datasets for your projects:

Bonus: interesting datasets I found:

The goal for each project would be to touch on at least one item from each area. At the end of the day there will be time for participants to talk about their projects and their experience creating them.

datajoely commented 1 year ago

❤️ this

stichbury commented 1 year ago

I love this, thank you for the initiative. I know @astrojuanlu and I will be all over this from a devrel perspective. It would be great, for example, to collect all the projects together so that I can write a narrative about the day for a blog post and point to them all. Later, we can pull out different aspects into tutorials and posts. Nice one!

Would the kedro-community repo be a good place for us to work on the day, so we have a set of the projects kept together as we build them?

stichbury commented 1 year ago

Would Databricks be something to include too?

merelcht commented 1 year ago

This looks great! I'm very excited about this day already 😄 Do you have any thoughts on the projects we could do? E.g. any Kaggle competitions or specific datasets that would be good to work with? Having a pre-selected list might help us get started quickly on the day.

stichbury commented 1 year ago

I was wondering if we could use a Databricks tutorial as a starting point, and basically rework it to use Kedro on top of Databricks? I found this tutorial example with the million song database in their docs and perhaps we could adapt it, but I've no idea if I'm on the right track here. Maybe it's something @jmholzer has already seen or tried?

astrojuanlu commented 1 year ago

Other datasets and interesting sites:

stichbury commented 1 year ago

An easy one to update (which isn't really creating a new project, more bringing an old one up to date) would be Lais' project on penguins:

https://github.com/laisbsc/kedro_penguins
