pydatabangalore / talks

Talks at PyData Bangalore meetups

ML Models and Dataset versioning #20

Closed (kurianbenoy closed 4 years ago)

kurianbenoy commented 4 years ago

Title

ML Models and Dataset versioning

Description

In this talk we will discuss the current best practices for organizing ML projects and why traditional open-source tools like Git and Git-LFS won't help us here. I will focus on one of these best practices: versioning ML models and datasets.

Duration

Audience

Intermediate

Outline

In this talk we will discuss the current best practices for organizing ML projects and why traditional open-source tools like Git and Git-LFS won't help us here.

Currently, the life cycle of any machine learning model goes through the following process:

Git can't handle large amounts of data, on the order of gigabytes. Git-LFS comes with the built-in difficulty of supporting at most 2 GB of data (a GitHub limitation), and other problems exist as well.

Data Version Control (DVC, dvc.org) is an open-source command-line tool written in Python. We will show how to version datasets with dozens of gigabytes of data, how to version ML models, how to use your favourite storage (S3, GCS, or a bare-metal SSH server) as a data file backend, and how to embrace the best engineering practices in your ML projects. I will also discuss the tools on the market for both experiment tracking and dataset versioning, and the best features of each product (PS: no comparison among them).
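To make the idea of dataset versioning concrete, here is a minimal sketch in plain Python. It is not DVC's implementation and does not use DVC's API. It only imitates the general approach of keeping large files in a content-addressed cache while a small pointer file, which is what would be committed to Git, records the hash. The function names (`track`, `restore`), the `.pointer.json` suffix, and the cache layout are all hypothetical choices made for this illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large dataset never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into the cache under its hash and write a small pointer file.

    Hypothetical helper: the pointer file is the only thing Git would need to store.
    """
    checksum = file_sha256(data_file)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"sha256": checksum, "path": data_file.name}, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    checksum = meta["sha256"]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / checksum[:2] / checksum[2:], target)
    return target


def demo() -> bool:
    """Track two versions of a dataset, then roll back to the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        data.write_text("id,label\n1,cat\n")
        pointer_v1 = track(data, cache).read_text()  # what a first commit would hold

        data.write_text("id,label\n1,cat\n2,dog\n")
        pointer = track(data, cache)  # what a second commit would hold

        # Rolling back means restoring the old pointer, then fetching its data.
        pointer.write_text(pointer_v1)
        restore(pointer, cache)
        return data.read_text() == "id,label\n1,cat\n"


if __name__ == "__main__":
    print("rolled back to version 1:", demo())
```

Because the pointer file is only a few lines long, Git can version it comfortably, while the cache directory could live on any storage that is able to hold the actual data.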

Talk Outline

Slides

Additional notes

Kurian Benoy is an open-source contributor at CloudCV and DVC. He is the lead organiser of School of AI, Kochi, and an AI enthusiast working on Deep Learning and Computer Vision. Kurian is a FOSSASIA Open TechNights winner and gave a talk at the FOSSASIA Open Tech Summit about the keralarescue.in team.

I am an active Kaggler, was the first person to introduce Data Version Control on Kaggle, and am among the top 10 contributors to DVC so far.


vinayak-mehta commented 4 years ago

@kurianbenoy Thanks for the proposal! Are you available to give this talk at next Saturday's meetup (Oct 19)?

kurianbenoy commented 4 years ago

@vinayak-mehta, I realised I won't be able to come to the meetup because: 1) I am new to Bangalore, and I will be there to attend the InOut Hackathon, which starts at 9 AM. I don't think juggling both at once would be a good idea. 2) I won't have my personal laptop, so I am doubtful about how much of the demo I could show.

I hope I can come to the PyData Bangalore community one day :)