kurianbenoy closed this 4 years ago
@kurianbenoy Thanks for the proposal! Are you available to give this talk at next Saturday's meetup (Oct 19)?
@vinayak-mehta, I realised I won't be able to come to the meetup, because: 1) I am new to Bangalore, and I am there to attend the InOut Hackathon, which starts at 9 AM. I thought juggling both at once wouldn't be a good idea. 2) I won't have my personal laptop with me, so I am doubtful about how much of the demo I could show.
I hope I can come to the PyData Bangalore community one day :)
Title
Description
Duration
Audience
Outline
Currently, the life cycle of any machine learning model goes through the following process:
Git can't handle large amounts of data, gigabytes in size. Git LFS comes with the built-in difficulty of supporting files of at most 2 GB (a GitHub limitation), and it has other problems as well.
Data Version Control (DVC, dvc.org) is an open-source command-line tool written in Python. We will show how to version datasets with dozens of gigabytes of data, how to version ML models, how to use your favourite storage (S3, GCS, or a bare-metal SSH server) as a data file backend, and how to embrace the best engineering practices in your ML projects. I will also discuss the tools on the market for both experiment tracking and dataset versioning, and the best features of each (note: without comparing them against one another).
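To make the idea of versioning large data alongside Git more concrete, here is a minimal sketch of the general approach: the large file goes into a content-addressed cache, and only a small pointer file is left to commit. This is a toy illustration, not DVC's actual implementation; the function names `track` and `restore`, the `.pointer` suffix, and the cache layout are all hypothetical and chosen just for this example.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    Hypothetical helper: the small pointer file is what would be committed
    to Git, while the cache would be synced to remote storage.
    """
    checksum = file_md5(data_file)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer")
    pointer.write_text(f"md5: {checksum}\npath: {data_file.name}\n")
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    fields = dict(line.split(": ", 1) for line in pointer.read_text().splitlines())
    checksum = fields["md5"]
    target = pointer.with_name(fields["path"])
    shutil.copy2(cache_dir / checksum[:2] / checksum[2:], target)
    return target
```

With this layout, checking out an older commit brings back an older pointer file, and `restore` can then fetch the matching data from the cache, which is the behaviour the talk would demonstrate with the real tool.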
Talk Outline
Slides
Additional notes
I am an active Kaggler and was the first person to introduce Data Version Control on Kaggle, and I am among the top 10 contributors to DVC so far.