Scaling Predictive Model Using Spark

razi5287 commented 5 years ago

Abstract (2-3 lines) As real-time data piles up, you want your predictive model to be scalable. Python ML libraries provide rich functionality and flexibility for training a model. However, using that same model to predict on a very large production dataset from a single server becomes slow. The talk presents a divide-and-conquer approach: broadcast the trained Python model object to the distributed data chunks and make predictions in parallel.
Brief Description and Contents to be covered
1. Model training in Python: take a representative sample of the data and train the model using selected techniques to obtain a stable, robust model. Problems with the traditional way of prediction: show the bottlenecks when the production data is huge.
2. Distribute the production data: using big data storage techniques, the production data is stored in a distributed fashion. A small architecture preview.
3. Broadcast the model for prediction: rather than pulling the data to your server, the model is broadcast to the smaller chunks of production data, which are scored in parallel.
4. Benefits: bridge the gap between parallel computing and traditional modelling.
5. Benchmark: present the stats for the gain.
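The broadcast step (item 3) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the talk: `LinearModel`, `predict_partition`, and `predict_with_spark` are hypothetical names, the toy model stands in for any picklable trained Python model with a `predict` method, and the Spark part assumes an existing PySpark `SparkContext`. The per-chunk scoring function is plain Python, so it can be checked locally without a cluster.

```python
from typing import Iterable, Iterator, List, Sequence


class LinearModel:
    """Hypothetical stand-in for any picklable trained model with predict()."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def predict(self, rows: List[Sequence[float]]) -> List[int]:
        # Score each row with a dot product and threshold at zero.
        return [
            int(sum(w * x for w, x in zip(self.weights, row)) + self.bias > 0)
            for row in rows
        ]


def predict_partition(model, rows: Iterable[Sequence[float]]) -> Iterator[int]:
    """Score one chunk of data with a single batched predict() call."""
    batch = list(rows)
    if batch:
        yield from model.predict(batch)


def predict_with_spark(sc, model, rows, num_slices=4):
    """Broadcast the model, then score every partition in parallel.

    Assumption: `sc` is an existing pyspark.SparkContext. Here the rows are
    parallelized from the driver for illustration; in production the data
    would already be stored in a distributed fashion.
    """
    model_bc = sc.broadcast(model)  # read-only copy cached on each executor
    rdd = sc.parallelize(rows, num_slices)
    return rdd.mapPartitions(
        lambda part: predict_partition(model_bc.value, part)
    ).collect()


model = LinearModel(weights=[1.0, -1.0], bias=0.0)
rows = [[2.0, 1.0], [0.0, 3.0], [5.0, 5.0], [1.0, 0.5]]

# Local check without a cluster: split the rows into chunks and score each one.
chunks = [rows[:2], rows[2:]]
local_predictions = [p for chunk in chunks for p in predict_partition(model, chunk)]
```

With a running Spark session, `predict_with_spark(spark.sparkContext, model, rows)` would be expected to return the same predictions as the local chunked loop, since `mapPartitions` preserves the order of rows within and across partitions when collected.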
Pre-requisites for the talk Basic understanding of Python; understanding of modelling
Time required for the talk ~30 min
Link to slides To be shared
Will you be doing hands-on demo as well? No
Link to ipython notebook (if any) To be shared
About yourself Data Science Engineer
Are you comfortable if the talk is recorded and uploaded to PyData Delhi's YouTube channel ? Yes
Any query? Will a Q&A session be made available separately?

MSanKeys963 commented 5 years ago

Hi @razi5287! Thanks for the proposal. We'll give you 5 minutes extra for QnA.

shagunsodhani commented 5 years ago

Hey @razi5287 Thank you for your proposal. It looks very interesting. Could you please share the slide deck here so that we may skim through it once.

MSanKeys963 commented 5 years ago

Hi @razi5287. Updates on slides?

MSanKeys963 commented 5 years ago

Slight reminder @razi5287.
