Apache Beam introduces a new paradigm to data processing. No longer do you need to suffer the duality of the Lambda architecture, maintaining two variations of your code: one for batch and another for streaming.
This talk introduces the concepts behind Beam and how it decouples data processing from the underlying infrastructure.
We will then walk through a small example in Python.
Duration
[ ] 30 min
[x] 45 min
Audience
Experience with data processing helps. This is a 101-level talk.
Outline
In the talk, we start off by providing an overview of Apache Beam using the Python SDK and the problems it tries to address from an end user's perspective. We cover the core programming constructs in the Beam model, such as PCollections, ParDo, GroupByKey, windowing, and triggers. We describe how these constructs make it possible for pipelines to be executed in a unified fashion in both batch and streaming. Then we use examples to demonstrate these capabilities. The examples showcase using Beam for stream processing and real-time data analysis, and how Beam can be used for feature engineering in some machine learning applications using TensorFlow. Finally, we end with Beam's vision of creating runner- and execution-independent graphs using the Beam Fn API [2].
Link to Presentation.
Additional notes