1:10 Intro
2:10 What you will get from this talk
3:03 Why file formats matter
3:56 5 reasons Parquet is better than CSV
5:44 Cluster computation at a high level
6:12 #1 Column pruning
9:34 #2 Predicate pushdown filtering
14:10 #3 File compression
17:34 #4 Schema metadata
21:10 Pro-tip: use pyarrow to get the metadata
22:00 #5 Parquet files are immutable
23:25 Improving on Parquet files
24:56 Q&A