5 Reasons Parquet Files Are Better Than CSV for Data Analyses

numfocus / YouTubeVideoTimestamps

Adding timestamps to NumFOCUS and PyData YouTube videos!

https://www.youtube.com/c/PyDataTV

MIT License


5 Reasons Parquet Files Are Better Than CSV for Data Analyses | PyData Global 2021 #201

Open Adrianf23 opened 8 months ago

Adrianf23 commented 8 months ago

1:10 Intro
2:10 What you will get from this talk
3:03 Why file formats matter
3:56 5 reasons Parquet is better than CSV
5:44 Cluster computation at a high level
6:12 #1 Column pruning
9:34 #2 Predicate pushdown filtering
14:10 #3 File compression
17:34 #4 Schema metadata
21:10 Pro-tip: use pyarrow to get the metadata
22:00 #5 Parquet files are immutable
23:25 Improving on Parquet files
24:56 Q&A