Contents
0:23 Section 1: Outline of the talk
0:53 Speaker bio (Juan Luis)
2:09 Section 2: pandas
3:27 pandas: successes and limitations
5:44 Section 3: Alternatives to pandas
6:05 Dataframes Charming Quadrangle of dataframe tools
8:45 Apache Arrow
10:03 Polars
11:33 Resources on PyArrow, Vaex, DuckDB, Fugue, Ibis
12:48 Section 4: Demo - Intro to Polars
17:44 Demo - The Polars expression system
17:50 Create a generic expression (cell 15), then apply it to any dataframe
18:58 Lists of expressions will be computed in parallel
20:00 Q: Columns with mixed types - pandas (object dtype) vs. Polars (Arrow types)
20:57 Q: Window functions; Polars User Guide and Reference Guide
22:04 pl.col().arr - A dedicated namespace for arrays and lists of objects
23:10 Demo - Using lazy evaluation with df.lazy()
25:41 pl.scan_csv() - Read a CSV as a lazy dataframe
26:49 plan.show_graph() - View the query execution plan for a chain of operations
27:27 Working with columns of lists
31:42 Section 5: Conclusions and Q&A - Q1: Lazy reading; Streaming support
32:47 Q2: Dask on Polars; Out-of-core computation
34:07 Q3: Sampling
35:19 Q4: Lazy evaluation for large datasets
36:05 Q5: Query optimization
37:10 Q6: SQL on databases with connectorx or psycopg2; SQL on files with DuckDB
37:50 Recap: Polars does not have an index on dataframes, while pandas does
Original video: https://www.youtube.com/watch?v=LGAHTp4DYZY Title: Juan Luis- Expressive and fast dataframes in Python with polars | PyData NYC 2022