numfocus / YouTubeVideoTimestamps

Adding timestamps to NumFOCUS and PyData YouTube videos!
https://www.youtube.com/c/PyDataTV
MIT License

Juan Luis- Expressive and fast dataframes in Python with polars #148

Open rockper opened 1 year ago

rockper commented 1 year ago

Contents
0:23 Section 1: Outline of the talk
0:53 Speaker bio (Juan Luis)
2:09 Section 2: pandas
3:27 pandas: successes and limitations
5:44 Section 3: Alternatives to pandas
6:05 Dataframes Charming Quadrangle of dataframe tools
8:45 Apache Arrow
10:03 Polars
11:33 Resources on PyArrow, Vaex, DuckDB, Fugue, Ibis
12:48 Section 4: Demo - Intro to Polars
17:44 Demo - The Polars expression system
17:50 Create a generic expression (cell 15). Then apply it to any dataframe
18:58 Lists of expressions will be computed in parallel
20:00 Q: Columns with mixed types - pandas (object type) vs. polars (Arrow types)
20:57 Q: Window Functions, Polars User Guide, Reference Guide
22:04 pl.col().arr - A dedicated namespace for arrays and lists of objects
23:10 Demo - Using Lazy evaluation with df.lazy()
25:41 pl.scan_csv() - Read a CSV as a lazy dataframe
26:49 plan.show_graph() - View the query execution for a chain of operations
27:27 Working with columns of lists
31:42 Section 5: Conclusions and Q&A - Q1: Lazy reading; Streaming support
32:47 Q2: Dask on Polars; Out-of-core computation
34:07 Q3: Sampling
35:19 Q4: Lazy evaluation for large datasets
36:05 Q5: Query optimization
37:10 Q6: SQL on databases with connectorx or psycopg2; SQL on files with DuckDB
37:50 Recap: Polars does not have an index on dataframes, while pandas does

Original video: https://www.youtube.com/watch?v=LGAHTp4DYZY
Title: Juan Luis- Expressive and fast dataframes in Python with polars | PyData NYC 2022