0:04 Introduction of Miki Tebeke
1:30 Definition of performance in the talk
2:09 Optimised code saves cloud costs and research time
4:00 Optimisation increases development time
4:50 Latency explained in a human timescale
8:25 Rules of the "Optimisation Club"
11:40 %time and %timeit Jupyter magics
14:07 %prun magic and Python profilers
17:10 Loops vs vectorisation
18:30 Avoid using .iterrows()
19:50 Python functions vs Pandas methods vs NumPy methods
21:05 NaNs in NumPy and Pandas
23:36 Limit memory usage with Pandas
27:48 .apply() in Pandas
29:04 Use categorical type instead of strings
30:40 Optimisation is more about the culture than the process
32:32 [Question 1] Alternatives to Pandas
33:55 [Question 2] Time profiling in PySpark
34:44 [Question 3] Does .apply() with axis create extra rows?
36:16 [Question 4] Any tips about indexing?
Timestamps for: Faster Pandas: Make your code run faster and consume less memory | Miki Tebeke, CEO 353solutions.