numfocus / YouTubeVideoTimestamps

Adding timestamps to NumFOCUS and PyData YouTube videos!
https://www.youtube.com/c/PyDataTV
MIT License

Matt Harrison - Testing Pandas- Shoots, leaves, and garbage! | PyData Global 2022 #165

Open: emmcauley opened 1 year ago

Video URL: https://www.youtube.com/watch?v=Kj1WwpPFr-I

Contents

0:37 Speaker Introduction
4:08 Example scenario and average Pandas-based processing techniques
6:38 Chaining
7:58 Creating a function to simplify your chain
8:58 Debugging your chain with .pipe and custom functions like size() and store()
11:27 Debugging with Jupyter: set_trace() and the IPython debugger
13:30 Debugging errors after the fact with %debug
14:33 Using ‘??’ to get source code for a method or function
14:58 Testing your code with pytest and sample input data
19:01 ipytest supports native use of pytest within Jupyter notebook
19:47 Using pandera and hypothesis libraries to test assertions about data and code
22:15 Using great_expectations library to make assertions about data and schema
26:20 Thanks and resources
26:26 Q&A: would you make an expectation for each function?
27:29 Q&A: how do you choose between pandera and great_expectations?
27:48 Q&A: can we access the notebook?
28:00 Q&A: can hypothesis save schemas so that the input data is not necessary to run tests?
28:24 Q&A: what is the appropriate amount of testing?
28:47 Q&A: how good is pandera at generating data for hypothesis?
29:19 Q&A: do you have recommendations for testing Jupyter notebooks?