NickleDave opened 8 years ago
@bollwyvl @tonyfast I realize I'm late to the party ... let me know if you think this is something that could fit into what you've got scheduled. Thanks
Alright, we've got you in the science workshop... we'll be putting some more structure around it (repos, chat, website, etc.) soon! Thanks!
Me
David Nicholson
Abstract
Scientists who study machine learning often plot a model's error against the amount of data used to train it. Such plots are known as learning curves or validation curves. In 1994, Cortes et al. proposed a method for fitting these curves with an exponential decay function. Their method provides a way to predict how different models stack up against each other. Importantly, it can avoid the computationally expensive process of estimating error on very large training sets. With help from a Jupyter notebook, I will introduce exponential decay functions and give a brief derivation of Cortes et al.'s method. Then I will demonstrate how to fit learning curves with their model, using the data sets built into the scikit-learn library. I will also demonstrate some less-than-ideal fits using my own (lovely) data. Lastly, I will discuss how it might be possible to detect statistically significant differences between models using the fit parameters. (Step 3: profit.) I expect the talk to be about 20 minutes.
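To make the idea concrete, here is a minimal sketch of fitting a learning curve with a decaying function using `scipy.optimize.curve_fit`. The three-parameter form below (an asymptote plus an exponential decay) is one common choice and an assumption on my part; the exact parameterization in Cortes et al. (1994) and in the talk may differ, and the data here are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(n, a, b, c):
    # a: asymptotic error as the training set size n grows
    # b, c: scale and rate of the decay toward that asymptote
    return a + b * np.exp(-c * n)

# Synthetic "validation error vs. training set size" data, for illustration only.
train_sizes = np.array([50, 100, 200, 400, 800, 1600], dtype=float)
errors = 0.10 + 0.5 * np.exp(-0.004 * train_sizes)

# Fit the decay model; p0 gives rough initial guesses for (a, b, c).
params, _ = curve_fit(decay, train_sizes, errors, p0=(0.1, 0.5, 0.01))
a, b, c = params
```

The fitted asymptote `a` estimates the error we would expect with much more data, which is what lets us compare models without actually training on huge sets.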