A PD2 report template that incorporates all formatting necessary for students in the course, so that they can focus on the actual content instead of worrying about less important details.
Python has a great community, with tonnes of support and packages that solve a lot of problems right off the bat
notebooks allow us to produce informative reports of our analysis and thought process in a reproducible way.
scikit-learn provides a lot of tools for free,
including helpful abstractions like pipelines and feature unions, and a tonne of ready-to-go algorithms
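As a rough illustration of the pipeline and feature union abstractions mentioned above, here is a minimal sketch. The iris dataset, the choice of transformers, and the step names are all assumptions made for the example, not something prescribed by these notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset, used here only as a stand-in for real data.
X, y = load_iris(return_X_y=True)

# A feature union runs several transformers side by side and
# concatenates their outputs column-wise.
features = FeatureUnion([
    ("pca", PCA(n_components=2)),     # 2 projected components
    ("kbest", SelectKBest(k=1)),      # 1 best original feature
])

# A pipeline chains preprocessing, feature extraction and the model
# into a single estimator with one fit/predict interface.
model = Pipeline([
    ("scale", StandardScaler()),
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X, y)
predictions = model.predict(X[:5])
```

Because the whole chain is one estimator, it can be handed to cross-validation or grid search as a unit, which is a large part of why these abstractions are convenient.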
Python as a language is extremely versatile and provides an end-to-end framework for a machine learning system.
however, there are limitations. Python, for example, is bound by the memory and cores of a single machine.
Because of the Python implementation (the global interpreter lock in CPython), making things highly parallel is hard
on top of that, the Python machine learning stack generally expects all the training data to fit in memory
there are various ways of getting around this.
one is Vowpal Wabbit: online learning makes it super efficient on a single machine, but we lose a lot of the luxury that comes with using Python
another is Spark: with APIs in Python and Scala, it's beginning to develop dataframe-like and pipeline-like abstractions over data for machine learning projects.
however, it's not as mature and lacks a lot of the functionality you would expect
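To make the online learning idea from the Vowpal Wabbit option more concrete, here is a toy sketch in plain Python. It is not how Vowpal Wabbit is implemented; the data generator, function names, and learning rate are hypothetical, chosen only to show that the model sees one example at a time and never holds the full dataset in memory.

```python
import math
import random


def stream_examples(n, seed=0):
    """Yield (features, label) pairs one at a time.

    Stands in for reading a file that is too big to load at once.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0
        yield x, y


def train_online(examples, n_features, learning_rate=0.1):
    """Logistic regression fitted by SGD, updating after every example."""
    weights = [0.0] * n_features
    bias = 0.0
    for x, y in examples:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well behaved
        p = 1.0 / (1.0 + math.exp(-z))
        error = p - y  # gradient of the log loss with respect to z
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Predict class 1 when the linear score is positive."""
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


# Memory use is constant: only the weights and the current example are kept.
weights, bias = train_online(stream_examples(5000), n_features=2)
test_set = list(stream_examples(1000, seed=1))
accuracy = sum(predict(weights, bias, x) == y for x, y in test_set) / len(test_set)
```

The trade-off mentioned above shows up even here: the loop is cheap and scales to any stream length, but none of the surrounding tooling (pipelines, cross-validation, plotting) comes with it.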
this is why Python is ideal for prototypes.
explore a space, build models, and evaluate performance on a small set of data
once the data is too big, potentially recycle the code and port it to Vowpal Wabbit
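One small piece of that hand-off can be sketched in Python: turning rows from a prototype into Vowpal Wabbit's plain-text input format (`label |namespace name:value ...`). The helper name `to_vw_line`, the namespace `f`, and the sample rows are hypothetical, and the sketch assumes feature names contain no spaces, `|` or `:`.

```python
def to_vw_line(label, features, namespace="f"):
    """Format one example as a Vowpal Wabbit text line.

    `features` maps feature names to numeric values. Names are assumed
    to be free of spaces, '|' and ':' (no escaping is done here).
    """
    parts = []
    for name, value in sorted(features.items()):
        if value == 0:
            continue  # sparse format: zero-valued features are left out
        parts.append(f"{name}:{value:g}")
    return f"{label} |{namespace} " + " ".join(parts)


# Hypothetical rows; binary labels are written as 1 / -1.
rows = [
    (1, {"age": 0.5, "clicks": 3, "unused": 0}),
    (-1, {"age": 2}),
]
lines = [to_vw_line(label, feats) for label, feats in rows]
```

The resulting lines can be written to a file or piped into Vowpal Wabbit, so the feature engineering worked out during prototyping does not have to be thrown away.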
Conclusion