ML topics that can confuse new people

handling missing data: dropping the affected rows, or 'fixing' them with imputation, can change the distribution without you realising it (can I visualise this as a demo?)
class imbalance - with massive imbalances (e.g. click data) accuracy is misleading, because always predicting the majority class already scores highly
lack of visualisation and pair-plots, so there's no gut feel for the underlying data
how to lay out files: https://drivendata.github.io/cookiecutter-data-science/
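
A possible demo for the missing-data point above, as a minimal sketch using only the Python standard library. The feature, its distribution (normal, mean 50, sd 10) and the 30% missing rate are all made-up assumptions for illustration. It compares dropping rows with filling the gaps with the observed mean, and shows that mean imputation shrinks the spread of the data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical feature: 1,000 draws from a normal distribution (mean 50, sd 10).
values = [random.gauss(50, 10) for _ in range(1000)]

# Knock out roughly 30% of the values at random to simulate missing data.
with_gaps = [v if random.random() > 0.3 else None for v in values]

# Option 1: drop the rows with missing values (we lose data).
observed = [v for v in with_gaps if v is not None]
observed_mean = statistics.mean(observed)

# Option 2: "fix" the gaps by filling them with the observed mean.
imputed = [observed_mean if v is None else v for v in with_gaps]

print(f"rows kept after dropping: {len(observed)} of {len(values)}")
print(f"std of observed values:   {statistics.stdev(observed):.2f}")
print(f"std after mean imputing:  {statistics.stdev(imputed):.2f}")
```

The mean is unchanged by the fill, but the standard deviation drops because every filled value sits exactly on the mean. A histogram of `imputed` next to `observed` would show this as a spike at the mean.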
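
The class-imbalance point above can be sketched the same way. The 1% click rate and the sample size are assumptions chosen for illustration, and the "model" is a deliberately useless baseline that always predicts "no click".

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical click data: about 1% of impressions are clicks (label 1).
labels = [1 if random.random() < 0.01 else 0 for _ in range(10_000)]

# A "model" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: fraction of predictions that match the label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: fraction of the actual clicks that the model found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
actual_positives = sum(labels)
recall = true_positives / actual_positives if actual_positives else 0.0

print(f"accuracy: {accuracy:.3f}")
print(f"recall:   {recall:.3f}")
```

Accuracy looks excellent even though the model never finds a single click, which is why recall (or precision, or a confusion matrix) is the more useful thing to look at here.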
