handling missing data - dropping the affected rows or 'fixing' them with imputation can change the distribution without you realising it (can I visualise this as a demo?)
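class imbalance - with massive imbalances (e.g. click data, where nearly every impression is a non-click) accuracy isn't a useful metric, since a model that always predicts the majority class already scores highly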
lack of visualisation and pair-plots, so there's no gut feel for the underlying data
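A possible starting point for demos of the first two points is the minimal NumPy sketch below. Everything in it is an illustrative assumption: the function names `missing_data_demo` and `majority_baseline_demo` are hypothetical, the data is synthetic, the missingness mechanism (values above 60 go missing half the time) is invented so that the gaps are not random, and the 1% click rate is a made-up figure. It is not a real dataset or a recommended workflow.

```python
import numpy as np


def missing_data_demo(seed=0, n=10_000):
    """Hypothetical demo: compare the true data with two common 'fixes' for gaps."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=50.0, scale=10.0, size=n)

    # Assumed missingness mechanism: values above 60 go missing half the time,
    # so the gaps are NOT random (e.g. high earners skipping an income question).
    missing = (x > 60) & (rng.random(n) < 0.5)
    observed = x[~missing]

    # Fix 1: drop the rows with missing values.
    dropped = observed
    # Fix 2: mean imputation, filling every gap with the mean of what we saw.
    imputed = np.where(missing, observed.mean(), x)

    return {
        "true_mean": x.mean(), "true_std": x.std(),
        "dropped_mean": dropped.mean(), "dropped_std": dropped.std(),
        "imputed_mean": imputed.mean(), "imputed_std": imputed.std(),
    }


def majority_baseline_demo(seed=0, n=100_000, click_rate=0.01):
    """Hypothetical demo: accuracy vs recall for a model that never predicts a click."""
    rng = np.random.default_rng(seed)
    # Synthetic labels: 1 = click, 0 = no click (assumed 1% click rate).
    y_true = (rng.random(n) < click_rate).astype(int)
    # "Model" that always predicts the majority class (no click).
    y_pred = np.zeros(n, dtype=int)

    accuracy = (y_pred == y_true).mean()
    true_positives = np.sum((y_pred == 1) & (y_true == 1))
    recall = true_positives / max(y_true.sum(), 1)  # share of real clicks found
    return accuracy, recall


if __name__ == "__main__":
    for name, value in missing_data_demo().items():
        print(f"{name}: {value:.2f}")
    acc, rec = majority_baseline_demo()
    print(f"accuracy: {acc:.3f}, recall on clicks: {rec:.3f}")
```

In this setup both fixes pull the mean below the true value, and mean imputation also shrinks the spread because it piles identical values up at the centre. Overlaying histograms of the three arrays (for example with matplotlib's `hist`) would be one way to make that visible in a demo. The second function reports roughly 99% accuracy while finding none of the clicks, which is why metrics such as recall, precision or a confusion matrix tend to say more on heavily imbalanced data.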