get a map of Soho and the approximate locations of the cholera cases he registered
generate a tree
the tree is built from sequence data unrelated to this outbreak; it is there for illustrative purposes only
map the cases and query by location (we suspect a pump: how many cases fall within its perimeter?)
each pump has a count of cases: construct a Bayesian model (cholera_count ~ poisson(...)) that identifies the most likely source pump (this is the modelling / machine learning part)
the outbreak data is "discovered" by streaming raw read sets (metagenomic water/stool samples) and checking them for cholera; the read sets exist only as data dumps with minimal annotation and a geotag (we anticipate the rise of the minisequencer + smartphone app; assumption: people will be reluctant to submit data in any special format, such as ENA ...)
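
a rough sketch of the perimeter query and the pump model from the steps above, in plain Python. Everything concrete here is an assumption made for illustration: the pump names, the coordinates (planar, in metres, not real Soho positions), the 100 m radius, and the two fixed Poisson rates (one for the source pump, one for background). The model is the simplest version of cholera_count ~ poisson(...): one pump is the source, the prior over pumps is uniform, and the posterior is the normalised likelihood of the observed counts under each hypothesis.

```python
import math

def count_within(cases, pump, radius_m):
    """Count cases whose planar distance to `pump` is at most `radius_m`."""
    px, py = pump
    return sum(1 for (x, y) in cases if math.hypot(x - px, y - py) <= radius_m)

def poisson_logpmf(k, rate):
    """Log of the Poisson probability of observing count `k` at the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def pump_posterior(counts, rate_source, rate_background):
    """Posterior over 'which pump is the source', with a uniform prior.

    Hypothesis 'pump s is the source': the count at s is Poisson(rate_source),
    the counts at all other pumps are Poisson(rate_background).
    """
    names = list(counts)
    log_post = {
        source: sum(
            poisson_logpmf(counts[p], rate_source if p == source else rate_background)
            for p in names
        )
        for source in names
    }
    # Normalise in log space to avoid underflow.
    top = max(log_post.values())
    weights = {p: math.exp(lp - top) for p, lp in log_post.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

# Made-up pumps and cases (hypothetical planar coordinates in metres).
pumps = {"pump_a": (0.0, 0.0), "pump_b": (300.0, 0.0), "pump_c": (0.0, 400.0)}
cases = [(10, 5), (-20, 15), (30, -10), (5, 40), (290, 10), (15, 390), (-5, -25)]

# Perimeter query: cases within an assumed 100 m of each pump.
counts = {name: count_within(cases, xy, radius_m=100.0) for name, xy in pumps.items()}

# Assumed rates: 5 expected cases near the source, 1 near any other pump.
posterior = pump_posterior(counts, rate_source=5.0, rate_background=1.0)
best = max(posterior, key=posterior.get)
print(counts, best)
```

in a real model the two rates would get priors and be inferred rather than fixed, and geotagged cases would need a proper geographic distance instead of planar coordinates.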