guardian / uk-coronavirus-data-alerts

Placeholder description: @mbarton created this with repo-genesis

Add threshold for new cases per 100,000 population #4

Closed amyhughes closed 3 years ago

amyhughes commented 3 years ago

What does this change?

Feedback from Pamela and Niamh was that we should only alert for newCasesBySpecimenDate if the number of cases per 100,000 is over 50. This change applies that threshold by comparing the number of cases to the population figures downloaded from the ONS. Note that we may need to update the URL for the population figures when new estimates are released in the autumn. I haven't automated fetching the latest figures because there seem to be slight differences between each release, so we would need to manually test the fetch if and when we switch anyway.
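For anyone reading along, here is a minimal sketch of the per-100,000 check described above. It is not the code in this PR: the names `cases_per_100k` and `passes_threshold` are hypothetical, and the population in the example is an assumed value chosen to roughly reproduce the Angus row in the output below.

```python
PER_100K_THRESHOLD = 50.0  # "over 50" cases per 100,000, as requested in the feedback


def cases_per_100k(cases: float, population: int) -> float:
    """Express a case count as a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return cases * 100_000 / population


def passes_threshold(
    cases: float, population: int, threshold: float = PER_100K_THRESHOLD
) -> bool:
    """True only when the rate is strictly over the threshold."""
    return cases_per_100k(cases, population) > threshold


# Illustrative only: 190 cases is the Angus figure from the output below;
# the population of 116,200 is an assumption, not a value read from the ONS file.
angus_rate = cases_per_100k(190, 116_200)
print(f"{angus_rate:.6f}", passes_threshold(190, 116_200))
```

The comparison is strict, so an area sitting at exactly 50 per 100,000 would not alert under this reading of "over 50".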

How to test

Running locally we get reasonable looking results:

Some metrics have exceeded 100.0% change week on week:

| | areaName | newAdmissions-05-23-2021-to-05-29-2021 | newAdmissions-05-30-2021-to-06-05-2021 | percentageChange |
|---|---|---|---|---|
| 2 | South West | 1.571429 | 3.857143 | 145.454545 |

| | areaName | newCasesBySpecimenDate-05-25-2021-to-05-31-2021 | newCasesBySpecimenDate-06-01-2021-to-06-07-2021 | lastSevenDaysPer100000 | percentageChange |
|---|---|---|---|---|---|
| 326 | Angus | 41 | 190 | 163.511188 | 363.414634 |
| 299 | Cheltenham | 15 | 67 | 57.606658 | 346.666667 |
| 268 | North East Lincolnshire | 19 | 80 | 50.136937 | 321.052632 |
| 164 | Northumberland | 48 | 190 | 58.926788 | 295.833333 |
| 369 | Clackmannanshire | 28 | 99 | 192.083818 | 253.571429 |
| 20 | Selby | 18 | 60 | 66.210550 | 233.333333 |
| 358 | Warrington | 53 | 175 | 83.327778 | 230.188679 |
| 108 | Oadby and Wigston | 10 | 31 | 54.371657 | 210.000000 |
| 364 | Tewkesbury | 17 | 51 | 53.673476 | 200.000000 |
| 148 | Blackpool | 66 | 188 | 134.819213 | 184.848485 |
| 85 | Camden | 55 | 150 | 55.549589 | 172.727273 |
| 127 | Sefton | 58 | 157 | 56.799682 | 170.689655 |
| 370 | Perth and Kinross | 60 | 156 | 102.665350 | 160.000000 |
| 190 | East Lothian | 32 | 82 | 76.571108 | 156.250000 |
| 26 | West Dunbartonshire | 29 | 74 | 83.211515 | 155.172414 |
| 156 | Elmbridge | 33 | 84 | 61.405753 | 154.545455 |
| 157 | Liverpool | 108 | 272 | 54.613868 | 151.851852 |
| 365 | Wyre | 31 | 74 | 66.017789 | 138.709677 |
| 83 | Staffordshire Moorlands | 42 | 100 | 101.589882 | 138.095238 |
| 224 | Cheshire West and Chester | 148 | 347 | 101.145244 | 134.459459 |
| 11 | Wandsworth | 110 | 257 | 77.955089 | 133.636364 |
| 288 | Woking | 29 | 67 | 66.472870 | 131.034483 |
| 265 | Hackney and City of London | 70 | 161 | 57.270916 | 130.000000 |
| 114 | Tower Hamlets | 93 | 213 | 65.589924 | 129.032258 |
| 335 | Mid Sussex | 34 | 76 | 50.323794 | 123.529412 |
| 302 | South Ribble | 148 | 325 | 293.353071 | 119.594595 |
| 98 | Pendle | 83 | 180 | 195.414278 | 116.867470 |
| 86 | Hammersmith and Fulham | 55 | 119 | 64.274642 | 116.363636 |
| 282 | Dundee City | 156 | 326 | 218.323065 | 108.974359 |
| 202 | Spelthorne | 35 | 73 | 73.114058 | 108.571429 |
| 88 | West Lothian | 71 | 148 | 80.830147 | 108.450704 |
| 158 | Southwark | 109 | 224 | 70.256877 | 105.504587 |
| 71 | Islington | 65 | 133 | 54.852825 | 104.615385 |
| 233 | South Tyneside | 41 | 83 | 54.975625 | 102.439024 |
| 241 | Gloucester | 42 | 85 | 65.826157 | 102.380952 |
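As a sanity check on the output above, the percentageChange column appears consistent with a plain relative change between the two seven-day windows. The sketch below reproduces two of the rows; `percentage_change` is a hypothetical helper name for illustration, not necessarily what the repo calls it.

```python
def percentage_change(previous: float, latest: float) -> float:
    """Relative change from the earlier window to the later one, in percent."""
    if previous == 0:
        # A change from zero has no finite percentage; the caller decides how to report it.
        raise ValueError("previous value must be non-zero")
    return (latest - previous) / previous * 100


# Figures taken from the Angus and Cheltenham rows above.
print(f"Angus: {percentage_change(41, 190):.6f}")
print(f"Cheltenham: {percentage_change(15, 67):.6f}")
```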

Check https://coronavirus.data.gov.uk/

How can we measure success?

We should keep checking that the alerts are being used; I suspect interest in these figures will dwindle with time.

Have we considered potential risks?

Currently we log an error if we can't find a population figure in the ONS data for an area that reports newCasesBySpecimenDate, but we don't do anything to expose this to Pamela and Niamh. Running this locally we find a match for every area code, but it might be nicer to include a warning about missing data in the email we send if this were to happen.
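One possible shape for that follow-up, sketched under assumptions: collect the unmatched area codes alongside the computed rates, then turn them into a line for the email body. The function names, the dict-based inputs, and the area codes in the example are all hypothetical; the current code uses its own structures.

```python
from typing import Dict, List, Tuple


def rates_with_missing(
    cases_by_area: Dict[str, float],
    population_by_area: Dict[str, int],
) -> Tuple[Dict[str, float], List[str]]:
    """Return per-100,000 rates plus the area codes with no population match."""
    rates: Dict[str, float] = {}
    missing: List[str] = []
    for area_code, cases in cases_by_area.items():
        population = population_by_area.get(area_code)
        if population is None:
            # Record the gap instead of only logging it.
            missing.append(area_code)
            continue
        rates[area_code] = cases * 100_000 / population
    return rates, sorted(missing)


def missing_data_warning(missing: List[str]) -> str:
    """Build a warning line for the email, or an empty string if nothing is missing."""
    if not missing:
        return ""
    return "Warning: no ONS population figure found for: " + ", ".join(missing)


# Hypothetical area codes and figures, for illustration only.
rates, missing = rates_with_missing(
    {"AREA_A": 60, "AREA_B": 10},
    {"AREA_A": 100_000},
)
print(rates, missing_data_warning(missing))
```

Returning the missing codes rather than raising means the alert email still goes out for the areas that did match, with the gap visible to the recipients.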

joelochlann commented 3 years ago

We should keep checking that the alerts are being used; I suspect interest in these figures will dwindle with time.

Yeah it's a shame we didn't collaborate on this a year ago... probably could've freed up the data team a lot! Lesson for the future...

joelochlann commented 3 years ago

Basic question since I haven't looked at this repo before: is the output just an email that says "check the website" when there's something noteworthy? It doesn't send a spreadsheet or store one somewhere?

amyhughes commented 3 years ago

Thanks Joe, yes, at the moment it's just an email to prompt the data team to check the site if the figures look concerning (interesting).

I'll take a look at better error handling, and making the table construction more readable.