ropensci / stats19

R package for working with open road traffic casualty data from Great Britain
https://docs.ropensci.org/stats19
GNU General Public License v3.0

CRASH/COPA caveat #91

Open wengraf opened 5 years ago

wengraf commented 5 years ago

Hi:

This is great stuff - but I'm concerned someone unsure of the background of STATS19 data collection might come to incorrect conclusions, specifically around changes in Serious casualties over time, and in recent-year analysis of spatial differences in Serious casualties.

There are new data collection methods, compared to paper, in this data now:

  1. CRASH (a DfT promoted mobile app for police)
  2. COPA (a Met Police mobile system)
  3. Online public submissions

(http://roadsafetyanalysis.org/2017/09/2016-gb-casualty-data-released/)

Not all constabularies will be on CRASH/COPA, but those that are will be showing rises in Serious casualties relative to previous years, to some degree because these apps force those entering data to do so more precisely. (Many who ought to know don't know what constitutes a "Serious" casualty.) This can easily be misread and encourage false conclusions.

I'd suggest some sort of warning either as the package loads, or for results including 2016+ data in the first instance.
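
A minimal sketch of the second option (a warning for results including 2016+ data), in base R. The function name `warn_if_severity_reporting_changed()` and its arguments are hypothetical and not part of stats19; the 2016 cut-off is taken from the suggestion above and is only an assumption about where such a check would start.

```r
# Hypothetical helper, NOT part of the stats19 API.
# `years` is a numeric vector of the years present in the data being analysed.
# `first_affected_year` defaults to 2016, an assumption based on this issue.
warn_if_severity_reporting_changed <- function(years, first_affected_year = 2016) {
  # TRUE if any non-missing year is at or after the cut-off
  affected <- any(years >= first_affected_year, na.rm = TRUE)
  if (affected) {
    warning(
      "Data from ", first_affected_year, " onwards include records entered via ",
      "CRASH/COPA; changes in Serious casualty counts may partly reflect the ",
      "recording method rather than real change.",
      call. = FALSE
    )
  }
  # Return the flag invisibly so callers can branch on it if they wish
  invisible(affected)
}
```

Calling `warn_if_severity_reporting_changed(c(2014, 2015, 2016))` would raise the warning, while a vector containing only earlier years would return `FALSE` silently.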

I'd also be happy to hunt down a list of Police Forces and when/if they switched, so that you could add another field ("data_entry_type" or similar). One could then adjust serious totals as appropriate to make analysis across space or time more robust.

Ivo

layik commented 5 years ago

Hello Ivo,

Thank you for opening the ticket. This is important and worth following up. I will break down your post into a few points as I understand it:

  1. A potential warning message, alongside the disclaimer currently in place, relating to data from 2016 onwards.
  2. You kindly want to contribute by hunting down those that are already on CRASH/COPA.
  3. Extra field in stats19::format to include data_entry_type

I just want to say: I am not 100% clear if there are different datasets released by the DfT according to their methods of collection. I think this is something that we need to clarify with DfT and right from the source. Otherwise (3) would be redundant.

The link in Ivo's post contains a link to this: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/744077/reported-road-casualties-annual-report-2017.pdf

layik commented 5 years ago

RE (2): the full report does contain this image

Robinlovelace commented 5 years ago

I can think of the following ways to address this in the near term:

Sound like a plan? This should help reduce the chances of people arriving at false conclusions due to the different uptake times of CRASH. A PR adding an additional column to the existing police_boundaries data, building on the var names shown below, would be greatly appreciated.

names(stats19::police_boundaries)
#> [1] "pfa16cd"  "pfa16nm"  "geometry"

Created on 2019-02-26 by the reprex package (v0.2.1)
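
One way the additional column could be attached is sketched below, using a toy stand-in for the boundaries table. Everything in it is illustrative: the codes, force names, entry types and switch years are made-up placeholders rather than real data, and the `data_entry_type` and `switch_year` column names are assumptions based on the name proposed earlier in this thread. The geometry column is omitted to keep the example in base R.

```r
# Toy stand-in for the police boundaries table (geometry column omitted).
# Codes and names are placeholders, not real police force areas.
boundaries <- data.frame(
  pfa16cd = c("X01", "X02", "X03"),
  pfa16nm = c("Force A", "Force B", "Force C"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup of data entry method per force.
# Values are invented for illustration only.
entry_lookup <- data.frame(
  pfa16nm = c("Force A", "Force B"),
  data_entry_type = c("CRASH", "paper"),
  switch_year = c(2016L, NA),
  stringsAsFactors = FALSE
)

# Left join: keep every force, with NA where the entry method is unknown
boundaries_entry <- merge(boundaries, entry_lookup, by = "pfa16nm", all.x = TRUE)
```

Forces missing from the lookup (here "Force C") end up with `NA` in the new columns, which keeps "unknown" distinct from "still on paper".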

My understanding of the switch is that it affects the serious/slight proportion but not the fatalities data. Is that correct? And any ideas how others are dealing with this?

In summary: definitely in favour of adding something on this, had heard about it but knew little about it. Thanks for raising the issue.

wengraf commented 5 years ago

While the app-based systems are markedly superior in principle, there are transition issues, and not everyone has taken them up, or taken up the same system, or even the same version of the same system. Serious casualties are now counted much more precisely, because the app asks about injury type, whereas the paper form required you to remember the definition. The app-based methods should have much more accurate crash location data, but the processing so far hasn't been kind to the casualty home location and driver home location fields. This should improve and be backdated (the right data is there in the computer, it just isn't spitting it out at the moment).

Your plan sounds excellent, @Robinlovelace , and I can have a word on the side about it at the next STATS19 review meeting at DfT if that'll help (i) clarify any issues and/or (ii) drum up further interest.

wengraf commented 5 years ago

My understanding is that the plan is for the data entry method to begin to appear as an additional field, especially as new public-submitted data is likely to make this much more confusing soon.

Robinlovelace commented 5 years ago

Great to hear, Ivo. Note: we have talked to DfT about this package and it has been informally tested by them (see #5). Anything mentioning those issues, especially based on the expertise of the likes of Craig (do you know his GH handle? ;) and others in Agilysis, will go well beyond the mention of it in the current default open access system, I believe! Look forward to seeing your input, and if we can help in any way (e.g. extracting data from an impenetrable PDF) just ping me here.

layik commented 3 years ago

Are we closing this?

Robinlovelace commented 3 years ago

No, I think we need to get #176 and #178 done before closing this.

Robinlovelace commented 6 months ago

Cc @stholder3 FYI