washingtonpost / data-police-shootings

The Washington Post is compiling a database of every fatal shooting in the United States by a police officer in the line of duty since 2015.

Age? #28

Closed BeaversDen closed 4 years ago

BeaversDen commented 4 years ago

Can we get ages or age ranges as well?

JohnJChristie commented 4 years ago

Please do not put age ranges in, just put in the actual age

BeaversDen commented 4 years ago

> Please do not put age ranges in, just put in the actual age

I was asking for the purpose of statistics. There is other statistical data in this as well.

JohnJChristie commented 4 years ago

> Please do not put age ranges in, just put in the actual age

> I was asking for the purpose of statistics. There is other statistical data in this as well.

I understand what the range is useful for. What I'm saying is that the actual age is more useful. Age ranges are a crutch when you don't think that you can get compliance for actual age in surveys. The exact ages of all of these victims are available and should be included.

BeaversDen commented 4 years ago

But why not both? If my inquiry only calls for a range, why should I have to sift through a data-driven site to work one out? There are several reasons to ask for a range.


JohnJChristie commented 4 years ago

> But why not both? If my inquiry only calls for a range, why do I have to sift through a site that is data driven to get a range. There are several reasons to inquire a range.

Providing both is not a solution for a data aggregator or provider when one is an arbitrary subset of the other. What if the provided ranges don't fall exactly into the classifications that everyone who looks at the database wants? What if you don't get the age ranges you want? At that point the author of the database has spent a bunch of effort constructing something of no use to you. If you want a range, make it out of the ages that you get.

Further, even assuming the database just so happened to classify the ages into ranges you like, I'm not sure how having only the range makes it much easier to find ages in a range. You can sort the ages or age ranges and select all of the ones in a range. The "sifting" effort is identical in both cases. Or, if you do boolean searches, "age == '15-25'" is only trivially simpler than "age > 14 & age < 26".
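
As a rough illustration of the boolean search mentioned above, here is a minimal Python sketch. The sample ages and the `in_range` helper are hypothetical and not taken from the database; the point is only that selecting a range from exact ages is a one-line filter.

```python
# Hypothetical sample ages for illustration; not real records from the database.
ages = [14, 15, 22, 25, 26, 40]

def in_range(age, low, high):
    """Return True when low <= age <= high (both bounds inclusive)."""
    return low <= age <= high

# Equivalent to the search "age > 14 & age < 26" for whole-number ages.
selected = [a for a in ages if in_range(a, 15, 25)]
print(selected)
```

Changing the bounds passed to `in_range` gives any other range without touching the underlying data.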

There are whole chapters of books on why providing an age range, as opposed to age, is a bad idea but here are some issues.

Age is more informative. If you know the age then you know the age range. But the reverse is not true. I know that someone in 21-30 is younger than someone in 31-40. However, I have no idea if I'm looking at 21 v. 40 or 30 v. 31.

Age is more flexible. An age range would be whatever range the database author decides. If you have ages then you can select whatever ranges are appropriate for your report.
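
The flexibility point can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample ages, the two sets of cutoffs, and the `label_age` helper. It shows the same exact ages being grouped under two different schemes, which is not possible if only one scheme's labels were stored.

```python
def label_age(age, bounds):
    """Return a 'low-high' label for the first (low, high) pair containing age, else None."""
    for low, high in bounds:
        if low <= age <= high:
            return f"{low}-{high}"
    return None

# Hypothetical ages and two hypothetical range schemes.
ages = [19, 21, 30, 31, 40]
decades = [(21, 30), (31, 40)]
custom = [(15, 25), (26, 45)]

by_decade = [label_age(a, decades) for a in ages]
by_custom = [label_age(a, custom) for a in ages]
print(by_decade)
print(by_custom)
```

Note that 19 falls outside the first scheme entirely, which is the kind of mismatch a fixed set of published ranges can produce.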

Age ranges are often used to obfuscate things. If someone doesn't like a pattern of outcomes because it doesn't favour what they want to say about the middle-aged, the elderly, or the young, they just reclassify things a little bit and voilà, an arbitrary cutoff makes a finding do what they want. People need to be able to go back to the source and find out what the actual ages in a group are.

Age ranges cannot be added, multiplied, divided, etc. because they aren't numbers. They are just ordinal classifications. As such, you're very limited in the descriptive statistics you can compute on age ranges.
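
A small Python sketch of that limitation, using only the standard library. The ages and the ten-year `to_range` labels are hypothetical. With exact ages you can compute a mean and median directly; with range labels alone, about all you can do is count how many fall in each category.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical ages for illustration; not real records from the database.
ages = [18, 22, 29, 35, 41]

# Exact ages are numbers, so arithmetic summaries work directly.
mean_age = mean(ages)
median_age = median(ages)

def to_range(age):
    """Map an age to a hypothetical ten-year label such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Range labels are strings, so the natural summary is a frequency count.
range_counts = Counter(to_range(a) for a in ages)
print(mean_age, median_age)
print(dict(range_counts))
```

Going from `ages` to `range_counts` is easy; recovering `mean_age` from `range_counts` alone is not possible without guessing where each person sits inside their range.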

Age ranges only have two benefits. One is that you get more compliance on surveys that use a range. It's a gamble people take, losing one kind of information in the hopes of gaining another. But this isn't a survey and all ages would be in the data. The other is in summarizing information in reports. You may care about discussing particular ranges of individuals. But that's going to vary from field to field, time to time, and author to author. The author of a final report can select age ranges at that time. There is no purpose to them in a database.

jmuyskens commented 4 years ago

Age is included in the data.