rcackerman / parole-hearing-data

http://www.parolehearingdata.org/

What do we do with duplicates? Thinking . . . #7

Closed nikzei closed 10 years ago

nikzei commented 10 years ago

Some people appeared multiple times in front of the parole board during the time captured by this data scrape. I'm wondering what the best way is to handle this.

Identifying the individuals who appeared more than once could be useful in itself: we could then report how many people came before the board multiple times during this 30-month period.

However, I'd also like to know the number of people who appeared in front of the board, not just the number of hearings the board conducted. To get that, I'd have to remove duplicates from the count (maybe with a rule like "each DIN only gets counted once"?).
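To make the "each DIN counted once" idea concrete, here is a minimal sketch. It assumes each scraped row is one hearing and carries a `din` field; the field names and the sample values are made up for illustration and are not taken from the actual scrape.

```python
from collections import Counter

# Hypothetical sample rows: one dict per hearing.
# The "din" and "hearing_date" keys and all values are invented for illustration.
hearings = [
    {"din": "00A0001", "hearing_date": "2012-03"},
    {"din": "00A0001", "hearing_date": "2014-03"},
    {"din": "00A0002", "hearing_date": "2013-06"},
    {"din": "00A0003", "hearing_date": "2013-07"},
    {"din": "00A0003", "hearing_date": "2014-01"},
]


def summarize(rows):
    """Count hearings, distinct people, and people with more than one hearing."""
    per_person = Counter(row["din"] for row in rows)
    return {
        "hearings": sum(per_person.values()),  # every row is one hearing
        "people": len(per_person),             # each DIN counted once
        "repeat_people": sum(1 for n in per_person.values() if n > 1),
    }


summary = summarize(hearings)
```

This keeps both numbers available: the hearing count is unchanged, and the people count and repeat count come from grouping on the DIN. It only works if the DIN is recorded consistently for the same person across hearings.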

rcackerman commented 10 years ago

Having the data in a relational database will address this problem. I'm going to close this issue for now.
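As a rough sketch of what that could look like, assuming one table of people keyed by DIN and one table of hearings that references it (the table names, column names, and sample rows below are hypothetical, not the project's actual schema):

```python
import sqlite3

# Hypothetical (din, hearing_date) pairs; values are invented for illustration.
sample_rows = [
    ("00A0001", "2012-03"),
    ("00A0001", "2014-03"),
    ("00A0002", "2013-06"),
    ("00A0003", "2013-07"),
    ("00A0003", "2014-01"),
]


def build_db(rows):
    """Load hearing rows into an in-memory database with one row per person."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (din TEXT PRIMARY KEY);
        CREATE TABLE hearings (
            id INTEGER PRIMARY KEY,
            din TEXT NOT NULL REFERENCES people(din),
            hearing_date TEXT NOT NULL
        );
        """
    )
    # INSERT OR IGNORE keeps a single people row per DIN, however many hearings exist.
    conn.executemany(
        "INSERT OR IGNORE INTO people (din) VALUES (?)",
        [(din,) for din, _ in rows],
    )
    conn.executemany(
        "INSERT INTO hearings (din, hearing_date) VALUES (?, ?)", rows
    )
    return conn


conn = build_db(sample_rows)
people_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
hearing_count = conn.execute("SELECT COUNT(*) FROM hearings").fetchone()[0]
repeat_count = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT din FROM hearings GROUP BY din HAVING COUNT(*) > 1)"
).fetchone()[0]
```

With people and hearings in separate tables, no hearing has to be thrown away to count people: the people table answers "how many individuals", and grouping the hearings table by DIN answers "how many appeared more than once".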

nikzei commented 10 years ago

Super - thanks!!

On Thu, Aug 14, 2014 at 8:42 PM, Rebecca Ackerman notifications@github.com wrote:

Having the data in a relational database will address this problem. I'm going to close this issue for now.
