smartchicago / chicago-atlas

View citywide information about health trends and take action near you to improve your own health.
http://www.chicagohealthatlas.org/

rename upper and lower CI columns in three CDPH datasets #52

Closed derekeder closed 10 years ago

derekeder commented 11 years ago

@JamyiaClark and @RoderickJones, could you rename the following confidence interval columns to match the value column name for consistency?

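In causes of death (https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Selected-underlying-cause/j6cj-r444):

In Tuberculosis (https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Tuberculosis-cases-and-av/ndk3-zftj):

In Infant mortality (https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Infant-mortality-in-Chica/bfhr-4ckq):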

JamyiaClark commented 11 years ago

I will work on this shortly.

JamyiaClark commented 11 years ago

Good Morning Derek, Can you please provide clarification about the issues with the column headings? Are you referring to the tables in the portal? I checked those and there is not a problem with them.

Thanks, Jamyia

derekeder commented 11 years ago

Hi Jamyia,

In order for me to programmatically import all the confidence intervals for a given dataset, the table columns need to match a certain pattern based on the name of the value column.

A good example of this would be the blood lead dataset. The value column is named "Percent Elevated 1999" and the confidence interval columns are named "Percent Elevated 1999 Lower CI" and "Percent Elevated 1999 Upper CI". This and all the other datasets follow the same pattern when naming the confidence interval columns: the value column name followed by "Lower CI" or "Upper CI".

Could you rename the columns in the tables I mentioned to follow this pattern? It would not only make the data easier to import, but also make it consistent for others to use. Hope this makes sense!
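
The naming pattern can be sketched in a few lines of Python. This is only an illustration, not the Atlas import code: the function names `ci_columns` and `missing_ci_columns` are hypothetical, and the sketch assumes the suffixes are exactly "Lower CI" and "Upper CI", as in the blood lead example.

```python
def ci_columns(value_column):
    """Return the (lower, upper) confidence interval column names
    expected for a value column under the suffix naming pattern."""
    return (f"{value_column} Lower CI", f"{value_column} Upper CI")


def missing_ci_columns(value_column, headings):
    """List the expected CI columns that are absent from a table's headings."""
    return [name for name in ci_columns(value_column) if name not in headings]


# Headings taken from the blood lead example in this comment.
headings = [
    "Percent Elevated 1999",
    "Percent Elevated 1999 Lower CI",
    "Percent Elevated 1999 Upper CI",
]
print(missing_ci_columns("Percent Elevated 1999", headings))  # prints []
```

A table whose interval columns use a different order or capitalization (for example "Rate lower CI 2004 - 2008") would show up as missing under this exact-match check, which is the inconsistency this issue is about.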

RoderickJones commented 11 years ago

Hi Derek, this issue now makes more sense. Jamyia and I discussed it. Your request is reasonable. In thinking about something like this that involves changing the format of portal tables, I need to also consider how it affects our internal processes (i.e., how we make the data), and our other constituents. We are now about 1 year into the process of making these tables available, and I am mulling over whether a "stacked" method of delivery is more appropriate than an "across" approach. What I've gathered is that the preferences of really data savvy users are different from those of our users who use the portal as a way of looking something up (cumbersome as it is).

What I would like to do as a next step is get a better understanding of how you bring data into your environment and what alterations you make to the formatting. I know this is code-heavy, but I would like to know the steps that occur to transform the tables. One of my thoughts (that would certainly be best for us in the short term) is that you might be able to rename things during that transformation process.

Related to this (and probably more important): We are not far away from updating some of the tables - mortality, infant mortality, comm area indicator summary, socioeconomic, and languages. As our process stands now, this in some cases results in a new set of column names (e.g., 2005-2009 replacing 2004-2008). How does this affect the ability of the Atlas to consume our data?

RoderickJones commented 11 years ago

Derek, We're close to being ready to update 4-5 of our datasets, but I'm concerned you are going to have problems with the changes. I need your input sooner rather than later on the post above.

To explain what's happening, I will use the infant mortality dataset as an example.

Current column headings are:
Community Area
Community Area Name
Deaths 2004
Deaths 2005
Deaths 2006
Deaths 2007
Deaths 2008
Cumulative deaths 2004 - 2008
Average annual deaths 2004 - 2008
Average infant mortality rate 2004 - 2008
Rate lower CI 2004 - 2008
Rate upper CI 2004 - 2008
WARNING

Column heading changes in updated table:
Removed: Deaths 2004
Added: Deaths 2009
Renamed:
Cumulative deaths 2004 - 2008 to Cumulative deaths 2005 - 2009
Average annual deaths 2004 - 2008 to Average annual deaths 2005 - 2009
Average infant mortality rate 2004 - 2008 to Average infant mortality rate 2005 - 2009
Rate lower CI 2004 - 2008 to Rate lower CI 2005 - 2009
Rate upper CI 2004 - 2008 to Rate upper CI 2005 - 2009

What are the implications of these changes for the Atlas? Do they paralyze your processes and force a lot of extra work?

Although it is more long term, and would apply to all or almost all of our datasets, I am contemplating changing the layout of every table to a "stacked" format rather than an "across" format. This would apply to infant mortality in this way:
Add columns called 1) Time Period and 2) Measure.
Populate Time Period with a year or span of years.
Populate Measure with these values as appropriate: Deaths, Cumulative deaths, Average infant mortality rate, Rate lower CI, Rate upper CI.

To reiterate, this last alternative is a much bigger challenge than I would be able to accomplish right away, but I need some input from you on it. If it would be useful to have a sample Excel file showing how this change from across to stacked is applied, let me know and I will get that to you somehow. Eric
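
The across-to-stacked change described in this comment can be sketched as follows. It is a minimal Python illustration, not something CDPH or the Atlas actually runs. The function name `stack_row`, the regular expression, the extra "Value" column, and the sample values are all assumptions; the sketch also assumes every measure heading ends in a year or a span of years.

```python
import re

# A heading is a measure name followed by a year or a span of years,
# e.g. "Deaths 2004" or "Rate lower CI 2004 - 2008".
HEADING = re.compile(r"^(?P<measure>.+?)\s+(?P<period>\d{4}(?:\s*-\s*\d{4})?)$")

ID_COLUMNS = ("Community Area", "Community Area Name")


def stack_row(row):
    """Turn one "across" row (dict of heading -> value) into "stacked" rows."""
    ids = {key: row[key] for key in ID_COLUMNS}
    stacked = []
    for heading, value in row.items():
        match = HEADING.match(heading)
        if heading in ID_COLUMNS or match is None:
            continue  # skip identifier columns and headings without a period
        stacked.append({**ids,
                        "Time Period": match["period"],
                        "Measure": match["measure"],
                        "Value": value})
    return stacked


# Made-up values, for illustration only.
across = {
    "Community Area": 1,
    "Community Area Name": "Example",
    "Deaths 2004": 5,
    "Rate lower CI 2004 - 2008": 3.1,
}
for record in stack_row(across):
    print(record)
```

Each across row becomes one stacked row per measure and time period, so adding a new year adds rows rather than renaming columns.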

derekeder commented 11 years ago

@RoderickJones,

It will not be difficult at all on my end to update the data with the new 2005-2009 column headings. I've defined the way I access these columns in a way that is simple to change, as I expected them to be updated over time. Here's the code I use to import the data:

https://github.com/smartchicago/chicago-atlas/blob/master/lib/tasks/import.rake#L83

As you can see, each dataset has its own set of attributes that can be updated with a simple search/replace. To that end, I say go ahead with your updates.
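
One way to keep such an update to a single change is to derive the headings from a time-period string. The sketch below is Python, not the Ruby in import.rake, and does not reproduce that file. The function `infant_mortality_columns` and its dictionary keys are hypothetical; the heading text follows the infant mortality example earlier in this thread.

```python
def infant_mortality_columns(period):
    """Build the period-dependent headings for a span such as "2005 - 2009"."""
    return {
        "value": f"Average infant mortality rate {period}",
        "lower_ci": f"Rate lower CI {period}",
        "upper_ci": f"Rate upper CI {period}",
    }


old = infant_mortality_columns("2004 - 2008")
new = infant_mortality_columns("2005 - 2009")
for key in old:
    print(f"{old[key]} -> {new[key]}")
```

With this shape, moving from 2004 - 2008 to 2005 - 2009 means changing one argument instead of editing each heading by hand.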

Regarding the stacked vs across format, I'm not sure making the changes you described would make the data that much more accessible. The root of the issue is the source data has a slightly complicated data model and the data portal requires that you 'flatten' it into a spreadsheet format. I've seen this handled in many ways, from the stacking method you are using to having 10 different datasets. My recommendation would be to keep it in its existing format for now.

RoderickJones commented 11 years ago

Thanks for this reply, Derek - very helpful to me. Eric

RoderickJones commented 11 years ago

@derekeder, we are scheduled to have clearance to publish the 5 updated tables on Tuesday 6/4 or Wednesday 6/5 next week. Just a heads up. We haven't had the time to redo all the naming conventions you suggested, but it's on our list to consider for upcoming rounds.

derekeder commented 10 years ago

Data was updated by CDPH in June and confidence intervals are displaying properly. Closing.