jsoma / data-studio-projects

Healthcare Violations #169

Open tsp2123 opened 6 years ago

tsp2123 commented 6 years ago

Pitch

Summary

I found a dataset of industry violations tracked by the watchdog group Good Jobs First. I'm looking at the healthcare industry to ask the following questions:

Which company has the most violations?

In which years did these violations occur?

Do some US States have more violations than others?

What was the primary reason for these violations?

Details

Possible headline(s):

Data set(s): https://violationtracker.goodjobsfirst.org/prog.php?major_industry_sum=healthcare+services

Code repository:

Possible problems/fears/questions:

Work so far

Here we can see that certain companies have far more violations than others:

[Image: screen shot 2018-07-17 at 1 34 50 am]

The top two seem to be Kaiser Permanente and American Medical Response. I'm still working through my data, but the following graph shows my attempt to count how many violations there were per year. It's not working out right now because the chart keeps sorting itself, for whatever reason.

[Image: screen shot 2018-07-17 at 1 35 01 am]

Checklist

This checklist must be completed before you submit your draft.

playfairbot commented 6 years ago

Hello! I'm a little robot, let's take a peek.

Please post your first revision! It should be posted by Thursday at midnight. More details available here.

You need some feedback, let me summon @SimoneLuc, @SiruiZhu, @jessimckenzi for you

It looks like we need to fix up your pitch a little bit! Edit it by clicking the pencil in the top right-hand corner. It requires:

jessimckenzi commented 6 years ago

Interesting data! I'm unclear what's being counted here, but 13 violations doesn't seem like that many more than 5, without context. As for the styles, the colors are really aggressive, and it appears as though you're trying to draw attention to the red ones, even though it doesn't look like those are special in any way. And what's the 3 in the second graph? These are (apparently) very small numbers you're working with—why are they notable?

SiruiZhu commented 6 years ago

I like this idea!! My thoughts would be:

tsp2123 commented 6 years ago

Update

Sorry about the late post. Here are some updates.

Your project content: images/words/etc

[Image: screen shot 2018-07-22 at 11 15 19 pm]

[Image: screen shot 2018-07-22 at 11 15 29 pm]

[Image: screen shot 2018-07-22 at 11 15 37 pm]

[Image: screen shot 2018-07-22 at 11 15 49 pm]

Any changes in direction or topic?

I realized my initial analysis wasn't very in-depth, and I'm still in an exploratory phase. My next step is to look deeper into the biggest violators by company.

Problems/Questions

I'm having a problem making a bar graph of violations by year. For whatever reason, the graph sorts itself, whereas I need the x-axis to be chronological.
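One possible cause, offered as a guess since the notebook code isn't shown in this thread: counting helpers usually return results ordered by frequency rather than by year. In pandas, `value_counts()` sorts largest-first by default, and chaining `.sort_index()` puts the years back in order. The same idea in plain Python, using made-up years rather than the real dataset:

```python
from collections import Counter

# Hypothetical sample: one entry per violation record, holding its year.
violation_years = [2016, 2014, 2016, 2015, 2014, 2016, 2013]

counts = Counter(violation_years)

# most_common() orders by count, largest first. This is the "sorts itself" effect.
by_frequency = counts.most_common()

# Sorting the (year, count) pairs orders them by year, giving a chronological x-axis.
by_year = sorted(counts.items())

print(by_frequency)
print(by_year)
```

Plotting `by_year` instead of `by_frequency` keeps the bars in chronological order regardless of how tall each one is.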

tsp2123 commented 6 years ago

Update

Your project content: images/words/etc

Here are some revisions to my old graphs.

[Image: healthcare_violations_years]

[Image: cumalative_violations]

Any changes in direction or topic?

I've changed some of the aesthetics of this graph. I'm still getting the hang of Illustrator and trying to fit things into an artboard template.

Problems/Questions

I've been trying to find ways to dig deeper into the companies, but I'm not confident in how the dataset was collected. For example, the info links don't lead to much more information than what's already present in the dataset, and there are no links to any PACER case files. So while you can tell the number of violations and the addresses at which they happened, you can't tell much else about them :(

The other question, which I've reached out about for an explanation and which is also related to the lack of data clarity: I'm not sure how Wage and Hour violations differ from Labour Violations, or whether that's even a worthwhile distinction for my audience. If I consider them together, it would change my dataset greatly, and I'm not sure whether that's worth doing.
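One way to see how much merging the two categories would change the picture is to count both ways and compare. A minimal sketch, assuming each record carries a single category label; the labels and records below are invented for illustration and are not copied from the dataset:

```python
from collections import Counter

# Hypothetical category labels, one per violation record.
offense_groups = [
    "wage and hour", "labor relations", "healthcare",
    "wage and hour", "labor relations", "safety",
]

# Assumed mapping: fold the two labor-related labels into one combined label.
MERGE = {
    "wage and hour": "labor (combined)",
    "labor relations": "labor (combined)",
}

# Counts with the categories kept separate.
separate = Counter(offense_groups)

# Counts after merging; labels not in MERGE pass through unchanged.
combined = Counter(MERGE.get(group, group) for group in offense_groups)

print(separate.most_common())
print(combined.most_common())
```

If the ranking of categories barely moves between the two printouts, the distinction probably doesn't matter much for the audience; if it changes which category comes out on top, it needs an explanation either way.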

sarahslo commented 6 years ago

if you have a chart that has a scale you don't also need numbers on it. and watch out for repeating something, 'violations' should be a label once, not repeated.

i find this dataset problematic. partly because it's from a watchdog group, and that makes me wonder what is their agenda. but partly because the numbers are so small.

as for the fact that certain companies have more violations than others: some are bigger than others and handle more clients.

these data are not strong enough to make charts out of. to be honest, i would scrap the data entirely. once you harden something into a chart it makes it appear to be fact, and in this case, it's misleading. your instinct to not be confident about the numbers, go with it.

ella24 commented 6 years ago

I was confused about the graph 'Healthcare violations by year'. The subtitle mentioned that they have been increasing every year; however, we could see some periods where they started to drop, especially after 2015. I'm glad you changed that subtitle, but I encourage you to use annotations to explain what was happening in those periods. Yes, that will take additional research, but it could be quite revealing, since your data is not that broad (specify the source in the graph). It is great that you decided to use just one color for the cumulative violations; the first version was quite confusing. However, I do not think that the bar graph is the best one to use here. Maybe pie? I know Soma does not like them, but for this topic, I think they would communicate more quickly and directly. To compare companies, it would be nice to have more information about their available resources and size. That way we can compare them more accurately.
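The size comparison suggested above could be sketched as a rate rather than a raw count. The example below is illustrative only: the company names, violation counts, and headcounts are all made up, and employee headcount is just one assumed stand-in for "size".

```python
# Hypothetical figures: neither the counts nor the headcounts come from the dataset.
companies = {
    "Company A": {"violations": 13, "employees": 200_000},
    "Company B": {"violations": 5, "employees": 20_000},
}


def rate_per_10k(violations, employees):
    """Return violations per 10,000 employees."""
    return violations / employees * 10_000


rates = {name: rate_per_10k(**row) for name, row in companies.items()}

# Rank by rate, highest first, instead of by raw violation count.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)

for name, rate in ranked:
    print(f"{name}: {rate:.2f} violations per 10,000 employees")
```

With these invented numbers, the company with fewer raw violations ends up with the higher rate, which is the kind of reversal a raw-count chart would hide.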

hakantan commented 6 years ago

I would love to see what a "False Claims Act" actually is.

On one of your final charts the term "violations" is mentioned 12 times. I would use it just in the headline and as an axis-label. Instead of rotating the years (x-axis), maybe it would make sense to have fewer years on there?

Also, since we're talking about 140 violations in 2016, my immediate questions are:

  1. Why is this number so low?
  2. What is "low"? (How does it compare with at least one different industry.)

Meaning: I would need the numbers to be explained.

tsp2123 commented 6 years ago

Final

Project visuals/text

[Image: healthcare_violations_line_final]

[Image: worst_healthcare_provider_2016_final]

[Image: top_5_2016_final]

[Image: fines_2016_final_c]

Here are my updated visuals! And here's the link to what I put in HTML. (For whatever reason these aren't uploading, but hopefully the hyperlink works and you can view them there.)

Details

Headline: Don't Put Grandma in a Nursing Home

Published website version:

https://tsp2123.github.io/projects.io/Project/html%20project/index.html

Code repository: https://github.com/tsp2123/data-studio/tree/master/Project_2

Final data set(s): ''''

What did you find to be the most difficult part of this project?

This dataset was difficult to produce a data story out of. I think it's one of those sets that is useful for exploratory analysis: for example, the fact that there is a shit ton of issues with nursing homes seems to be replicated in this dataset and reflects the more anecdotal news of nursing homes being rife with violations. But the dataset doesn't take into consideration other important factors about the businesses involved, such as the size of the business, its market share, etc. So, as Sarah pointed out, it isn't conducive to the best results and may lead to certain outcomes that aren't entirely reflective of the situation.

Are you satisfied with what you produced? Is there anything you would like to change or improve?

I'm not too keen on this project. Again, it's really just a test project for me to practice querying datasets; other than that, this isn't really for publication. It's simply a test.

christina10211 commented 6 years ago

Hey! Congratulations on completing your project! I took a look at your website and I would suggest adjusting the scale a little bit. I figured that you are using size A for three of your graphs, and I feel like this might not be the perfect size for your charts, since I can't read the labelling very clearly. It would also be better to standardize the font size to create a coherent look.

A minor suggestion on your first graph: maybe you can make your x-axis tick size smaller :)