jsoma / data-studio-projects


[Project] Human Fecal Bacteria in NYC Beach Water :') #200

Open angelareplica opened 6 years ago

angelareplica commented 6 years ago

Pitch

Summary

NYC's Department of Health & Mental Hygiene regularly updates its data set on beach water quality. In particular, it tests samples for enterococci, bacteria that indicate the presence of human waste/fecal material (and, as a result, of possible disease-causing bacteria). I'd like to know which nasty-ass NYC beaches to avoid this summer (and also for the rest of my life).

Details

Possible headline(s): "Which Shit-Encrusted NYC Beach Should You Swim at This Summer?" or "Staten Island's Beaches Are Literally Filled With Human Shit (Sorry)"

Data set(s): https://data.cityofnewyork.us/Health/DOHMH-Beach-Water-Quality-Data/2xir-kwzz

Code repository: https://github.com/angelareplica/data-studio/tree/master/code/03-nyc-shit-beaches

Possible problems/fears/questions: Finding the best way to visualize and convey the data.

Work so far

After acquainting myself with the data set, I looked up EPA recommendations for marine water recreation, as well as NY state and city health/sanitation code.

Made a chart of all the 2018 samples with fecal bacteria counts exceeding the limits in the New York State Sanitary Code and the NYC Health Code for marine water. (It looks terrible right now, but I'll clean it up and annotate it in Illustrator for my next revision.)

[Image: fecalhotspots]
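A minimal Python sketch of that filter, for anyone following along. It assumes each sample is a dict with hypothetical `beach`, `date` and `count` keys (the real DOHMH column names may differ) and uses 104 enterococci per 100 mL as the single-sample limit, the figure commonly cited for NY marine beaches; treat that number as an assumption to check against the current code text.

```python
from datetime import date

# Assumption: 104 enterococci per 100 mL as the single-sample limit for
# marine water; verify against the current NYS and NYC code text.
SINGLE_SAMPLE_MAX = 104


def exceedances(samples, year, limit=SINGLE_SAMPLE_MAX):
    """Return the samples from `year` whose count is above `limit`."""
    return [s for s in samples if s["date"].year == year and s["count"] > limit]


# Hypothetical records; the keys and beach names are placeholders.
samples = [
    {"beach": "Beach A", "date": date(2018, 7, 2), "count": 350},
    {"beach": "Beach A", "date": date(2018, 7, 9), "count": 20},
    {"beach": "Beach B", "date": date(2017, 8, 1), "count": 500},
]

print([s["beach"] for s in exceedances(samples, 2018)])  # ['Beach A']
```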

I also wanted to look at some of NYC's most popular beaches, like Coney Island and the Rockaways. I made a rudimentary bar chart showing the average counts for samples collected in 2018, and I'm trying to figure out a better way to visualize this. (Averages don't seem ideal, since samples can differ drastically, and a regular bar chart looked bad, since a number of samples had counts of 0, i.e., below the detection limit.) I'll also clean this one up in Illustrator for my next revision.

[Image: coneyrockawayavg]
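One option for the skewed counts: compare the mean with the median and the geometric mean, which is the kind of average that enterococci standards usually pair with a single-sample limit. The geometric mean needs positive values, so this sketch substitutes half of an assumed detection limit (10 per 100 mL, a placeholder) for the zeros; both the substitution rule and the limit are assumptions, not the data set's documented method.

```python
import statistics

# Assumptions: non-detects are stored as 0, and the detection limit is
# 10 per 100 mL. Both are placeholders to check against the data set notes.
DETECTION_LIMIT = 10


def summarize(counts, detection_limit=DETECTION_LIMIT):
    """Compare three kinds of 'average' for one beach's sample counts."""
    # The geometric mean is undefined for zeros, so replace non-detects
    # with half the detection limit (one common convention, not the only one).
    adjusted = [c if c > 0 else detection_limit / 2 for c in counts]
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "geometric_mean": statistics.geometric_mean(adjusted),
    }


# One huge sample drags the mean far above most of the readings.
counts = [0, 0, 10, 20, 2400]
print(summarize(counts))
```

Here the mean (486) is driven almost entirely by the single 2,400 reading, while the median (10) and geometric mean (about 26) sit much closer to the typical sample.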

Checklist

This checklist must be completed before you submit your draft.

pasiegrist commented 6 years ago

Hi Angela,

thanks for this service to the public :)

angelareplica commented 6 years ago

A lil update:

Update

Your project content: images/words/etc

[Image: screen shot 2018-07-27 at 10 54 50 am]

[Image: coneyrockaway]

Any changes in direction or topic?

Nope

Problems/Questions

Hmm... Will keep working on the charts above to clean them up & annotate. Not sure what else to do with this data or what to visualize, though! Will definitely consider the feedback above.

Checklist

troboukis commented 6 years ago

I don't understand what each colour means. What is MPN? I don't like the upper-case letters on the axes; I think you should change them to lower case. Also, the titles should be the same size on all your graphs! And is 100 enterococci per sample a problem? Is it too much? Some annotation might help.

vpenney commented 6 years ago

Great topic! Definitely one that I prefer to not think about, but hey, we need to know, right?

I like how you've grouped the counts together on your bar chart. It might be helpful to add dates there, so the reader can get a sense of how often those beaches are filled with shit. Is the data just for one year?

I'd also be curious how often samples are being taken at these beaches. We've all heard that we shouldn't go to the beach after it's rained a lot, so it would be interesting to see whether the city avoids taking samples after rainstorms. DarkSky has historical data in its API, right? It would be a bit of work, but it could add an interesting element to the project if you find yourself with a lot of free time.
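A rough sketch of the rain check, assuming you already have daily rainfall totals (from DarkSky or any historical weather source) as a plain date-to-inches mapping. The 0.5-inch cutoff and two-day window are arbitrary placeholders.

```python
from datetime import date, timedelta


def sampled_after_rain(sample_dates, rainfall, min_inches=0.5, window_days=2):
    """Count samples taken within `window_days` after a rainy day.

    `rainfall` maps a date to that day's total in inches (hypothetical
    input); days missing from it are treated as dry. `min_inches` and
    `window_days` are arbitrary placeholder thresholds.
    """
    hits = 0
    for d in sample_dates:
        prior_days = (d - timedelta(days=k) for k in range(1, window_days + 1))
        if any(rainfall.get(p, 0.0) >= min_inches for p in prior_days):
            hits += 1
    return hits, len(sample_dates)


# Made-up rainfall and sampling dates for illustration.
rainfall = {date(2018, 7, 1): 1.2, date(2018, 7, 5): 0.1}
sample_dates = [date(2018, 7, 2), date(2018, 7, 3), date(2018, 7, 6)]
print(sampled_after_rain(sample_dates, rainfall))  # (2, 3)
```

To get at whether the city avoids sampling after storms, you'd compare that share with the share of all summer days that follow rain; a much lower share for sample days would be worth a closer look, though it wouldn't prove anything by itself.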

Visually, I love your second chart and the colors you chose, but I think the colors can be a little confusing. Maybe it would help to put some space between the Rockaways and Coney Island to show that the colors represent two different locations? Also, here's the pandas documentation for changing words to title case; it's super quick.
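The pandas route is the `.str.title()` accessor on a Series or on `df.columns`; in plain Python the same idea is `str.title()`. A small sketch for turning hypothetical column names into axis labels, with a lower-case option since the earlier comment asked for that instead:

```python
def axis_label(column_name, case="title"):
    """Turn a column name like 'BEACH_NAME' into 'Beach Name' or 'beach name'."""
    words = column_name.replace("_", " ")
    return words.title() if case == "title" else words.lower()


# Hypothetical column names, for illustration only.
print(axis_label("BEACH_NAME"))                 # Beach Name
print(axis_label("SAMPLE_DATE", case="lower"))  # sample date
```

One quirk: `str.title()` also capitalizes after apostrophes, so "beach's" comes out as "Beach'S"; hand-edit those labels if any show up.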

I'm really interested in how many times the enterococci count is juuuuuust under what it would have to be to close a beach to the public. I feel like if there's a big weekend (Fourth of July) or a lot of pressure from the public (like if a triathlon is going on), officials might be tempted to fudge their numbers a little bit to avoid public outcry.
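One way to test that hunch, sketched under the same assumed 104-per-100-mL limit: count samples just under the limit and just over it. A pile-up right below the line compared with right above it would be worth asking DOHMH about, but it isn't evidence of fudging on its own, since MPN values are discrete rather than continuous.

```python
def bunching_check(counts, limit=104, margin=0.1):
    """Count samples just at/below `limit` and just above it.

    `limit` is an assumed single-sample maximum; `margin` is the band
    width as a fraction of the limit (0.1 means within 10%).
    """
    below = sum(1 for c in counts if limit * (1 - margin) <= c <= limit)
    above = sum(1 for c in counts if limit < c <= limit * (1 + margin))
    return below, above


# Made-up counts for illustration.
print(bunching_check([10, 94, 100, 104, 105, 110, 200]))  # (3, 2)
```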

angelareplica commented 6 years ago

Update 2

Your project content: images/words/etc

I made a heatmap showing average fecal bacteria count by month in 2018 to supplement the charts above (which I still need to make prettier). It looks terrible at the moment.
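The heatmap input is just a beach-by-month table of averages. In pandas that's roughly `df.pivot_table(index=..., columns=..., values=..., aggfunc="mean")`, which `seaborn.heatmap` accepts directly; here's the same grouping in plain Python, again with hypothetical `beach`/`date`/`count` keys:

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_matrix(samples):
    """Average count per beach and month, as {beach: {month: average}}.

    Filter to a single year first; months from different years are pooled here.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[(s["beach"], s["date"].month)].append(s["count"])
    matrix = defaultdict(dict)
    for (beach, month), counts in groups.items():
        matrix[beach][month] = mean(counts)
    return dict(matrix)


# Hypothetical records; keys and beach names are placeholders.
samples = [
    {"beach": "Beach A", "date": date(2018, 6, 1), "count": 10},
    {"beach": "Beach A", "date": date(2018, 6, 15), "count": 30},
    {"beach": "Beach A", "date": date(2018, 7, 1), "count": 100},
    {"beach": "Beach B", "date": date(2018, 6, 2), "count": 0},
]
print(monthly_matrix(samples))
```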

[Image: heatmap]

Including July (with the updated dataset I mention below): [Image: heatmap2]

Any changes in direction or topic?

Nope.

Problems/Questions

The Department of Health & Mental Hygiene just updated its data set to include July data, so I've decided to redo everything, including all of the charts above. I am sad (mostly about the extra Illustrator work). I hope it's worth it!

Checklist

julialedur commented 6 years ago

Hey! I loved your idea, and it's definitely something New Yorkers would be super interested in knowing more about. The heatmap looks really good and is really easy to read. Maybe you can add some annotations for the really dark values. Did something unusual happen at that beach in that month to make the numbers go up that much?

I love the color scheme you used on your heatmap; I think it really fits the topic. Maybe you can apply the same colors to your other graphs.

In your second graph, the one about the samples, it's not so clear what each dot means. Are they samples? You might want to add a little legend explaining it. Also, it could be cool to make "The Rockaways" and "Coney Island" have the same text color as the dots representing them. That would make the graph even clearer.

Great job! I'm looking forward to seeing the next version! 😄

sarahslo commented 6 years ago

um wow. never seen this dataset.

so i'd like a map here even if it's just to show me the beaches. and i'd like to see the heat map divided up by place/beach the way you did at the top. then organized most to least. so coney island, rockaway best to worst.

or you could do some small versions of the heat map and sort it different ways, and give us a line of text over each so we see what it is. sort it by the worst june, july and august.

organize it by beaches, best to worst. lots of ways here to play with this in small multiples or to show us different things.
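A sketch of one possible ordering for those small multiples: rank each beach by its worst (highest) monthly average in June through August, using a beach-to-month table like the heatmap's. The ranking rule is an assumption; swapping `max` for a mean or geometric mean gives a different, equally defensible order.

```python
def rank_beaches(matrix, months=(6, 7, 8), worst_first=True):
    """Order beaches by their highest monthly average within `months`.

    `matrix` is {beach: {month: average}}; beaches with no data in
    `months` are left out rather than ranked.
    """
    scores = {}
    for beach, by_month in matrix.items():
        values = [by_month[m] for m in months if m in by_month]
        if values:
            scores[beach] = max(values)
    return sorted(scores, key=scores.get, reverse=worst_first)


# Hypothetical monthly averages; beach names are placeholders.
matrix = {
    "Beach A": {6: 20, 7: 100},
    "Beach B": {6: 0, 8: 50},
    "Beach C": {5: 999},
}
print(rank_beaches(matrix))  # ['Beach A', 'Beach B']
```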

and the numbers on the bottom, should be months yes?
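If the numbers along the bottom are month numbers, Python's `calendar.month_abbr` turns them into readable tick labels (the names follow the current locale, English by default):

```python
import calendar


def month_labels(month_numbers):
    """Map month numbers (1-12) to abbreviated names for tick labels."""
    # calendar.month_abbr follows the current locale (English under the default C locale).
    return [calendar.month_abbr[m] for m in month_numbers]


print(month_labels([5, 6, 7, 8, 9]))  # ['May', 'Jun', 'Jul', 'Aug', 'Sep']
```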

angelareplica commented 6 years ago

Final

Project visuals/text

[Image: fecal_hot_spots_edited_udpated2]

[Image: coney_rockaways_2_edited]

[Image: beach_yearly_heatmap_edited]

Douglaston Homeowners Association: [Image: douglaston_edited]

Details

Headline: These NYC Beaches Are Literally Full of Human Shit

Alternate headline: When Fecal Bacteria Colonize New York City's Beach Water

Published website version: https://angelareplica.github.io/ds-nyc-beaches/

Code repository: https://github.com/angelareplica/data-studio/tree/master/code/03-nyc-shit-beaches

Final data set(s): https://github.com/angelareplica/data-studio/tree/master/code/03-nyc-shit-beaches

What did you find to be the most difficult part of this project?

Finding things to focus on, and interpreting the data.

Are you satisfied with what you produced? Is there anything you would like to change or improve?

I'm satisfied, but my heatmap needs a lot of work. I'm still trying to wrap my mind around Seaborn, but I might take another stab at it later on. I pitched this somewhere, but before I cleaned up my graphs and did more analysis... so we'll see how that goes.

Checklist