jsoma / data-studio-projects


[Project] Sorry, don't speak English. #216

Open Weihua4455 opened 6 years ago

Weihua4455 commented 6 years ago

Pitch

Okay, that's a really bad headline. But I like it.

For my reporting class, I'm doing a story on court interpreters: court-appointed translators who help people who don't speak English navigate the court system.

And there are A LOT of people in New York City who don't speak English. Or at least not well enough to understand what's happening in court.

Using census data, I want to create a map that shows which neighborhood in New York City has the most non-English speakers.

Summary

There are lots of good maps that show what language New Yorkers speak.

Like this one, or this one, which uses D3 and looks truly amazing.

But my focus is different. I'm less interested in what language people speak than in what percentage of the population doesn't speak English.

Luckily, the American Community Survey tracks this information. There are three parts to language proficiency:

a. Does this person speak a language other than English at home?
b. What is the language?
c. How well does this person speak English?

These questions were first introduced to the Decennial Census in 1890, then transferred to the ACS in 2005, when the ACS replaced the Decennial Census long form.

And here is how the Census Bureau defines a "limited English speaking household":

A "limited English speaking household" is one in which no member 14 years old and over (1) speaks only English or (2) speaks a non-English language and speaks English "very well."
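To make the definition concrete, here is a minimal Python sketch of the rule applied to a single household. The `Person` fields and the ability labels are assumptions for illustration, not ACS variable names; the Census Bureau already publishes tabulated counts of limited English speaking households, so in practice you would not compute this yourself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    age: int
    speaks_only_english: bool
    # Hypothetical label for self-reported English ability, e.g. "very well",
    # "well", "not well", "not at all"; None for English-only speakers.
    english_ability: Optional[str] = None


def is_limited_english_household(members: list[Person]) -> bool:
    """True if no member aged 14+ speaks only English or speaks English "very well".

    Members under 14 are ignored. A household with nobody aged 14+ is counted
    as limited here, because that is what the literal wording implies.
    """
    return not any(
        p.age >= 14 and (p.speaks_only_english or p.english_ability == "very well")
        for p in members
    )


# A parent who speaks English "well" plus a 12-year-old English-only speaker:
# the child is under 14, so the household counts as limited English speaking.
print(is_limited_english_household([Person(40, False, "well"), Person(12, True)]))  # True
```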

Details

Possible headline(s):

73% of households in this New York City neighborhood don't speak English

Limited English Speaking Households in NYC

Data set(s): https://factfinder.census.gov/

https://www.census.gov/geo/maps-data/data/cbf/cbf_tracts.html

Code repository:

https://github.com/Weihua4455/data_studio/tree/master/code/03_nyc_language

Possible problems/fears/questions:

Oh where do I start?

1) I want to somehow group these data by counties/boroughs on the map, but I'm not sure how to organize it.

2) The census data also includes the percentage of non-English speaking families that speak a certain language, e.g., 30% speak Chinese, 40% speak Spanish, etc. I want to include that information on the map as well, but I don't want to make it a scavenger hunt for readers. Perhaps I can create a dropdown in Leaflet? Choose Spanish and you'll see the percentage for Spanish-speaking families only.

3) Is a choropleth map the best form for this project? I'd love to hear some feedback. Don't be shy.
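For question 1, one option is to work from the tract GEOID itself: an 11-digit tract GEOID is the 2-digit state FIPS code, then the 3-digit county FIPS code, then the 6-digit tract code, and each borough is its own county. This sketch assumes rows with hypothetical keys `GEOID`, `limited_households` and `total_households`. It sums the counts per borough rather than averaging tract percentages, so larger tracts carry more weight.

```python
from collections import defaultdict

# County FIPS codes for the five boroughs (New York State FIPS is "36").
BOROUGHS = {
    "005": "Bronx",
    "047": "Brooklyn",       # Kings County
    "061": "Manhattan",      # New York County
    "081": "Queens",
    "085": "Staten Island",  # Richmond County
}


def borough_of(geoid):
    """Return the borough for an 11-digit tract GEOID, or None outside NYC."""
    state, county = geoid[:2], geoid[2:5]
    if state != "36":
        return None
    return BOROUGHS.get(county)


def borough_shares(rows):
    """Percentage of limited English speaking households per borough."""
    limited = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        borough = borough_of(row["GEOID"])
        if borough is None:
            continue  # drop tracts outside the five boroughs
        limited[borough] += row["limited_households"]
        total[borough] += row["total_households"]
    return {b: 100 * limited[b] / total[b] for b in total if total[b] > 0}
```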

Work so far

I got census data on limited English speaking households, joined the CSV with shapefiles provided by the Census Bureau, filtered for the five boroughs of New York City, and created a choropleth map based on the percentage of households that don't speak English.
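The join step can be sketched in plain Python, assuming the ACS CSV has hypothetical columns `GEOID`, `total_households` and `limited_households`, and that the shapefile's attribute table has already been read into a list of dicts with a matching `GEOID` key. Returning the unmatched GEOIDs is a cheap way to catch a join that silently drops tracts.

```python
import csv
import io


def join_by_geoid(csv_text, shape_records):
    """Attach a limited-English percentage to each shape record.

    Returns (joined_records, unmatched_geoids).
    """
    # csv.DictReader yields strings, so GEOIDs keep any leading zeros
    # (a common pitfall when a reader converts them to numbers).
    by_geoid = {row["GEOID"]: row for row in csv.DictReader(io.StringIO(csv_text))}
    joined, unmatched = [], []
    for record in shape_records:
        row = by_geoid.get(record["GEOID"])
        if row is None:
            unmatched.append(record["GEOID"])
            continue
        total = int(row["total_households"])
        limited = int(row["limited_households"])
        # Tracts with no households get None instead of a division by zero.
        pct = 100 * limited / total if total else None
        joined.append({**record, "pct_limited": pct})
    return joined, unmatched
```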

Simple, right? Except it looks like trash.

[Screenshot: first draft of the choropleth map]

Oh, I didn't even add a legend. Basically, yellow marks neighborhoods where less than 10 percent of households don't speak English, and dark blue marks those above 70 percent. You get the gist.
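A legend needs fixed class breaks. Only the under-10% and over-70% classes come from the draft map; the middle breaks in this sketch are placeholder assumptions.

```python
import bisect

# Class breaks in percent. 10 and 70 come from the draft map; 30 and 50 are
# placeholders. A value equal to a break falls into the class above it.
BREAKS = [10, 30, 50, 70]
LABELS = ["under 10%", "10-30%", "30-50%", "50-70%", "70% and over"]


def legend_class(pct):
    """Return the legend label for a tract's limited-English percentage."""
    return LABELS[bisect.bisect_right(BREAKS, pct)]
```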

Checklist

This checklist must be completed before you submit your draft.

jsoma commented 6 years ago

A common way of approaching this that's pretty fun is "what's the most popular language that is not English in this area?" or "what's the most common language that's not English or Spanish?" (that's what Jill did in one of the examples you linked). I honestly think it would be nice to recreate that one, but make it more of a story instead of just "i dumped a bunch of colored blocks on a map." When I look at it, and it has buttons, a huge legend, and blah blah blah, I just get overwhelmed! I want to be told what's interesting - there are Koreans here, there are Africans here, take me on a tour through a series of maps and graphs and text!

I think the best route is to...

  1. Have a dataframe with census areas as your rows, then the number of people speaking each language as your columns (although they'll probably be encoded instead of having nice names). This should be the format from the census, I think!
  2. Follow the instructions on this SO answer to find which column is the biggest for each row. You'll want to drop English (and maybe Spanish, if you care).
  3. Merge with the column definitions if they're still weird codes
  4. Make your maps!
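Steps 2 and 3 might look like this for a single tract, sketched in plain Python. In pandas the same lookup is usually `DataFrame.idxmax(axis=1)` on the language columns after dropping English (and Spanish). The language codes and the code-to-name lookup below are made up for illustration.

```python
def top_language(counts, exclude=("English",), names=None):
    """Return the most-spoken language in one tract, skipping `exclude`.

    counts: {language_code: number_of_speakers} for one tract (step 1).
    names:  optional {language_code: readable_name} lookup (step 3).
    Returns None when nothing is left or every remaining count is zero.
    Ties go to whichever language appears first in `counts`.
    """
    names = names or {}
    candidates = {}
    for code, speakers in counts.items():
        name = names.get(code, code)  # fall back to the raw code
        if name not in exclude:
            candidates[name] = candidates.get(name, 0) + speakers
    if not candidates or max(candidates.values()) == 0:
        return None
    return max(candidates, key=candidates.get)


# Made-up counts and codes for one tract.
names = {"C01": "English", "C02": "Spanish", "C03": "Korean", "C04": "Chinese"}
tract = {"C01": 900, "C02": 400, "C03": 120, "C04": 300}
print(top_language(tract, names=names))                                  # Spanish
print(top_language(tract, exclude=("English", "Spanish"), names=names))  # Chinese
```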

If you wanted to do this with actual neighborhoods, you'd need to do a spatial join in QGIS between the census tracts and the neighborhoods (since the census doesn't know neighborhoods) to sum up the people inside before you start from Step 1.
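Once the spatial join has produced a tract-to-neighborhood lookup (from QGIS, or GeoPandas' `sjoin` if you'd rather stay in Python), summing the counts is straightforward. This sketch assumes each tract was assigned to exactly one neighborhood; a tract that straddles a boundary is counted whole in one of them, so neighborhood totals are approximate. The output has the same language-by-area shape as step 1.

```python
from collections import Counter, defaultdict


def sum_by_neighborhood(tract_counts, tract_to_hood):
    """Add up per-language speaker counts from tracts into neighborhoods.

    tract_counts:  {geoid: {language: speakers}}
    tract_to_hood: {geoid: neighborhood}, e.g. exported from a spatial join.
    Tracts missing from the lookup are skipped.
    """
    totals = defaultdict(Counter)
    for geoid, counts in tract_counts.items():
        hood = tract_to_hood.get(geoid)
        if hood is None:
            continue
        totals[hood].update(counts)  # Counter.update adds counts, it doesn't replace
    return {hood: dict(counter) for hood, counter in totals.items()}
```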

jsoma commented 6 years ago

Perhaps I can create a dropdown in leaflet? Choose Spanish and you'll see the percentage for Spanish-speaking families only.

Don't make your user do the work, that's your job! Pull out a story, tell them the story, have a graphic or two. Find another interesting place, tell them a story of that place, have a graphic or two.

adrianblanco commented 6 years ago

Good idea! I will follow Soma's advice. Choose the graphics (in this case maybe zoom in the map) that will explain your story better. In terms of the map, I think it is very important that you choose an appealing and smooth color scale. Also, depending on the frame of your story, maybe you should consider removing Spanish as well. If not, I guess there are going to be lots of neighbourhoods where Spanish is the first language.

angelareplica commented 6 years ago

Fun topic! Good feedback above. It could be interesting if you were able to show nationalities along with languages. Even if Spanish dominates many neighborhoods, they might be characterized by immigrants from different countries.

(Maybe it's the scale, but what happened to the East River in the choropleth above!? Haha.)

If you want to move away from a map, I think it'd be really cool to find a way to visualize ALL the languages spoken in NYC. There's a small community of Gottschee Germans in my neighborhood, for instance -- they speak a dialect that's only spoken by a few thousand people around the world.

Excited to see how this turns out!

Weihua4455 commented 6 years ago

Update

Your project content: images/words/etc

Many apologies for the late update, and thank you guys so much for all the great feedback.

I went back and analyzed the data, here's what I got so far.

Main spoken language -- including English and Spanish

[Map]

Main spoken language -- excluding English, including Spanish

[Map]

Main spoken language -- excluding English and Spanish

[Map]

Any changes in direction or topic?

Not necessarily.

Problems/Questions

This is more of a checklist for the next 4 days:

  1. Pick a color scheme that doesn't hurt the eyes

  2. Decide if I want to graph by census tract or neighborhood -- I think the first option will give me more details, while the latter shows a larger trend, plus it'll look cleaner.

  3. Find a way to tie the type of language to the number of people who don't speak English, e.g. people in census tract xxx mainly speak Spanish, and x% of them don't speak English, because my article is about court interpreters. I'm not really sure how to show that on the map without making it look super messy. Perhaps I can add another chart plotting the 10 census tracts with the highest Limited English Proficiency population?

  4. Identify stories / areas that I want to zoom in.
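For item 3, a ranked table is one way to pair the main language with the size of the limited-English population. This sketch assumes tract records with hypothetical keys `name`, `lep_people`, `total_people` and `main_language` (the last one is what jsoma's step 2 produces). Ranking by head count rather than by share keeps tiny tracts from topping the list; the share is still reported next to it.

```python
import heapq


def top_lep_tracts(tracts, n=10):
    """Return the n tracts with the most limited-English-proficiency residents.

    Each result is (name, lep_people, lep_share_pct, main_language).
    """
    populated = [t for t in tracts if t["total_people"] > 0]  # skip empty tracts
    top = heapq.nlargest(n, populated, key=lambda t: t["lep_people"])
    return [
        (
            t["name"],
            t["lep_people"],
            round(100 * t["lep_people"] / t["total_people"], 1),
            t["main_language"],
        )
        for t in top
    ]
```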

Checklist

jessimckenzi commented 6 years ago

Well I think the improvements are spot on, you did a great job following Soma's suggestions. Something that's lost for me, and I don't know if you mind or not, is that I really was interested in seeing where the majority of residents (sometimes more than 70 percent!!) don't speak English. I think that's really interesting, especially because I feel like I haven't seen that info before. Maybe you can work a version of that into the final story?

I'm sure this is already on the docket but the colors for languages change from map to map so when you decide on a theme make sure they're consistent. Also, maybe it would look nice if you got rid of the black strokes dividing neighborhoods/census tracts, because those can be rather arbitrary, and let the color blocks speak for themselves?
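On keeping colors consistent: one simple approach is a single language-to-color dictionary that every map reads from, keyed by language name rather than by rank. The hex values below are placeholders, not a palette recommendation.

```python
# One palette shared by every map, so a language keeps its color from map to map.
# The hex values are placeholders.
LANGUAGE_COLORS = {
    "Spanish": "#e41a1c",
    "Chinese": "#377eb8",
    "Russian": "#4daf4a",
    "Korean": "#984ea3",
}
OTHER_COLOR = "#bbbbbb"  # neutral gray for any language not listed


def color_for(language):
    """Look up a language's color; anything unlisted falls back to gray."""
    return LANGUAGE_COLORS.get(language, OTHER_COLOR)
```

Because colors are keyed by name, dropping English or Spanish from one map does not shift the colors of the remaining languages.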

Interesting story ideas could be choosing a language and explaining why it crops up in different places? Like the original Chinatown in Manhattan has been shrinking, but what was it that drew Chinese-speakers to South Brooklyn and to Queens? Is that something you could track over time if you pulled in census data from 10, 20, or 50 years ago? Do you have data on the languages that court interpreters can speak? Could compare representation in the courts system to representation on the map....? Just throwing spaghetti at the wall here.

sarahslo commented 6 years ago

i think this is a case for small multiples. and to think through the lens of 'where' a little. (also, be wary of areas with no people, no one is speaking any language in central park!)

if you start with your map that shows the limited English speaking households, and then you tell us in each borough (where) are the highest concentrations of limited ESH's, you can then do small maps that show us, in those neighborhoods - the top 3 places where people don't speak english - what DO they speak.

i see the russian speakers in brooklyn...in queens i think it's chinese...the bronx it's spanish... instead of putting all the languages on the map, simplify it and make cute tiny maps to go with the big one!
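Picking the top 3 places in each borough could be sketched like this, assuming tract records with hypothetical keys `name`, `borough`, `limited_households` and `total_households`. Tracts with no households, like Central Park, are skipped so they cannot produce a 0/0 share; raising `min_households` would also keep very small tracts from dominating.

```python
from collections import defaultdict


def top_places_per_borough(tracts, k=3, min_households=1):
    """Pick the k tracts with the highest limited-English share in each borough."""
    by_borough = defaultdict(list)
    for t in tracts:
        if t["total_households"] < min_households:
            continue  # empty areas such as parks have no share to speak of
        share = t["limited_households"] / t["total_households"]
        by_borough[t["borough"]].append((share, t["name"]))
    return {
        borough: [name for _share, name in sorted(rows, reverse=True)[:k]]
        for borough, rows in by_borough.items()
    }
```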