jsoma / data-studio-projects


[Project] All the M.S. Master's Projects Ever #250

Open Weihua4455 opened 6 years ago

Weihua4455 commented 6 years ago

Pitch

Summary

I need to think about my Master's Project -- think hard -- because the first project memo is due Oct. 16th. It seems like a long time, but if I learned anything from spending a summer with the Lede, it's that time flies.

Now the good thing is that the JSchool publishes an index of each year's Master's Projects -- from 1995 (when the M.S. program started) to 2017. The index includes the author's name, headline, format (print, radio, etc.), and the advisor's name.

So I scraped all the indexes. Hopefully, I'll find some inspiration and something interesting.

Details

Possible headline(s): "Here is what Columbia Journalism School students wrote about"

"Ah, words."

Data set(s):

https://library.columbia.edu/locations/journalism/masters.html

Code repository:

https://github.com/Weihua4455/data_studio/tree/master/code/04_jschool_masters_projects

Possible problems/fears/questions:

1) I want to make something like this: choose the 10 most frequent headline words of all time, then use an area chart or whatever makes sense to plot when they appear. The only thing is that I don't know how to do that, or whether there's a more elegant way.

2) Two columns that I'm not using in the following charts are the author's name and the advisor's name -- both are important. For the advisor column, I can clean it up and see which JSchool prof is the most popular one when it comes to master's projects. But any suggestions on how I should use the author column? It is essentially a roster of every single person who graduated from the M.S. program. Maybe cross that with other databases?
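For question 1, here is a minimal sketch of one way to get from headlines to something plottable: count every headline word, keep the top N overall, then count how often each of those appears per year. The `rows` sample, the `STOPWORDS` set, and the function names are all made up for illustration; the real input would be the scraped indexes.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample rows of (year, headline); not the real scraped data.
rows = [
    (1995, "Women in the Bronx"),
    (1995, "Bronx Schools and the Budget"),
    (2005, "Women, War and Work"),
    (2005, "The War at Home"),
]

# Assumed stopword list; extend it as needed.
STOPWORDS = {"the", "in", "and", "at", "a", "of", "to", "for", "on"}


def tokenize(headline):
    """Lowercase a headline and return its words, minus stopwords."""
    words = re.findall(r"[a-z']+", headline.lower())
    return [w for w in words if w not in STOPWORDS]


def top_words_by_year(rows, n=10):
    """Return (sorted years, {word: [count per year]}) for the n most frequent words."""
    overall = Counter()
    per_year = defaultdict(Counter)
    for year, headline in rows:
        words = tokenize(headline)
        overall.update(words)
        per_year[year].update(words)
    top = [word for word, _ in overall.most_common(n)]
    years = sorted(per_year)
    table = {word: [per_year[y][word] for y in years] for word in top}
    return years, table


years, table = top_words_by_year(rows, n=3)
```

Each list in `table` lines up with `years`, so it could be passed as one layer to a stacked area or stacked bar chart, for example with matplotlib's `stackplot`.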

Work so far

image

The longest headline in JSchool's history:

The Perfect Muslim: Beneath The Hijab. Blending Religion And Style. A Look At How Many Muslim Women Feel About Wearing The Hijab, And Gaining More Representation In Mainstream Fashion. No Doubt A Woman's Hijab Is An External Sign Of Her Religion, But The Strength Of Her Faith Comes From Her Inner-Self

image

Most frequently-seen words in headlines:

1995 - 2005 image

2005 - 2015 image

2015 - 2017 image

image


sarahslo commented 6 years ago

i really want you to remove 'new york city' from the word rankings. and then do something with color to show us the repetition - connect all the usage of women, highlight when black is used, when gay arrives.
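A rough sketch of both ideas, assuming the data is a list of (year, headline) pairs: strip the phrase before splitting into words, then record which years each word you care about shows up in. The sample `rows`, `DROP_PHRASES`, `TRACKED`, and the function names are placeholders, not anything from the actual repo.

```python
import re
from collections import defaultdict

# Hypothetical sample rows of (year, headline); not the real scraped data.
rows = [
    (1996, "New York City Women at Work"),
    (2001, "Black Churches in New York City"),
    (2001, "Women of Harlem"),
    (2010, "Gay Marriage Comes to New York City"),
]

DROP_PHRASES = ["new york city"]    # phrases to remove before counting
TRACKED = ["women", "black", "gay"]  # words to follow over time


def clean(headline):
    """Lowercase, remove the dropped phrases, and split into words."""
    text = headline.lower()
    for phrase in DROP_PHRASES:
        text = text.replace(phrase, " ")
    return re.findall(r"[a-z']+", text)


def years_used(rows, tracked):
    """Map each tracked word to the sorted list of years it appears in."""
    seen = defaultdict(set)
    for year, headline in rows:
        for word in clean(headline):
            if word in tracked:
                seen[word].add(year)
    return {word: sorted(seen[word]) for word in tracked}


usage = years_used(rows, TRACKED)
# First year each tracked word appears, e.g. to mark "when gay arrives".
first_seen = {word: ys[0] for word, ys in usage.items() if ys}
```

The `usage` lists could drive the color connections (one color per tracked word across years), and `first_seen` gives the point to highlight for a word's arrival.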

don't think the names of all the students do anything for anyone, nope. nothing there. nice work.

vpenney commented 6 years ago

I feel you--I also need to be pinning down a thesis topic. Also, that really long title is terrible. Find out which thesis advisor approved that.

I agree that the students' names aren't really relevant. Gender of the author could be interesting to look at, but you'd have to make a lot of assumptions working just off of names, since many names can be gender-neutral.

The area chart may actually be clearer as a stacked bar chart per year, or even a scatter plot, where you assign a differently-colored circle to each year.

I'm also interested in how "trendy" the M.S. journalism students are. Do thesis topics follow the same themes as current events, like the Vietnam War, the Iraq War, the civil rights movement, etc.? Also, I wonder if some of the words would make more sense if you could capture the rest of the phrase. What does "east" refer to, for example? Maybe try grabbing one word before/after adjectives using regex?
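Something like this might work for the regex idea. It is only a sketch: the sample `headlines` are invented, `phrases_around` is a made-up helper name, and the pattern simply grabs at most one word on each side of the target.

```python
import re
from collections import Counter

# Hypothetical sample headlines; not the real scraped data.
headlines = [
    "Life on the Lower East Side",
    "Middle East Peace Talks",
    "East Harlem's New Schools",
    "Far East Movement",
]


def phrases_around(headlines, target):
    """Count lowercase phrases made of the target plus one word before and after."""
    word = r"[\w']+"
    pattern = re.compile(
        rf"(?:({word})\s+)?\b{re.escape(target)}\b(?:\s+({word}))?",
        re.IGNORECASE,
    )
    counts = Counter()
    for headline in headlines:
        for match in pattern.finditer(headline):
            before, after = match.groups()
            # Either neighbor is None when the target starts or ends the headline.
            parts = [before, target, after]
            counts[" ".join(p.lower() for p in parts if p)] += 1
    return counts


east_phrases = phrases_around(headlines, "east")
```

Counting the resulting phrases should show whether "east" mostly means the Middle East, the Lower East Side, or something else.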

I'm excited to see what you find!

pasiegrist commented 6 years ago

Hey Weihua

I think it is a nice idea to look at the master's projects, but I agree: you should pin down a thesis and definitely lose New York City in the ranking, as it is really distorting everything. As your information about each project is reduced to the name and the title, I would go with something like gender, even though you'd have to make a lot of assumptions.