jsoma / data-studio-projects

Friday news dump -- Is it still a thing? #131

Open Weihua4455 opened 6 years ago

Weihua4455 commented 6 years ago

Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1

Pitch

Summary

I want to look into the so-called "Friday news dump" phenomenon, where press secretaries release important news on Friday afternoon, hoping that each item will get less coverage.

My data comes from the Trump White House and Obama White House websites. I will scrape all the statements and press releases, then see if they do indeed publish more on Fridays.

I also want to do some text analysis of either the title or the content of those releases. Perhaps some words show up more often on Friday than on every other day.

Details

Possible headline(s):

Data set(s): https://www.whitehouse.gov/briefings-statements/, https://obamawhitehouse.archives.gov/briefing-room/statements-and-releases

Code repository: https://github.com/Weihua4455/data_studio/tree/master/code/01-friday-news-dump

Possible problems/fears/questions: I am not sure what's the best way to show my data -- should I group by year/month/week/term? Should I just choose the most significant ones (i.e. max, min, etc) and do an analysis of them?

Work so far

I got my data and played around with it. Without knowing how datetime really works (now I do, yay), I plotted data for both administrations as bar graphs. X is day of week, Y is percentage of press releases that were published on that day.
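For readers who want to reproduce the day-of-week percentages, here is a minimal sketch using only the standard library. The function name `weekday_share` and the sample dates are made up for illustration; the project itself appears to use pandas, so this is not the author's code.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_share(release_dates):
    """Return {weekday name: percent of releases published on that day}."""
    # date.weekday() returns 0 for Monday through 6 for Sunday
    counts = Counter(d.weekday() for d in release_dates)
    total = sum(counts.values())
    return {name: 100 * counts.get(i, 0) / total
            for i, name in enumerate(WEEKDAYS)}

# Hypothetical release dates, not real White House data
sample_dates = [date(2018, 7, 2), date(2018, 7, 3),
                date(2018, 7, 6), date(2018, 7, 6)]
shares = weekday_share(sample_dates)
```

The resulting dictionary keeps the weekdays in calendar order, which is convenient for the x-axis of a bar graph.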

Here are my graphs: obama

trump

Checklist

This checklist must be completed before you submit your draft.

kidaemon commented 6 years ago

The charts are visually very good and also persuasive. But Trump changed the communication strategy by using Twitter, so I'd like to see his Twitter activity as well.

Weihua4455 commented 6 years ago

Update

Your project content: images/words/etc

After Wednesday's class, I scraped the content of each press release for Obama's eight years in office as well as Trump's 18 months. Then I tried to conduct a text analysis to see if some words appeared more in Friday's releases than on other days.
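One possible way to compare word use on Fridays with other days is to rank words by the difference in relative frequency. The sketch below is only an assumption about how such a comparison could be done; `word_rates`, `friday_skew`, and the two sample titles are hypothetical and do not come from the project repository.

```python
import re
from collections import Counter

def word_rates(texts):
    """Relative frequency of each lowercase word across a list of texts."""
    words = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
    total = len(words)
    if total == 0:
        return {}
    return {w: c / total for w, c in Counter(words).items()}

def friday_skew(friday_texts, other_texts):
    """Rank Friday words by how much more often they appear than on other days."""
    fri = word_rates(friday_texts)
    oth = word_rates(other_texts)
    diffs = {w: rate - oth.get(w, 0.0) for w, rate in fri.items()}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)

# Made-up titles, for illustration only
ranked = friday_skew(
    ["Statement on the resignation of the secretary"],
    ["Statement on the infrastructure plan"],
)
```

Words at the top of `ranked` are relatively more common on Fridays. With a real corpus, common stop words would probably need filtering first.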

The exact steps are:

1

It was a good experiment, but the results are not that significant.

2

Before jumping into that rabbit hole, I decided to change strategies and focus on time analysis: how do Friday news dumps change over time in each White House?

First I used .resample() and .unstack() to plot both administrations. This is what I got:

9

Ok it was a mess. Then I plotted only the Fridays.

3
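The monthly breakdown by weekday, and the Fridays-only view, could be computed roughly as below. This is a sketch under assumptions: it uses `pd.Grouper` inside `groupby` as a stand-in for `.resample` (both bin dates by frequency), the column names `date` and `weekday` are invented, and the dates are made up.

```python
import pandas as pd

# Hypothetical release dates, not real White House data
releases = pd.DataFrame({"date": pd.to_datetime([
    "2018-06-01", "2018-06-04", "2018-06-08", "2018-06-13",
    "2018-07-06", "2018-07-10", "2018-07-13", "2018-07-20",
])})
releases["weekday"] = releases["date"].dt.day_name()

# Count releases per (month, weekday), then pivot weekdays into columns
monthly = (
    releases
    .groupby([pd.Grouper(key="date", freq="MS"), "weekday"])
    .size()
    .unstack(fill_value=0)
)

# Convert counts to the percentage of each month's releases
share = monthly.div(monthly.sum(axis=1), axis=0) * 100

# Keep only the Friday column for the simpler plot
friday_share = share["Friday"]
# share.plot() would draw one line per weekday; friday_share.plot() only Fridays
```

Plotting every column of `share` at once gives one line per weekday, which may explain why the first attempt looked crowded.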

Since the x-axis is datetime, pandas naturally plotted the Trump WH's data after Obama's. Looks interesting, though not useful. Then I created a new column called "in_office_for": the datetime of each press release minus the datetime of that president's inauguration.

4

It allows me to plot the two dataframes on top of each other, with the x-axis showing how many days each president had been in office.

5
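The "in_office_for" idea can be sketched with plain `datetime` arithmetic. The inauguration dates are the actual ones (January 20 of 2009 and 2017); the function signature, the dictionary, and the integer-days return value are assumptions for illustration, since the original column holds a datetime difference.

```python
from datetime import date

INAUGURATION = {
    "obama": date(2009, 1, 20),
    "trump": date(2017, 1, 20),
}

def in_office_for(release_date, president):
    """Days between a president's inauguration and a release date."""
    return (release_date - INAUGURATION[president]).days

# The same calendar day in each president's first year maps to the same x value
obama_day = in_office_for(date(2009, 7, 24), "obama")
trump_day = in_office_for(date(2017, 7, 24), "trump")
```

Because both values are measured from each president's own day zero, the two series share one x-axis instead of sitting eight years apart.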

Still, there are two problems:

1) I left markers in the graph because I thought I could later point to some of the spikes and write some context, but they made the graph really messy, and I don't know how to clean it up. I'm not even sure a line graph is the best way to show differences.

2) After Friday's class, I thought a lot about what Maryanne said about bar vs. line graphs. She mentioned that if we choose to show changes with a line graph, it implies that the data/change is continuous, but that is not the case here. The data I'm plotting -- in each month, what percentage of press releases are published on Friday -- are not changes that took place over time. They are rather independent of each other, so perhaps a bar graph is the better way.

And that's what I did. I chose to plot only the Trump WH. Since there is less information, I can resample by week (instead of month) and still have a graph that's not too busy.

This is what I got:

6

Then a thought came to me: what if I also color each bar depending on its value? There are five workdays in each week, so in a perfect world, the White House would announce 20 percent of its news on each day. But when they publish twice that amount on a Friday, perhaps it's worth looking into what exactly they are dumping.
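The coloring rule can be written as a small threshold function against the 20 percent baseline. This is a sketch only: the three tiers and the hex colors are placeholders chosen for illustration, not the palette borrowed from The Upshot.

```python
BASELINE = 20.0  # percent: an even split across five workdays

def bar_color(friday_share, baseline=BASELINE):
    """Pick a bar color from the Friday share of a week's releases."""
    if friday_share >= 2 * baseline:
        return "#b2182b"  # at least twice the even share: a possible dump
    if friday_share > baseline:
        return "#ef8a62"  # above the even share
    return "#cccccc"      # at or below the even share

# Hypothetical weekly Friday shares, in percent
weekly_shares = [10.0, 25.0, 45.0]
colors = [bar_color(s) for s in weekly_shares]
```

A list like `colors` can be passed to the `color` argument of matplotlib's `bar` function so that each bar gets its own color.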

Oh, because I am me and I can't design, I stole some colors from The Upshot.

7

Then I added a really ugly headline and one somewhat-lined-up legend in Illustrator.

8

Any changes in direction or topic?

Not necessarily. Although I'm really debating whether analyzing word frequency is necessary, or even a good way, for this project. If I have more time, I would rather look into some of the outliers -- max, min -- and dive into what was happening during those weeks.

Problems/Questions

How to do anything. No, really. Scraping was easy, but I struggled a lot with finding the right form to present the dataset and to tell the story.

collleenwang commented 6 years ago

The comparison between Obama and Trump is very interesting. To make it clearer, I think displaying each administration's data separately would be the wiser choice.

ella24 commented 6 years ago

I believe this last graph actually tells a story. Quite an interesting one indeed. What I recommend is that you change the background to black or use more intense colors in the graphs. Also, in Illustrator, try to write down something sharp about what happened on those six Fridays when the press releases reached the 40% mark.
Also, try to do the same for Obama, but in a completely different graph. You can do this!

playfairbot commented 6 years ago

Greetings! I'm a little robot, beep beep boop boop.

You need some feedback, let me summon @julialedur, @adrianblanco, @linleysanders for you

adrianblanco commented 6 years ago

The vertical bar graphs are cleaner than the previous ones, great job! It would be great to annotate them with the main events that happened in the most relevant weeks! Then, we will be able to see what's behind some Friday news dumps

julialedur commented 6 years ago

Your idea is fascinating and the graph is visually pleasing! Good job. Like @adrianblanco, I would really enjoy annotations with relevant events on some of the bars. As improvements, I suggest:

linleysanders commented 6 years ago

This is such a fascinating project, and you've done a wonderful job really diving into the web scraping and text analysis. I think the best charts here are the last one (showing news dumped per week) and the first two (comparing Obama to Trump). After reading the other commentary, I concur and do not have anything additional! I love the idea of adding the main news events surrounding the spikes.

Weihua4455 commented 6 years ago

Final

Project visuals/text

This is the final graph I made:

image

The rest are on the website.

Details

Headline:

Friday News Dump: Still a Thing?

Published website version:

http://www.weihua4455.github.io/01_friday_news_dump

Code repository:

https://github.com/Weihua4455/data_studio/tree/master/code/01-friday-news-dump

Final data set(s):

https://www.whitehouse.gov/news/

https://obamawhitehouse.archives.gov/briefing-room/statements-and-releases?term_node_tid_depth=40&page=1

What did you find to be the most difficult part of this project?

Telling a story with the data. I ended up having to do a lot of research and writing (which was fun and I'm not complaining), only to find that the data doesn't really tell a story.

Are you satisfied with what you produced? Is there anything you would like to change or improve?

No. I've cried under a blanket more than once.

If I could start over, I would analyze the White House briefings instead of the press releases. The way I measure a "news dump" is by the sheer number of press releases, but not all press releases are created equal, and their importance and newsworthiness are not captured by the count alone.

sarahslo commented 6 years ago

so this is an interesting question you are asking and looking for data for. the first chart you did, i would have kept it in order of the weekdays, too confusing to search around by day.

i don't think the second chart you have here tells us anything. it's hard to read it as presented. why not simply make two fever lines on top of each other over time so we can see the trend. also, i'm not sure just looking at fridays helps, you need to see if it's more or less than the rest of the week.

one way to make this would be to heat map it. you can even do this with conditional formatting in excel, copy and paste the chart into illustrator

screen shot 2018-07-24 at 7 52 43 pm

Weihua4455 commented 6 years ago

Hi @sarahslo,

Thank you so much for your suggestions!! I put something together using a python library called calmap, and this is what I got:

Trump's first year in office: image

Obama's first year in office: image

I'm still trying to clean it up in Illustrator and, among other things, trying to figure out a way to add lines that separate months. I agree with you, though: in this case, a heatmap gives a lot more information than simple bar graphs.
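For anyone curious about the data layout behind a calendar heatmap, here is a standard-library sketch that bins release dates into a weekday-by-week grid. It is not calmap's implementation; `calendar_grid`, its parameters, and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def calendar_grid(release_dates, start, weeks=53):
    """Build a 7 x `weeks` grid of daily release counts.

    Rows are weekdays (0 = Monday); columns are weeks counted from
    the Monday of the week that contains `start`.
    """
    counts = Counter(release_dates)
    first_monday = start - timedelta(days=start.weekday())
    grid = [[0] * weeks for _ in range(7)]
    for day, n in counts.items():
        week, weekday = divmod((day - first_monday).days, 7)
        if 0 <= week < weeks:  # skip dates outside the plotted year
            grid[weekday][week] = n
    return grid

# Made-up release dates starting from January 20, 2017
sample = [date(2017, 1, 20), date(2017, 1, 20),
          date(2017, 1, 27), date(2017, 1, 23)]
grid = calendar_grid(sample, start=date(2017, 1, 20))
```

Row 4 of `grid` holds the Fridays, so a Friday-heavy pattern would show up as one darker horizontal band.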

That said, I'd love to hear what you think! (Or what anyone thinks, for that matter)