Open Weihua4455 opened 6 years ago
The charts are visually very good and also persuasive. But Trump changed the communication strategy by using Twitter, so I'd like to see his Twitter activity as well.
After Wednesday's class, I scraped the content of each press release for Obama's eight years in office as well as Trump's 18 months. Then I tried to conduct a text analysis to see if some words appeared more in Friday's releases than on other days.
The exact steps are:
Use .value_counts to get word frequency.
It was a good experiment, but the results were not that significant.
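A minimal sketch of that word-frequency comparison, assuming the scraped releases sit in a DataFrame with a date column and a text column. The column names and the three sample rows are made up for illustration; they are not from the scraped data.

```python
import pandas as pd

# Hypothetical sample rows; the real data would be the scraped releases.
releases = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-23", "2017-01-27", "2017-02-03"]),
    "text": ["Remarks on trade", "Statement on trade", "Statement on sanctions"],
})

# One row per word, lowercased, keeping the release date next to each word.
words = (
    releases.assign(word=releases["text"].str.lower().str.split())
    .explode("word")
    .reset_index(drop=True)
)
words["is_friday"] = words["date"].dt.dayofweek == 4  # Monday is 0, Friday is 4

# Count words separately for Friday releases and for every other day.
friday_counts = words.loc[words["is_friday"], "word"].value_counts()
other_counts = words.loc[~words["is_friday"], "word"].value_counts()

# Side-by-side table; a word missing from one group gets a count of 0.
comparison = pd.DataFrame({"friday": friday_counts, "other": other_counts}).fillna(0)
print(comparison.sort_values("friday", ascending=False))
```

With real text, raw counts would probably need normalizing by the number of releases in each group, since there are far more non-Friday releases.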
Before jumping into that rabbit hole, I decided to change strategies and focus on time analysis: how do Friday news dumps change over time in each White House?
First I used .resample and .unstack() to plot both administrations. This is what I got:
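One way the .resample and .unstack() step might look, as a sketch rather than the exact code in the repository. It assumes one row per press release with a date and an administration label (both column names are assumptions, and the dates are sample values), and computes the monthly share of releases that fell on a Friday.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions.
releases = pd.DataFrame({
    "date": pd.to_datetime([
        "2009-01-23", "2009-01-26", "2009-02-06", "2009-02-13",
        "2017-01-23", "2017-01-27", "2017-02-01", "2017-02-03",
    ]),
    "administration": ["Obama"] * 4 + ["Trump"] * 4,
})
# 1.0 if the release went out on a Friday, else 0.0 (Monday is 0, Friday is 4).
releases["is_friday"] = (releases["date"].dt.dayofweek == 4).astype(float)

monthly = (
    releases.set_index("date")
    .groupby("administration")["is_friday"]
    .resample("MS")    # one bucket per calendar month, labeled by month start
    .mean()            # mean of 0/1 flags = share of releases on Fridays
    .unstack(level=0)  # administrations become columns, dates stay as the index
)
print(monthly)
```

Because the two administrations cover different calendar dates, each column is NaN wherever the other has data, so a plot of this table puts one administration after the other on the x-axis.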
OK, it was a mess. Then I plotted only the Fridays.
Since the x-axis is datetime, naturally pandas plotted the Trump WH's data after Obama's. Looks interesting, though not useful. Then I created a new column called "in_office_for". Basically, it's the datetime of the press release minus the datetime of that president's inauguration.
It allows me to plot the two dataframes on top of each other, with the x-axis showing how many days each president had been in office.
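The "in_office_for" column could be built roughly like this. The inauguration dates are the real ones (January 20 of 2009 and 2017); the DataFrame layout, the sample rows, and the extra days_in_office column are assumptions for illustration.

```python
import pandas as pd

# Inauguration dates for the two administrations.
INAUGURATION = {
    "Obama": pd.Timestamp("2009-01-20"),
    "Trump": pd.Timestamp("2017-01-20"),
}

# Hypothetical sample rows; column names are assumptions.
releases = pd.DataFrame({
    "date": pd.to_datetime(["2009-01-23", "2009-02-06", "2017-01-27", "2017-02-03"]),
    "administration": ["Obama", "Obama", "Trump", "Trump"],
})

# Release date minus that president's inauguration date gives a timedelta.
releases["in_office_for"] = releases["date"] - releases["administration"].map(INAUGURATION)

# Whole days in office, which is easier to use as a shared x-axis.
releases["days_in_office"] = releases["in_office_for"].dt.days
print(releases)
```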
Still, there are two problems:
1) I left markers in the graph because I thought I could later point to some of the spikes and write some context, but it made the graph really messy, and I don't know how to clean it up. I'm not even sure if a line graph is the best way to show differences.
2) After Friday's class, I thought a lot about what Maryanne said about bar vs. line graphs. She mentioned that if we choose to show changes with a line graph, it implies that the data/change is continuous, but that is not the case here. The data I'm plotting -- in each month, what percentage of press releases are published on Friday -- are not changes that took place over time. They are rather independent of each other, so perhaps a bar graph is the better way.
And that's what I did. I chose to plot only the Trump WH. Since there is less information, I can resample to week (instead of month) and still have a graph that's not that busy.
This is what I got:
Then a thought came to me: what if I also color each bar depending on its value? There are five workdays in each week, so in a perfect world, the White House should announce 20 percent of its news on each day. But when they publish twice that amount on Fridays, perhaps it's worth looking into and seeing what exactly they are dumping.
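A sketch of the weekly Friday share and the value-based coloring, under stated assumptions: the dates are sample values, the hex colors are placeholders (not the Upshot palette), and the three-tier cutoff at 20 and 40 percent is just one reading of the idea above.

```python
import pandas as pd

# Hypothetical release dates, one entry per press release.
dates = pd.to_datetime([
    "2017-01-23", "2017-01-24", "2017-01-25", "2017-01-27",  # week 1: 1 of 4 on Friday
    "2017-01-30", "2017-02-03", "2017-02-03",                # week 2: 2 of 3 on Friday
    "2017-02-06", "2017-02-07", "2017-02-08",                # week 3: none on Friday
])
is_friday = pd.Series((dates.dayofweek == 4).astype(float), index=dates)

# Share of each week's releases published on Friday (weeks end on Sunday).
weekly_share = is_friday.resample("W").mean().dropna()

def bar_color(share, baseline=0.20):
    """Pick a color by comparing the Friday share to an even 20% split."""
    if share >= 2 * baseline:
        return "#d7301f"  # at least twice the expected share
    if share > baseline:
        return "#fc8d59"  # above the expected share
    return "#bdbdbd"      # at or below the expected share

colors = [bar_color(share) for share in weekly_share]
print(list(zip(weekly_share.index.date, weekly_share.round(2), colors)))
```

With matplotlib installed, passing the list along as `weekly_share.plot.bar(color=colors)` should draw one colored bar per week.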
Oh, because I am me and I can't design, I stole some colors from The Upshot.
Then I added a really ugly headline and one somewhat-lined-up legend in Illustrator.
Not necessarily. Although I'm really debating whether analyzing word frequency is necessary, or even a good approach, for this project. If I had more time, I would rather look into some of the outliers -- max, min -- and dive into what was happening during those weeks.
How to do anything. No, really. Scraping was easy, but I struggled a lot with finding the right form to present the dataset and to tell the story.
Should it be a comparison of two administrations? Comparison over week/month/year/term? Is it even fair or meaningful to compare the two? Should the comparison be two lines that represent changes of percentages over time? Or perhaps sets of two bars that are side-by-side?
Maybe the comparison will make more sense if I also get press releases from the Bush and Clinton administrations?
Or perhaps it makes more sense to focus only on one administration and really take a closer look at what the data is telling us?
The comparison between Obama and Trump is very interesting. For clearer information, I think displaying each of them separately is the wiser choice.
I believe this last graph actually tells a story. Quite an interesting one indeed. What I recommend is for you to change the background to black or use more intense colors in the graphs. Also, in Illustrator, try to write down something sharp about what happened on those six Fridays when the press releases reached 40%.
Also, try to do it with Obama too. But in a completely different graph. You can do this!
Greetings! I'm a little robot, beep beep boop boop.
You need some feedback, let me summon @julialedur, @adrianblanco, @linleysanders for you
The vertical bar graphs are cleaner than the previous ones, great job! It would be great to annotate them with the main events that happened in the most relevant weeks! Then, we will be able to see what's behind some Friday news dumps
Your idea is fascinating and the graph is visually pleasing! Good job. Like @adrianblanco, I would really enjoy annotations with relevant events on some of the bars. As improvements, I suggest:
This is such a fascinating project, and you've done a wonderful job really diving into the web scraping and text analysis. I think the best charts here are the last one (showing news dumped per week, as well as the first two with the comparison of Obama to Trump). After reading the other commentary, I concur and do not have anything additional! I love the idea of adding the main news events surrounding the spikes.
This is the final graph I made:
The rest are on the website.
Headline:
Friday News Dump: Still a Thing?
Published website version:
http://www.weihua4455.github.io/01_friday_news_dump
Code repository:
https://github.com/Weihua4455/data_studio/tree/master/code/01-friday-news-dump
Final data set(s):
https://www.whitehouse.gov/news/
Telling a story with the data. I ended up having to do a lot of research and writing (which was fun and I'm not complaining), only to find that the data doesn't really tell a story.
No. I've cried under a blanket more than once.
If I could start over, I would analyze the White House briefings instead of the press releases. The way I measure a "news dump" is by the sheer number of press releases, but not all press releases are created equal, and their importance and newsworthiness do not lie solely in the number.
so this is an interesting question you are asking and looking for data for. the first chart you did, i would have kept it in order of the weekdays, too confusing to search around by day.
i don't think the second chart you have here tells us anything. it's hard to read it as presented. why not simply make two fever lines on top of each other over time so we can see the trend. also, i'm not sure just looking at fridays helps, you need to see if it's more or less than the rest of the week.
one way to make this would be to heat map it. you can even do this with conditional formatting in excel, copy and paste the chart into illustrator
Hi @sarahslo,
Thank you so much for your suggestions!! I put something together using a python library called calmap, and this is what I got:
Trump's first year in office:
Obama's first year in office:
I'm still trying to clean it up in Illustrator and, among other things, trying to figure out a way to add lines that separate months. Though I agree with you: in this case, a heatmap gives a lot more information than simple bar graphs.
That said, I'd love to hear what you think! (Or what anyone thinks, for that matter)
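For the heatmaps mentioned above, the input is essentially a count of releases per calendar day. A sketch of that preparation step follows, using made-up dates; it stops short of the plotting call, and the week-by-weekday grid is an added illustration rather than something calmap requires.

```python
import pandas as pd

# Hypothetical release dates, one entry per press release.
dates = pd.to_datetime([
    "2017-01-23",
    "2017-01-27", "2017-01-27",
    "2017-02-03", "2017-02-03", "2017-02-03",
])

# Releases per calendar day; days with no release get a count of 0.
daily = pd.Series(1, index=dates).resample("D").sum()

# Optional: reshape into a grid with one row per week and one column per
# weekday (0 = Monday ... 6 = Sunday), which is what a calendar heatmap draws.
frame = daily.to_frame("releases")
frame["weekday"] = frame.index.dayofweek
frame["week_start"] = frame.index - pd.to_timedelta(frame.index.dayofweek, unit="D")
grid = frame.pivot(index="week_start", columns="weekday", values="releases")
print(grid)
```

A date-indexed Series like `daily` is, as far as I can tell, what calmap's yearplot takes as input. The same grid could also be pasted into a spreadsheet and colored with conditional formatting, as suggested earlier in the thread.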
Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1
Pitch
Summary
I want to look into the so-called "Friday news dump" phenomenon, where press secretaries dump all important news on Friday afternoon, hoping that each story will get less coverage.
My data comes from the websites of the Trump White House and the Obama White House. I will scrape all the statements and press releases, then see if they do indeed publish more stuff on Fridays.
I also want to do some text analysis of either the title or the content of those releases. Perhaps some words show up more often on Friday than on every other day.
Details
Possible headline(s):
Data set(s): https://www.whitehouse.gov/briefings-statements/, https://obamawhitehouse.archives.gov/briefing-room/statements-and-releases
Code repository: https://github.com/Weihua4455/data_studio/tree/master/code/01-friday-news-dump
Possible problems/fears/questions: I am not sure what's the best way to show my data -- should I group by year/month/week/term? Should I just choose the most significant ones (i.e. max, min, etc) and do an analysis of them?
Work so far
I got my data and played around with it. Without knowing how datetime really works (now I do, yay), I plotted data for both administrations with bar graphs. X is day of week, Y is percentage of press releases that were published on that day.
Here are my graphs:
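The day-of-week percentages behind those bar graphs could be computed along these lines. This is a sketch with made-up dates for a single administration; it keeps the bars in weekday order rather than sorted by value.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical release dates, one entry per press release.
dates = pd.Series(pd.to_datetime(["2017-01-23", "2017-01-25", "2017-01-27", "2017-02-03"]))

# Share of releases per weekday name, as a fraction of all releases.
share = dates.dt.day_name().value_counts(normalize=True)

# Put the days in calendar order (days with no releases become 0) and
# convert to percentages.
percent = (share.reindex(WEEKDAYS, fill_value=0) * 100).round(1)
print(percent)
```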
Checklist
This checklist must be completed before you submit your draft.
[Project]
in the title