Open hakantan opened 6 years ago
Sounds like it will be a great project. The one weak leg I see in the whole operation is that the numbers reported by the UN will almost certainly cover only official, legal migration. Many people concerned with this issue think that a large share of immigrants arrive through illegal channels. I don't know how, or if, you can correct for that, but it'd be worth trying.
Also, Columbia J-school has an entire post-grad fellowship program devoted to migration issues. They seem super approachable. Maybe they can point you towards data sets or resources that help with the undercounting of illegal migration.
The UNHCR data tell you so much more than just a count. Can you show us where people are coming from over time? Germany's migrant population has changed as crises flare up in different places. With the time series, can you point to specific moments and let us know whether laws changed? Why the peak? Why the slowdown?
Pitch
Europe has been having a discussion about migration for years now. The New York Times recently asked "What migration crisis?" since the numbers have gone down considerably. But looking at the headlines, you might not know that. If this is true, then we should talk about it.
Summary
I want to compare the actual number of migrants coming to Germany with the number of articles written about them. My hunch is that the debate is detached from reality, i.e. tons of articles are being written while not many people are coming anymore.
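One possible way to make that comparison concrete is to index both series to their first period and look at the gap between them. This is only a sketch: the function name `index_to_first` and the numbers in both lists are made-up placeholders, not UN figures or real article counts.

```python
def index_to_first(values):
    """Rescale a series so that its first value equals 100."""
    base = values[0]
    return [round(100 * v / base, 1) for v in values]


# Placeholder values for illustration only; NOT real data.
arrivals = [1000, 800, 500, 250]   # migrants arriving per period
articles = [200, 220, 210, 190]    # articles published per period

arrivals_idx = index_to_first(arrivals)
articles_idx = index_to_first(articles)

# Positive gap: coverage is holding up better than arrivals are.
gap = [round(a - m, 1) for a, m in zip(articles_idx, arrivals_idx)]

for period, (m, a, g) in enumerate(zip(arrivals_idx, articles_idx, gap)):
    print(f"period {period}: arrivals={m} articles={a} gap={g}")
```

A gap that keeps growing would support the hunch; a gap near zero would suggest coverage tracks arrivals after all.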
Details
Official numbers can be found at the United Nations. As an additional dataset, I am planning on scraping Google search results on a week-by-week basis. The scraper is all set up and I did a test run (grabbed one month in 2014).
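The week-by-week part could be driven by a list of date windows, roughly like the sketch below. It only builds the windows and does not touch the scraper itself; `weekly_windows` is a hypothetical helper name, and January 2014 is just an assumed example month.

```python
from datetime import date, timedelta


def weekly_windows(start, end):
    """Yield (first_day, last_day) pairs covering start..end inclusive.

    Each window is seven days long, except the last one, which is
    cut off at `end`.
    """
    current = start
    while current <= end:
        last_day = min(current + timedelta(days=6), end)
        yield current, last_day
        current = last_day + timedelta(days=1)


# Assumed example range: one month in 2014.
windows = list(weekly_windows(date(2014, 1, 1), date(2014, 1, 31)))

for first_day, last_day in windows:
    print(first_day.isoformat(), "to", last_day.isoformat())
```

Each pair can then be passed to the scraper as the date range for one query, so that every result count maps to exactly one week.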
Possible headline(s): Fewer and fewer migrants are coming to Europe – the news doesn't seem to care (don't know if this is true)
Data set(s): Here
Code repository: Here you go.
Possible problems/fears/questions: Scraping could prove harder than I expect. I'm also thinking about scraping the actual article texts. If I do that, I would have to limit myself to a few news outlets, otherwise it would be too much to handle.
Work so far
Scraper is set up.
Checklist