jsoma / data-studio-projects


MTA Alert Archive #243

Open adrianblanco opened 6 years ago

adrianblanco commented 6 years ago

Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1

Pitch

Subway delays are a problem New Yorkers face every day. But which stations are most affected? Which problems recur most often on the subway's rails? This project aims to identify the black holes of the NYC subway.

Summary

[Screenshot, 2018-07-31 15:30:07]

The MTA publishes an Alert Archive with all the planned service changes, collisions, and escalator breakdowns that happen in the NYC subway. Scraping and analyzing this data would show which stations are the most dysfunctional, which stuff usually breaks in the subway system, and which commuters are affected by the most unexpected issues.

The idea is to scrape the data from the MTA website (almost done!), give it a tabular dataframe structure, then analyze and visualize it!

Details

Possible headline(s): Which stuff in the Subway is always breaking? Which are the most common alerts/problems in the MTA service?

Data set(s): https://www.mymtaalerts.com/messagearchive.aspx

Code repository: Still working on it, coming soon! (Sorry for the delay)

Possible problems/fears/questions: One of my main concerns is, after scraping the data, being able to handle and categorize the amount of text that the MTA publishes. Instead of publishing the data in a tabular way, the MTA publishes chunks of text with the description of the problem, the location, and so on. So one of the main challenges of this project is converting this text into a dataframe.

Work so far

Scraping with Selenium is almost done. The code has not been posted yet because I still have to solve some issues. I have also sketched the design of the new dataframe I want to build, which will include the columns Line, Location, and Type of alert.
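As a rough illustration of the text-to-table step, here is a minimal sketch that pulls Line, Location, and Type of alert out of free-text messages with regular expressions. The sample messages, the patterns, and the keyword list are all assumptions made for the example; the wording in the real archive may differ and the rules would need tuning against it.

```python
import re

# Hypothetical alert texts: the wording in the real archive may differ.
SAMPLE_ALERTS = [
    "A and C trains are delayed at 59 St-Columbus Circle due to signal problems.",
    "F trains are running with delays at Jay St-MetroTech because of a sick passenger.",
    "Planned work: Q trains skip stops at DeKalb Av.",
]

# One or more single-character line names followed by "train(s)" or "line(s)".
LINE_RE = re.compile(r"\b([A-Z0-9](?:\s*(?:,|and|/)\s*[A-Z0-9])*)\s+(?:trains?|lines?)\b")
# Text after "at", up to a cause phrase, punctuation, or the end of the message.
LOCATION_RE = re.compile(r"\bat\s+(.+?)(?=\s+(?:due to|because of)\b|[.,;]|$)")
SEPARATOR_RE = re.compile(r"\s*(?:,|and|/)\s*")

# Assumed keyword -> alert type rules, checked in order.
TYPE_KEYWORDS = [
    ("planned work", "Planned work"),
    ("delay", "Delay"),
]


def parse_alert(text):
    """Turn one free-text alert into a row with line, location, and type."""
    line_match = LINE_RE.search(text)
    lines = SEPARATOR_RE.split(line_match.group(1)) if line_match else []
    location_match = LOCATION_RE.search(text)
    location = location_match.group(1) if location_match else None
    lowered = text.lower()
    alert_type = next(
        (label for keyword, label in TYPE_KEYWORDS if keyword in lowered), "Other"
    )
    return {"line": lines, "location": location, "type": alert_type}


rows = [parse_alert(text) for text in SAMPLE_ALERTS]
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` to get the dataframe structure described above.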

Checklist

This checklist must be completed before you submit your draft.

sarahslo commented 6 years ago

curious how this is going to play out. one thing you may want to do is think more broadly about how to handle the info on buses (it has all the bus line names in it) vs. subways vs. escalators.

you could think about looking at escalators and elevators, not subways (just an idea). the type of alert will need to be generalized somehow, but you can still drill down into those alerts once you've got the data broadly sorted.
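One possible way to do that broad first sort is a small set of keyword rules that assign each message to a mode before any finer categorization. This is only a sketch: the rule list, the category names, and the sample messages are hypothetical and would need checking against the real archive text.

```python
import re
from collections import Counter

# Hypothetical keyword rules, checked in order: the first match wins.
MODE_RULES = [
    ("elevator/escalator", {"elevator", "elevators", "escalator", "escalators"}),
    ("bus", {"bus", "buses"}),
    ("subway", {"train", "trains", "subway"}),
]


def classify_mode(text):
    """Return a broad mode label for one alert message."""
    # Match whole words so that "bus" does not fire on words like "business".
    words = set(re.findall(r"[a-z]+", text.lower()))
    for mode, keywords in MODE_RULES:
        if words & keywords:
            return mode
    return "unknown"


# Invented sample messages, for illustration only.
samples = [
    "B41 buses are detoured at Flatbush Av.",
    "Escalator out of service at 34 St-Herald Sq.",
    "N trains are delayed.",
]
counts = Counter(classify_mode(text) for text in samples)
print(counts)
```

Once every message has a mode, each group can be filtered and drilled into separately.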

jessimckenzi commented 6 years ago

Great topic. I'm very curious about this dataset, because a huge problem I've noticed in my commutes is that there can be a serious delay (a train broken down ahead and all local trains rerouted onto the express track, for example) while the MTA.info website still says "normal service" in the midst of the disaster. I guess what I'm saying is that I'm most interested in finding out what is in this dataset that doesn't make it to the mta.info website. I wonder if they store every update that they post there?

I'm also intrigued by the inclusion of "Updates" for ongoing problems. Is that common? Can you use it to track how long these delays/problems last?
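If updates can be tied back to the original alert, a duration estimate is a matter of grouping messages per incident and subtracting the first timestamp from the last. The sketch below assumes, purely for illustration, that each scraped message has a timestamp and a subject, and that updates repeat the original subject with an "UPDATE: " prefix; the archive's actual convention may be different.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical scraped records: (timestamp, subject).
MESSAGES = [
    ("2018-07-30 08:05", "A trains delayed at 125 St"),
    ("2018-07-30 08:40", "UPDATE: A trains delayed at 125 St"),
    ("2018-07-30 09:20", "UPDATE: A trains delayed at 125 St"),
    ("2018-07-30 10:00", "Elevator outage at 34 St-Herald Sq"),
]

UPDATE_PREFIX = "UPDATE: "  # assumed marker for follow-up messages


def incident_key(subject):
    """Strip the assumed update prefix so updates group with the original alert."""
    if subject.startswith(UPDATE_PREFIX):
        return subject[len(UPDATE_PREFIX):]
    return subject


def incident_durations(messages):
    """Map each incident to the time between its first and last message."""
    times = defaultdict(list)
    for stamp, subject in messages:
        times[incident_key(subject)].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M"))
    return {key: max(stamps) - min(stamps) for key, stamps in times.items()}


durations = incident_durations(MESSAGES)
for incident, duration in durations.items():
    print(incident, duration)
```

The result is a lower bound: it measures the gap between the first and last posted message, not when the problem actually began or ended, and an incident with no updates gets a duration of zero.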