Open adrianblanco opened 6 years ago
Curious how this is going to play out. One thing you may want to do is think more broadly about how to handle the different kinds of info: buses (it has all the bus line names in it) vs. subways vs. escalators.
You could think about looking at escalators and elevators, not subways (just an idea). The type of alert will need to be generalized somehow, but you can still drill down into those alerts once you've got the data broadly sorted.
Great topic. I'm very curious about this dataset, because a huge problem I've noticed in my commutes is that there can be a serious delay (a train broken down ahead and all local trains rerouted on the express track, for example) while, in the midst of the disaster, the MTA.info website still says "normal service". I guess what I'm saying is I'm most interested in finding out what is in this dataset that doesn't make it to the mta.info website. I wonder if they store every update that they post there?
I'm also intrigued by the inclusion of "Updates" for ongoing problems. Is that common? Can you use it to track how long these delays/problems last?
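One way the "how long do these problems last" question could be approached, as a rough sketch only: group the original alert and its updates by subject and take the time between the first and last post. The `UPDATED: ` prefix, the timestamp format and the sample rows are all assumptions made for illustration, not the archive's actual layout.

```python
from datetime import datetime

PREFIX = "UPDATED: "  # hypothetical marker for follow-up posts

def incident_durations(posts, fmt="%Y-%m-%d %H:%M"):
    """Map each subject to the time between its first and last post."""
    spans = {}
    for stamp, subject in posts:
        when = datetime.strptime(stamp, fmt)
        # Strip the update marker so updates group with the original alert.
        key = subject[len(PREFIX):] if subject.startswith(PREFIX) else subject
        first, last = spans.get(key, (when, when))
        spans[key] = (min(first, when), max(last, when))
    return {key: last - first for key, (first, last) in spans.items()}

# Made-up rows, only to show the shape of the input.
posts = [
    ("2018-06-01 08:05", "Delays on the A line"),
    ("2018-06-01 08:40", "UPDATED: Delays on the A line"),
    ("2018-06-01 09:15", "UPDATED: Delays on the A line"),
]
durations = incident_durations(posts)
```

This only measures the time between posts, so it would understate any problem that ended well after the last update.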
Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1
Pitch
Delays in the subway are a common issue New Yorkers have to face. But which stations are most affected by these issues? Which problems are the most common and most often repeated on the subway rails? This project aims to identify the black holes of the NYC subway.
Summary
The MTA publishes an Alert Archive covering the planned service changes, collisions and escalator breakdowns that happen in the NYC subway. Scraping and analyzing this data would show which stations are the most dysfunctional, which stuff usually breaks in the subway system, and which commuters are affected by the most unexpected issues.
The idea is to scrape the data from the MTA website (almost done!), give it a tabular, dataframe structure, then analyze and visualize it!
Details
Possible headline(s): Which stuff in the Subway is always breaking? Which are the most common alerts/problems in the MTA service?
Data set(s): https://www.mymtaalerts.com/messagearchive.aspx
Code repository: Still working on it, coming soon! (Sorry for the delay)
Possible problems/fears/questions: One of my main concerns is, after scraping the data, being able to handle and categorize the amount of text that the MTA publishes. Instead of publishing the data in a more tabular way, the MTA publishes chunks of text with the description of the problem, the location, etc. So one of the main challenges of this project is to convert this data into a dataframe.
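A minimal sketch of what turning one chunk of text into a row might look like, using regular expressions. The sample sentence and both patterns are assumptions: the real archive wording has not been checked here, and the patterns would certainly need tuning against the scraped text.

```python
import re

# Hypothetical alert text; the real archive wording may differ.
SAMPLE = "Delays on the A, C and E lines due to signal problems at 59 St-Columbus Circle."

# One or more single-character route names joined by ",", "and" or ", and",
# followed by "line(s)" or "train(s)".
ROUTES_RE = re.compile(
    r"\b((?:[A-Z1-7](?:,\s*and\s+|,\s*|\s+and\s+)?)+)\s+(?:lines?|trains?)\b"
)
# Whatever follows the word "at", up to the next punctuation mark.
LOCATION_RE = re.compile(r"\bat\s+([^.,;]+)")

def parse_alert(text):
    """Return one row (as a dict) for a single alert text."""
    routes = ROUTES_RE.search(text)
    location = LOCATION_RE.search(text)
    return {
        "Line": re.findall(r"[A-Z1-7]", routes.group(1)) if routes else [],
        "Location": location.group(1).strip() if location else None,
        "Text": text,
    }

row = parse_alert(SAMPLE)
```

A list of such dicts can be passed straight to a dataframe constructor. Keeping the raw `Text` in each row makes it easy to spot-check the alerts where the patterns found nothing.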
Work so far
Scraping with Selenium is almost done. The code has not been posted yet because I still have some issues to solve. The new dataframe I want to build will include the columns: Line, Location, Type of alert.
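For the "Type of alert" column, one possible starting point is a small set of keyword rules, checked in order, with everything unmatched falling into "Other". The category names and keywords below are guesses based on the alert types mentioned in this pitch, not a list taken from the archive.

```python
from collections import Counter

# Hypothetical rules: (label, keywords). The first matching rule wins.
RULES = [
    ("Escalator/Elevator", ("escalator", "elevator")),
    ("Planned Service Change", ("planned",)),
    ("Collision", ("collision",)),
    ("Delay", ("delay",)),
]

def classify(text):
    """Return the label of the first rule whose keyword appears in the text."""
    lowered = text.lower()
    for label, keywords in RULES:
        if any(keyword in lowered for keyword in keywords):
            return label
    return "Other"

# Made-up alert texts, only to show the counting step.
alerts = [
    "Escalator out of service",
    "Delays on the A line",
    "Planned work this weekend",
    "Delays on the 7 line",
]
counts = Counter(classify(text) for text in alerts)
```

Reading through what lands in "Other" would show which categories are still missing from the rules.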
Checklist
This checklist must be completed before you submit your draft.