Closed luqmaan closed 5 years ago
Started here: https://github.com/open-austin/capture
Can compute trip duration and output to csv.
Some inspiration http://mbtaviz.github.io/
From the triangle to downtown, the 801 is 4.7 minutes slower than the 1. From end to end, the 801 is 25.4 minutes faster than the 1.
python distance.py \
--route_id 1 \
--begin_lat 30.162883 \
--begin_lon -97.790317 \
--end_lat 30.266218 \
--end_lon -97.746056 \
"../CapMetrics/data/vehicle_positions/*.csv" > data/1_triangle_republic.csv
python distance.py \
--route_id 801 \
--begin_lat 30.162883 \
--begin_lon -97.790317 \
--end_lat 30.266218 \
--end_lon -97.746056 \
"../CapMetrics/data/vehicle_positions/*.csv" > data/801_triangle_republic.csv
import pandas as pd
df_801_triangle = pd.read_csv('data/801_triangle_republic.csv').drop(['vehicle_id', 'route_id', 'trip_id', 'timestamp_begin','timestamp_end'], axis=1)
df_1_triangle = pd.read_csv('data/1_triangle_republic.csv').drop(['vehicle_id', 'route_id', 'trip_id', 'timestamp_begin','timestamp_end'], axis=1)
df_801_techridge = pd.read_csv('data/801_techridge_southpark.csv').drop(['vehicle_id', 'route_id', 'trip_id', 'timestamp_begin','timestamp_end'], axis=1)
df_1_techridge = pd.read_csv('data/1_techridge_bluff.csv').drop(['vehicle_id', 'route_id', 'trip_id', 'timestamp_begin','timestamp_end'], axis=1)
duration_801_triangle = (df_801_triangle.abs().mean() / 60).iloc[0]
duration_1_triangle = (df_1_triangle.abs().mean() / 60).iloc[0]
duration_801_techridge = (df_801_techridge.abs().mean() / 60).iloc[0]
duration_1_techridge = (df_1_techridge.abs().mean() / 60).iloc[0]
print('801 triangle<->republic square takes {0:.1f} minutes'.format(duration_801_triangle))
print('1 triangle<->republic square takes {0:.1f} minutes'.format(duration_1_triangle))
print('801 techridge<->southpark takes {0:.1f} minutes'.format(duration_801_techridge))
print('1 techridge<->bluff springs/william cannon takes {0:.1f} minutes'.format(duration_1_techridge))
triangle_diff = duration_801_triangle - duration_1_triangle
techridge_diff = duration_1_techridge - duration_801_techridge
print('From the triangle to downtown, the 801 is {0:.1f} minutes slower than the 1'.format(triangle_diff))
print('From end to end, the 801 is {0:.1f} minutes faster than the 1'.format(techridge_diff))
# 801 triangle<->republic square takes 38.2 minutes
# 1 triangle<->republic square takes 33.5 minutes
# 801 techridge<->southpark takes 94.9 minutes
# 1 techridge<->bluff springs/william cannon takes 120.4 minutes
# From the triangle to downtown, the 801 is 4.7 minutes slower than the 1
# From end to end, the 801 is 25.4 minutes faster than the 1
:+1: I'd be worried about using the mean though. It will penalize harshly for random delays and so on. Would also be nice to slice this up by time of day somewhat. Is it always faster? Maybe just outside of rush hour traffic?
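The concern about the mean can be shown with a toy series of trip durations (minutes; the numbers are made up for illustration):

```python
import pandas as pd

# One badly delayed trip drags the mean well above the typical trip time,
# while the median barely moves. Durations in minutes.
durations = pd.Series([33, 34, 35, 36, 95])  # one outlier delay
print('mean:   {:.1f} min'.format(durations.mean()))    # 46.6
print('median: {:.1f} min'.format(durations.median()))  # 35.0
```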
Travel time by hour: https://docs.google.com/spreadsheets/d/1TzBumxSiu1_GIOpiTPSmR8I4r7-2COqzeZAu7HPWwb0/pubchart?oid=934384677&format=interactive
The gap in the 1 data between 5am and 10am is weird. I wonder where it's coming from?
import pandas as pd
df_801_triangle = pd.read_csv('data/801_triangle_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df_801_triangle['duration'] = df_801_triangle['duration'].abs()
df_801_triangle.groupby(df_801_triangle['timestamp_begin'].dt.hour).mean()
'''
duration
0 2404.986318
1 2337.109971
2 2166.027231
3 2043.290172
4 1615.084788
5 1612.875000
6 1948.333333
7 2307.000000
8 539.000000
10 1784.946372
11 1806.229058
12 2263.699196
13 2362.899512
14 2326.956132
15 2225.262527
16 2267.770325
17 2277.290946
18 2303.534690
19 2303.471163
20 2530.505758
21 2714.864654
22 2639.210850
23 2445.719959
'''
df_1_triangle = pd.read_csv('data/1_triangle_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df_1_triangle['duration'] = df_1_triangle['duration'].abs()
df_1_triangle.groupby(df_1_triangle['timestamp_begin'].dt.hour).mean()
'''
duration
0 2061.907173
1 1932.376471
2 1855.489865
3 1843.061321
4 1149.467391
5 941.857143
10 1683.092466
11 1876.712695
12 1938.073394
13 1982.581162
14 1994.681208
15 1982.693517
16 2087.346000
17 2026.223986
18 2048.953488
19 2145.514735
20 2258.500000
21 2282.063032
22 2345.593692
23 2169.519588
'''
Looks like the 803 isn't worse than the 3.
I'm guessing the times are UTC, which is why we see the gap in the "morning."
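If the timestamps really are UTC, converting to local time before grouping would put the gap where it belongs. A sketch of the conversion, with US/Central assumed since the feed covers Austin:

```python
import pandas as pd

# Sketch: treat the naive timestamps as UTC, convert to US/Central,
# then take the local hour for grouping.
ts = pd.Series(pd.to_datetime(['2015-07-10 11:30:00', '2015-07-10 23:05:00']))
local = ts.dt.tz_localize('UTC').dt.tz_convert('US/Central')
print(local.dt.hour.tolist())  # [6, 18] (UTC-5 during daylight saving time)
```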
python distance.py \
--route_id 3 \
--begin_lat 30.324999 \
--begin_lon -97.739681 \
--end_lat 30.266218 \
--end_lon -97.746056 \
"../CapMetrics/data/vehicle_positions/*.csv" > data/3_northloop_republic.csv
python distance.py \
--route_id 803 \
--begin_lat 30.324999 \
--begin_lon -97.739681 \
--end_lat 30.266218 \
--end_lon -97.746056 \
"../CapMetrics/data/vehicle_positions/*.csv" > data/803_northloop_republic.csv
In [8]: %paste
df = pd.read_csv('data/3_northloop_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df['duration'] = df['duration'].abs()
df.groupby(by=lambda x: df.loc[x]['timestamp_begin'].hour).mean()
## -- End pasted text --
Out[8]:
duration
0 1558.017897
1 1514.292017
2 1455.608025
3 1434.820253
4 963.218362
5 888.400000
10 1386.253788
11 1406.769231
12 1534.946429
13 1562.705882
14 1569.446429
15 1591.308793
16 1584.707724
17 1646.103659
18 1674.998012
19 1680.359438
20 1599.042254
21 1584.675926
22 1862.955850
23 1742.962594
In [9]: %paste
df = pd.read_csv('data/803_northloop_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df['duration'] = df['duration'].abs()
df.groupby(by=lambda x: df.loc[x]['timestamp_begin'].hour).mean()
## -- End pasted text --
Out[9]:
duration
0 1474.702055
1 1427.106154
2 1400.170455
3 1378.574803
4 984.291667
5 983.466667
6 947.400000
7 1591.571429
10 1316.428191
11 1359.617111
12 1439.466750
13 1543.321429
14 1519.070994
15 1487.160188
16 1546.182912
17 1574.540243
18 1648.564604
19 1650.026795
20 1699.299492
21 1826.281609
22 1942.832996
23 1692.629823
Yeah, that's UTC. All you have to do is subtract five from the time. It looks like the 803 is slower after 3pm, in the evening rush hour. At any rate, MetroRapid is not fast enough to justify the extra fare and inconvenience. (No stops between 26th and 38th!)
That's really interesting to see. MetroRapid is supposed to have quite a few advantages, no? For example, the traffic light timings are supposed to be influenced in its favor. I'd expect rush hour to be when it does the best.
I am not sure that MetroRapid can hold greens longer in this area. I believe that it cannot affect timing downtown, and it may just be in areas outside the center city that it can hold greens. However, there are the dedicated bus lanes downtown, but, obviously, the local routes use those also.
Yes, I ride these buses in the evening rush hour, and none seem to have the Green Light Technology that was advertised when the MetroRapids were first introduced. It also seems that they are being marketed as 'High Frequency' routes rather than express-style routes. That could be an argument for the higher prices.
What's the next step?
I think we need to come up with a way to get the travel time for all central austin trips. For example:
Then how do we present these travel times in a way that makes sense?
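Enumerating candidate trips could start from a list of landmark coordinates; a sketch with a few hypothetical places (the coordinates are illustrative, not verified stop locations):

```python
from itertools import combinations

# Hypothetical central-Austin landmarks; real coordinates would come from the
# stop data. Every unordered pair is a candidate trip to time.
places = {
    'triangle': (30.314, -97.742),
    'republic_square': (30.266, -97.746),
    'north_loop': (30.325, -97.740),
}
for a, b in combinations(places, 2):
    print('{} <-> {}'.format(a, b))
```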
also hiiiiii @sophieher
Here is a FAQ on MetroRapid that has some helpful information about the signal-priority tech: https://www.capmetro.org/faq.aspx?id=32 It does not work between Cesar Chavez and MLK. Apparently it's only used when buses are "behind schedule."
As far as next steps, I would first do the northbound trips for these routes. Adjusting the times from UTC would also be useful for presenting this information to the public. It would also be helpful to output the values as csv or xls for the public.
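Writing the values out is a one-liner with pandas; a sketch with made-up hourly numbers (`to_excel` would additionally need an engine such as openpyxl installed):

```python
import pandas as pd

# Sketch: publish the hourly durations as CSV. The numbers here are made up;
# the real values would come from the groupby-by-hour analysis above.
hourly = pd.DataFrame({'hour': [7, 8, 9], 'duration_min': [38.4, 36.1, 33.9]})
hourly.to_csv('hourly_durations.csv', index=False)
# hourly.to_excel('hourly_durations.xlsx', index=False)  # needs openpyxl
```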
@luqmaan What are we really trying to present? Just travel times or comparisons between similar ways to get between particular parts of town? If it's the former then I'd want to see a histogram or density plot or something, with bucketed time on X and number of observed trips on Y.
The latter requires more thought. Maybe violin plots? One image would be for one "trip type," such as downtown->triangle, and each violin would be some way of making the trip, such as 801 vs 1.
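A minimal sketch of that violin idea with matplotlib, using synthetic durations as stand-ins for the real CSV data:

```python
import matplotlib
matplotlib.use('Agg')  # render without a display
import matplotlib.pyplot as plt
import numpy as np

# One figure per "trip type", one violin per way of making the trip.
# The durations here are synthetic; real ones would come from the CSVs above.
rng = np.random.default_rng(0)
durations = {
    '801': rng.normal(38, 5, 200),  # minutes, made-up spread
    '1': rng.normal(33, 4, 200),
}
fig, ax = plt.subplots()
ax.violinplot(list(durations.values()))
ax.set_xticks(range(1, len(durations) + 1))
ax.set_xticklabels(durations.keys())
ax.set_ylabel('trip duration (minutes)')
ax.set_title('triangle <-> republic square')
fig.savefig('triangle_violins.png')
```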
I think the question I'm trying to answer is "Where/when is it worse to ride the MetroRapid?"
A more policy-oriented question, though, would be "Are the advantages of MetroRapid what CapMetro claims they are?" When the MetroRapid lines began service, CapMetro significantly cut the service frequency of the 1 and the 3 and eliminated the 101 express bus. What this meant for 1 and 3 riders is that they had to pay an extra 50¢ for most of the bus trips on their line and had fewer stops. (Or, of course, they could wait for the 1 or 3 bus that runs much more infrequently than it did before.) Many people, including me, saw this as a degradation of service on CapMetro's busiest lines. If it were the case that the service frequency on the 1 and 3 remained roughly the same and CapMetro just added MetroRapid as a premium service, I think observers would have griped about the cost and execution, but, by cutting frequency of those routes, the agency is pushing people to ride MetroRapid whether it makes sense for them or not.
If it's the case that MetroRapid has insignificant time advantages for center-city residents, then that would suggest that existing riders on the 1 and 3 routes got screwed by CapMetro, particularly since the MetroRapid routes don't stop at some significant destinations such as Wheatsville Co-op or realistically close to others such as St. Edward's University.
I think we can figure out which segments/trips to prioritize by looking at the instabus analytics.
Prioritize in what sense?
Instead of showing the travel time for every possible trip, show the travel times for the common trips.
Is there a way to refine the buckets on the map? The way that it renders for me, basically all of central Austin and the areas around the MetroRapid lines are in red.
Here's a heatmap of where instabus has been used: https://lol.cartodb.com/viz/06a8b716-2653-11e5-a950-0e5e07bb5d8a/map
It might be worthwhile to compare the real-time discrepancies between 801 and 1 to the discrepancies already baked into the schedules.
For example, see what the scheduled travel time is between Republic Square and Museum Station for the 801 and 1. Or perhaps between Republic Square and the Triangle.
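The scheduled side of that comparison could come from a GTFS stop_times.txt; a sketch with invented trip and stop IDs (a real feed would come from CapMetro):

```python
import io
import pandas as pd

# Sketch: scheduled travel time between two stops on one trip.
# The trip ID, stop IDs, and times below are made up for illustration.
stop_times = pd.read_csv(io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t801,08:00:00,08:00:00,republic_square,1\n"
    "t801,08:38:00,08:38:00,triangle,12\n"
))
stop_times['arrival_time'] = pd.to_timedelta(stop_times['arrival_time'])
trip = stop_times[stop_times['trip_id'] == 't801'].set_index('stop_id')
scheduled = trip.loc['triangle', 'arrival_time'] - trip.loc['republic_square', 'arrival_time']
print(scheduled)  # 0 days 00:38:00
```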
Hi everyone, this looks pretty technical, but it seems like a lot was going on here. What's the status on this project now?
Not sure if there has been any work recently, but this is how the analysis lives on for the public, and it is listed on our projects page: https://seancascketta.com/CapMetrics/
"Launched" "status: experiment"
Ah, it can be hard to tell at first glance which of these ideas made it to the projects page, since the website project title and the idea issue's working title are sometimes different. This is super rad, btw. I like that a practical and simply stated question can be answered with open data and presented in an easy-to-read way. It'd be super cool if we could put together a list of research questions like this, paired with potentially useful data sources, for folks interested in data science to tackle. It could even be the subject of something like an Open Austin data jam.
I don't think we want to encourage any more updates to this study, but it's remaining on the projects page.
One sentence description:
In the center of town, the 1 feels faster than the 801. Is it?
Link (more details/brain dump/alpha):
All the data to answer this question lives at https://github.com/scascketta/CapMetrics. Somebody just needs to query it.
Some graphs would be cool.
Project Needs (dev/design/resources):
Statistics/data help needed. Anybody with R experience could really help.
Status (in progress, pie-in-the-sky):
Somebody's just gotta do it. In progress.