open-austin / project-ideas

:bulb: A place to collect ideas for Open Austin projects

CapMetrics #3

Closed luqmaan closed 5 years ago

luqmaan commented 9 years ago

One sentence description:

In the center of town, the 1 feels faster than the 801. Is it?

Link (more details/brain dump/alpha):

All the data to answer this question lives at https://github.com/scascketta/CapMetrics. Somebody just needs to query it.

Some graphs would be cool.

Project Needs (dev/design/resources):

Statistics/data help needed. Anybody with R experience could really help.

Status (in progress, pie-in-the-sky)

Somebody's just gotta do it. (In progress)

johntyree commented 9 years ago

Started here: https://github.com/open-austin/capture

Can compute trip duration and output to csv.

luqmaan commented 9 years ago

Some inspiration http://mbtaviz.github.io/

luqmaan commented 9 years ago

From the triangle to downtown, the 801 is 4.7 minutes slower than the 1. From end to end, the 801 is 25.4 minutes faster than the 1.

python distance.py \
    --route_id 1 \
    --begin_lat 30.162883 \
    --begin_lon -97.790317 \
    --end_lat 30.266218 \
    --end_lon -97.746056 \
    "../CapMetrics/data/vehicle_positions/*.csv" > data/1_triangle_republic.csv
python distance.py \
    --route_id 801 \
    --begin_lat 30.162883 \
    --begin_lon -97.790317 \
    --end_lat 30.266218 \
    --end_lon -97.746056 \
    "../CapMetrics/data/vehicle_positions/*.csv" > data/801_triangle_republic.csv

import pandas as pd

# Columns that are not needed once only the trip duration matters.
UNUSED_COLUMNS = ['vehicle_id', 'route_id', 'trip_id', 'timestamp_begin', 'timestamp_end']


def mean_duration_minutes(path):
    # Durations are in seconds (hence / 60); abs() guards against negative values.
    df = pd.read_csv(path).drop(UNUSED_COLUMNS, axis=1)
    return (df.abs().mean() / 60).iloc[0]


duration_801_triangle = mean_duration_minutes('data/801_triangle_republic.csv')
duration_1_triangle = mean_duration_minutes('data/1_triangle_republic.csv')
duration_801_techridge = mean_duration_minutes('data/801_techridge_southpark.csv')
duration_1_techridge = mean_duration_minutes('data/1_techridge_bluff.csv')

print('801 triangle<->republic square takes {0:.1f} minutes'.format(duration_801_triangle))
print('1 triangle<->republic square takes {0:.1f} minutes'.format(duration_1_triangle))
print('801 techridge<->southpark takes {0:.1f} minutes'.format(duration_801_techridge))
print('1 techridge<->bluff springs/william cannon takes {0:.1f} minutes'.format(duration_1_techridge))

triangle_diff =  duration_801_triangle - duration_1_triangle
techridge_diff = duration_1_techridge - duration_801_techridge

print('From the triangle to downtown, the 801 is {0:.1f} minutes slower than the 1'.format(triangle_diff))
print('From end to end, the 801 is {0:.1f} minutes faster than the 1'.format(techridge_diff))

# 801 triangle<->republic square takes 38.2 minutes
# 1 triangle<->republic square takes 33.5 minutes
# 801 techridge<->southpark takes 94.9 minutes
# 1 techridge<->bluff springs/william cannon takes 120.4 minutes
# From the triangle to downtown, the 801 is 4.7 minutes slower than the 1
# From end to end, the 801 is 25.4 minutes faster than the 1

johntyree commented 9 years ago

:+1: I'd be worried about using the mean though. It will penalize harshly for random delays and so on. Would also be nice to slice this up by time of day somewhat. Is it always faster? Maybe just outside of rush hour traffic?
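To illustrate the concern about the mean, here is a minimal sketch using only the standard library. The durations are hypothetical (not taken from the CapMetrics data); the point is just that one badly delayed trip drags the mean up while the median barely moves.

```python
from statistics import mean, median

# Hypothetical trip durations in seconds; the last trip hit a long random delay.
durations = [1900, 2000, 2050, 2100, 6000]

mean_minutes = mean(durations) / 60
median_minutes = median(durations) / 60

print('mean:   {0:.1f} minutes'.format(mean_minutes))
print('median: {0:.1f} minutes'.format(median_minutes))
```

With the pandas code above, the equivalent change would probably be swapping `.mean()` for `.median()`.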

luqmaan commented 9 years ago

Travel time by hour: https://docs.google.com/spreadsheets/d/1TzBumxSiu1_GIOpiTPSmR8I4r7-2COqzeZAu7HPWwb0/pubchart?oid=934384677&format=interactive

The gap in the route 1 data between 5am and 10am is weird. I wonder where it's coming from?

import pandas as pd

df_801_triangle = pd.read_csv('data/801_triangle_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df_801_triangle['duration'] = df_801_triangle['duration'].abs()
df_801_triangle.groupby(by=lambda x: df_801_triangle.loc[x]['timestamp_begin'].hour).mean()

'''
       duration
0   2404.986318
1   2337.109971
2   2166.027231
3   2043.290172
4   1615.084788
5   1612.875000
6   1948.333333
7   2307.000000
8    539.000000
10  1784.946372
11  1806.229058
12  2263.699196
13  2362.899512
14  2326.956132
15  2225.262527
16  2267.770325
17  2277.290946
18  2303.534690
19  2303.471163
20  2530.505758
21  2714.864654
22  2639.210850
23  2445.719959
'''

df_1_triangle = pd.read_csv('data/1_triangle_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df_1_triangle['duration'] = df_1_triangle['duration'].abs()
df_1_triangle.groupby(by=lambda x: df_1_triangle.loc[x]['timestamp_begin'].hour).mean()

'''
       duration
0   2061.907173
1   1932.376471
2   1855.489865
3   1843.061321
4   1149.467391
5    941.857143
10  1683.092466
11  1876.712695
12  1938.073394
13  1982.581162
14  1994.681208
15  1982.693517
16  2087.346000
17  2026.223986
18  2048.953488
19  2145.514735
20  2258.500000
21  2282.063032
22  2345.593692
23  2169.519588
'''

luqmaan commented 9 years ago

Looks like the 803 isn't worse than the 3.

https://docs.google.com/spreadsheets/d/1TzBumxSiu1_GIOpiTPSmR8I4r7-2COqzeZAu7HPWwb0/pubchart?oid=1714680943&format=interactive

I'm guessing the times are UTC, which is why we see the gap in the "morning."

python distance.py \
    --route_id 3 \
    --begin_lat 30.324999 \
    --begin_lon -97.739681 \
    --end_lat 30.266218 \
    --end_lon -97.746056 \
    "../CapMetrics/data/vehicle_positions/*.csv" > data/3_northloop_republic.csv

python distance.py \
    --route_id 803 \
    --begin_lat 30.324999 \
    --begin_lon -97.739681 \
    --end_lat 30.266218 \
    --end_lon -97.746056 \
    "../CapMetrics/data/vehicle_positions/*.csv" > data/803_northloop_republic.csv

In [8]: %paste
df = pd.read_csv('data/3_northloop_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df['duration'] = df['duration'].abs()
df.groupby(by=lambda x: df.loc[x]['timestamp_begin'].hour).mean()
## -- End pasted text --
Out[8]:
       duration
0   1558.017897
1   1514.292017
2   1455.608025
3   1434.820253
4    963.218362
5    888.400000
10  1386.253788
11  1406.769231
12  1534.946429
13  1562.705882
14  1569.446429
15  1591.308793
16  1584.707724
17  1646.103659
18  1674.998012
19  1680.359438
20  1599.042254
21  1584.675926
22  1862.955850
23  1742.962594

In [9]: %paste
df = pd.read_csv('data/803_northloop_republic.csv', parse_dates=['timestamp_end', 'timestamp_begin']).drop(['vehicle_id', 'route_id', 'trip_id'], axis=1)
df['duration'] = df['duration'].abs()
df.groupby(by=lambda x: df.loc[x]['timestamp_begin'].hour).mean()
## -- End pasted text --
Out[9]:
       duration
0   1474.702055
1   1427.106154
2   1400.170455
3   1378.574803
4    984.291667
5    983.466667
6    947.400000
7   1591.571429
10  1316.428191
11  1359.617111
12  1439.466750
13  1543.321429
14  1519.070994
15  1487.160188
16  1546.182912
17  1574.540243
18  1648.564604
19  1650.026795
20  1699.299492
21  1826.281609
22  1942.832996
23  1692.629823

McCnnll commented 9 years ago

Yeah, that's UTC. All you have to do is subtract five hours from the time (Austin is on Central Daylight Time, UTC-5, in the summer; it would be six hours during standard time). It looks like the 803 is slower after 3pm, in the evening rush hour. At any rate, MetroRapid is not fast enough to justify the extra fare and inconvenience. (No stops between 26th and 38th!)
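A rough sketch of that conversion, assuming the timestamps are timezone-aware UTC datetimes. `to_central` and the sample timestamp are made up for illustration. It uses the `America/Chicago` zone so daylight saving time is handled, and falls back to a fixed UTC-5 offset if no tz database is installed.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo
    CENTRAL = ZoneInfo('America/Chicago')  # handles daylight saving time
except Exception:
    # Assumption: no tz database available, so fall back to a fixed UTC-5
    # offset, which is only correct while daylight saving time is in effect.
    CENTRAL = timezone(timedelta(hours=-5))


def to_central(timestamp_utc):
    """Convert a timezone-aware UTC datetime to Austin local time."""
    return timestamp_utc.astimezone(CENTRAL)


# Hypothetical trip start recorded as 22:00 UTC on a July day.
begin_utc = datetime(2015, 7, 10, 22, 0, tzinfo=timezone.utc)
begin_local = to_central(begin_utc)
print(begin_local.hour)  # 17, i.e. 5pm local time
```

In pandas, something like `df['timestamp_begin'].dt.tz_localize('UTC').dt.tz_convert('America/Chicago')` should do the same before grouping by hour, assuming the parsed timestamps are naive UTC.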

johntyree commented 9 years ago

That's really interesting to see. MetroRapid is supposed to have quite a few advantages, no? For example, the traffic light timings are supposed to be influenced in its favor. I'd expect rush hour to be when it does the best.

McCnnll commented 9 years ago

I am not sure that MetroRapid can hold greens longer in this area. I believe that it cannot affect timing downtown, and it may just be in areas outside the center city that it can hold greens. However, there are the dedicated bus lanes downtown, but, obviously, the local routes use those also.

sophieher commented 9 years ago

Yes, I ride these buses in the evening rush hour, and none seem to have the Green Light Technology that was advertised when MetroRapid was first introduced. It also seems that they are being marketed as 'High Frequency' routes, rather than express-style routes. That could be an argument for the higher prices.

luqmaan commented 9 years ago

What's the next step?

I think we need to come up with a way to get the travel time for all central austin trips. For example:

Then how do we present these travel times in a way that makes sense?

also hiiiiii @sophieher

McCnnll commented 9 years ago

Here is a FAQ on MetroRapid that has some helpful information about the signal-priority tech: https://www.capmetro.org/faq.aspx?id=32 It does not work between Cesar Chavez and MLK. Apparently it's only used when buses are "behind schedule."

As far as next steps, I would first do the northbound trips for these routes. Also, converting the times from UTC to local time would be useful for presenting this information to the public. It would also be helpful to publish the values as CSV or XLS.

johntyree commented 9 years ago

@luqmaan What are we really trying to present? Just travel times or comparisons between similar ways to get between particular parts of town? If it's the former then I'd want to see a histogram or density plot or something, with bucketed time on X and number of observed trips on Y.

The latter requires more thought. Maybe violin plots? One image would be for one "trip type," such as downtown->triangle, and each violin would be some way of making the trip, such as 801 vs 1.
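For the histogram idea, a minimal standard-library sketch of the bucketing might look like the following. The sample durations, the 5-minute bucket width, and the `bucket_counts` helper are all assumptions for illustration; real input would be the `duration` column from the distance.py output.

```python
from collections import Counter

BUCKET_MINUTES = 5  # assumed bucket width


def bucket_counts(durations_seconds, bucket_minutes=BUCKET_MINUTES):
    """Count trips per bucket; keys are each bucket's lower edge in minutes."""
    counts = Counter()
    for seconds in durations_seconds:
        minutes = abs(seconds) / 60
        lower_edge = int(minutes // bucket_minutes) * bucket_minutes
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical trip durations in seconds.
sample = [1900, 2000, 2050, 2100, 2400, 2650]
histogram = bucket_counts(sample)

# Text rendering: bucketed time on one axis, number of observed trips on the other.
for lower_edge, trips in histogram.items():
    print('{0:>3}-{1:<3} min | {2}'.format(lower_edge, lower_edge + BUCKET_MINUTES, '#' * trips))
```

Running this once per route for the same trip type would give the side-by-side comparison that the violin plots are meant to show.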

luqmaan commented 9 years ago

I think the question I'm trying to answer is "Where/when is it worse to ride the MetroRapid?"

McCnnll commented 9 years ago

A more policy-oriented question, though, would be "Are the advantages of MetroRapid what CapMetro claims they are?" When the MetroRapid lines began service, CapMetro significantly cut the service frequency of the 1 and the 3 and eliminated the 101 express bus. What this meant for 1 and 3 riders is that they had to pay an extra 50¢ for most of the bus trips on their line and had fewer stops. (Or, of course, they could wait for the 1 or 3 bus that runs much more infrequently than it did before.) Many people, including me, saw this as a degradation of service on CapMetro's busiest lines. If it were the case that the service frequency on the 1 and 3 remained roughly the same and CapMetro just added MetroRapid as a premium service, I think observers would have griped about the cost and execution, but, by cutting frequency of those routes, the agency is pushing people to ride MetroRapid whether it makes sense for them or not.

If it's the case that MetroRapid has insignificant time advantages for center-city residents, then that would suggest that existing riders on the 1 and 3 routes got screwed by CapMetro, particularly since the MetroRapid routes don't stop at some significant destinations such as Wheatsville Co-op or realistically close to others such as St. Edward's University.

luqmaan commented 9 years ago

Now with proper timezone

https://docs.google.com/spreadsheets/d/1TzBumxSiu1_GIOpiTPSmR8I4r7-2COqzeZAu7HPWwb0/pubchart?oid=735586080&format=interactive

https://docs.google.com/spreadsheets/d/1TzBumxSiu1_GIOpiTPSmR8I4r7-2COqzeZAu7HPWwb0/pubchart?oid=1798681982&format=interactive

luqmaan commented 9 years ago

I think we can figure out which segments/trips to prioritize by looking at the instabus analytics.

johntyree commented 9 years ago

Prioritize in what sense?

luqmaan commented 9 years ago

Instead of showing the travel time for every possible trip, show the travel times for the common trips.

McCnnll commented 9 years ago

Is there a way to refine the buckets on the map? The way that it renders for me, basically all of central Austin and the areas around the MetroRapid lines are in red.

On Fri, Jul 10, 2015 at 3:17 PM, Luqmaan Dawoodjee wrote:

> Instead of showing the travel time for every possible trip, show the travel times for the common trips. Here's a heatmap of where instabus has been used: https://lol.cartodb.com/viz/06a8b716-2653-11e5-a950-0e5e07bb5d8a/map

Chris McConnell

rlcauvin commented 9 years ago

It might be worthwhile to compare the real-time discrepancies between 801 and 1 to the discrepancies already baked into the schedules.

For example, see what the scheduled travel time is between Republic Square and Museum Station for the 801 and 1. Or perhaps between Republic Square and the Triangle.
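A sketch of that comparison, assuming per-route averages are already available. The observed values are the triangle<->republic square means reported earlier in this thread. The scheduled values are placeholders, not real timetable figures, and `schedule_gap` is a hypothetical helper.

```python
# Observed mean travel times in minutes, as reported earlier in this thread.
observed = {'801': 38.2, '1': 33.5}

# Placeholder scheduled times in minutes (NOT real timetable values);
# the real numbers would come from the published schedules.
scheduled = {'801': 30.0, '1': 32.0}


def schedule_gap(observed, scheduled):
    """Minutes by which the observed time exceeds the scheduled time, per route."""
    return {route: round(observed[route] - scheduled[route], 1) for route in observed}


gaps = schedule_gap(observed, scheduled)
for route, gap in gaps.items():
    print('Route {0}: {1:+.1f} minutes vs. schedule'.format(route, gap))
```

If both routes ran equally late against their schedules, the difference between them would already be built into the timetable rather than caused by conditions on the street.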

werdnanoslen commented 7 years ago

Hi everyone, this looks pretty technical, but it seems a lot was going on here. What's the status on this project now?

mateoclarke commented 7 years ago

Not sure if there has been any work recently. But this is how the analysis lives on for the public, and it is listed on our projects page: https://seancascketta.com/CapMetrics/

"Launched" "status: experiment"

werdnanoslen commented 7 years ago

Ah, it can be hard to tell at first glance which of these ideas made it to the projects page with the website project title and idea issue's working title sometimes being different. This is super rad btw, I like that a practical and simply stated question can be solved with open data and presented in an easy to read way. It'd be super cool if we could get a list of research questions like this, paired with potentially useful data sources for folks interested in data science to solve. Could even be the subject of something like an open austin data jam.

mscarey commented 5 years ago

I don't think we want to encourage any more updates to this study, but it's remaining on the projects page.