[performance] Repetitive calculation in generate_wait_times

In peartree/parallel, generate_wait_times calculates the wait-time cost for each stop on a given route.

The original script does the following:

start_time = time.time()
wait_times = {0: [], 1: []}
for stop_id in trips_and_stop_times.stop_id:
    # Handle both inbound and outbound directions
    for direction in [0, 1]:
        # Check if direction_id exists in source data
        if 'direction_id' in trips_and_stop_times:
            constraint_1 = (trips_and_stop_times.direction_id == direction)
            constraint_2 = (trips_and_stop_times.stop_id == stop_id)
            both_constraints = (constraint_1 & constraint_2)
            direction_subset = trips_and_stop_times[both_constraints]
        else:
            direction_subset = trips_and_stop_times.copy()

        # Only run if each direction is contained
        # in the same trip id
        if direction_subset.empty:
            average_wait = np.nan
        else:
            average_wait = pt.parallel.calculate_average_wait(direction_subset)

        # Add according to which direction we are working with
        wait_times[direction].append(average_wait)
elapsed_time = round(time.time() - start_time, 1)
print(f'Segment completed in {elapsed_time}s, {len(trips_and_stop_times.stop_id)} stops')

Performance: Segment completed in 4.7s, 1668 stops

However, the values in trips_and_stop_times.stop_id are not unique, as shown by: len(trips_and_stop_times.stop_id.unique()) --> 140

That is, the loop recalculates the wait time for the same stop repeatedly: 1668 iterations for only 140 unique stops.

With the loop updated to iterate over unique stop IDs only: Performance: Segment completed in 0.5s

(GTFS feed used: f-9qh-riversidetransitagency)
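A minimal sketch of the proposed change, for illustration only. It iterates over `stop_id.unique()` and stores results keyed by stop ID rather than appending to a list, so each stop is computed once per direction. The names `average_wait_stub` and `wait_times_by_unique_stop` are hypothetical, and the stub is only a stand-in for `pt.parallel.calculate_average_wait` (it assumes an `arrival_time` column in seconds), not peartree's actual logic.

```python
import numpy as np
import pandas as pd


def average_wait_stub(subset: pd.DataFrame) -> float:
    # Hypothetical stand-in for pt.parallel.calculate_average_wait:
    # half the mean gap between consecutive arrivals, in seconds.
    arrivals = subset["arrival_time"].sort_values()
    gaps = arrivals.diff().dropna()
    return float(gaps.mean() / 2) if not gaps.empty else np.nan


def wait_times_by_unique_stop(trips_and_stop_times: pd.DataFrame) -> dict:
    # Check once, outside the loops, whether direction_id exists
    has_direction = "direction_id" in trips_and_stop_times
    wait_times = {0: {}, 1: {}}

    # Iterate over each distinct stop ID exactly once
    for stop_id in trips_and_stop_times["stop_id"].unique():
        for direction in [0, 1]:
            if has_direction:
                mask = (
                    (trips_and_stop_times["direction_id"] == direction)
                    & (trips_and_stop_times["stop_id"] == stop_id)
                )
                subset = trips_and_stop_times[mask]
            else:
                # Mirrors the original fallback: use the whole table
                subset = trips_and_stop_times

            if subset.empty:
                wait_times[direction][stop_id] = np.nan
            else:
                wait_times[direction][stop_id] = average_wait_stub(subset)
    return wait_times


# Small synthetic table: 5 rows but only 2 unique stops
demo = pd.DataFrame({
    "stop_id": ["A", "A", "A", "B", "B"],
    "direction_id": [0, 0, 1, 0, 0],
    "arrival_time": [0, 600, 300, 100, 1300],
})
result = wait_times_by_unique_stop(demo)
```

Note that the original loop appends one value per row, so any downstream code that relies on the lists being the same length as trips_and_stop_times would need to map the per-stop results back onto the rows (for example with `Series.map`).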

kuanb / peartree

[performance] Repetitive calculation in generate_wait_times #87