kuanb / peartree

peartree: A library for converting transit data into a directed graph for sketch network analysis.
MIT License

[performance] Use unique stop in generate_wait_times #88

Closed: yiyange closed 6 years ago

yiyange commented 6 years ago

Fixes https://github.com/kuanb/peartree/issues/87

yiyange commented 6 years ago

Ah, the later step relies on every stop having a corresponding value. Fixing that.

kuanb commented 6 years ago

We need to produce correct results for these duplicate stop ids so that rows are not lost when the parent method applies the results of this function to the trip and stop times reference DataFrame.

As a result, each of the 2 directional arrays needs as many entries as there are rows in that DataFrame, repeated stop ids included.

Here's a rough outline of how that would work:

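from typing import Dict, List

import numpy as np
import pandas as pd


# Note: calculate_average_wait is assumed to already be defined
# alongside this function
def generate_wait_times(trips_and_stop_times: pd.DataFrame
                        ) -> Dict[int, List[float]]: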
    reference_results = {}
    wait_times = {0: [], 1: []}
    for stop_id in trips_and_stop_times.stop_id:

        # First, if we have already computed the result, do not repeat the
        # process and, instead, use the result that was computed last time
        if stop_id in reference_results:
            already_computed_results = reference_results[stop_id]
            for direction in [0, 1]:
                average_wait = already_computed_results[direction]
                wait_times[direction].append(average_wait)

        # Otherwise, we need to identify the average wait time for both
        # directions at that stop id
        else:
            # Create a placeholder value in the lookup reference
            reference_results[stop_id] = {}

            # Abbreviate for brevity
            tast = trips_and_stop_times

            # Handles both inbound and outbound directions
            for direction in [0, 1]:
                # Check if direction_id exists in source data
                if 'direction_id' in tast:
                    constraint_1 = (tast.direction_id == direction)
                    constraint_2 = (tast.stop_id == stop_id)
                    both_constraints = (constraint_1 & constraint_2)
                    direction_subset = tast[both_constraints]
                else:
                    direction_subset = tast.copy()

                # Only calculate a wait time if there are rows for this
                # stop and direction; otherwise record a null value
                if direction_subset.empty:
                    average_wait = np.nan
                else:
                    average_wait = calculate_average_wait(direction_subset)

                # Before adding to our running final results object, make
                # sure to also populate our tracked references so we do
                # not have to rerun this calculation in a subsequent iteration
                reference_results[stop_id][direction] = average_wait

                # Add according to which direction we are working with
                wait_times[direction].append(average_wait)

    return wait_times
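
The same row-alignment requirement can also be met by computing one value per unique stop id and then broadcasting it back to every row. The following is only a sketch under assumptions, not peartree's implementation: `average_gap` is a hypothetical stand-in for `calculate_average_wait`, `wait_times_per_row` is a hypothetical name, and the `arrival_time` column (in seconds) and the presence of `direction_id` are assumed.

```python
from typing import Dict, List

import numpy as np
import pandas as pd


def average_gap(arrival_times: pd.Series) -> float:
    # Hypothetical stand-in for calculate_average_wait: half the mean gap
    # between consecutive arrivals, or NaN with fewer than two arrivals
    gaps = arrival_times.sort_values().diff().dropna()
    if gaps.empty:
        return np.nan
    return float(gaps.mean()) / 2


def wait_times_per_row(tast: pd.DataFrame) -> Dict[int, List[float]]:
    wait_times = {}
    for direction in [0, 1]:
        subset = tast[tast.direction_id == direction]

        # Compute one value per unique stop id for this direction
        per_stop = {
            stop_id: average_gap(group.arrival_time)
            for stop_id, group in subset.groupby('stop_id')
        }

        # Broadcast back so every row, duplicates included, gets a value;
        # stops with no rows in this direction fall back to NaN
        wait_times[direction] = [
            per_stop.get(stop_id, np.nan) for stop_id in tast.stop_id
        ]
    return wait_times


# Small made-up example: stop "A" appears three times, stop "B" twice
example = pd.DataFrame({
    'stop_id': ['A', 'B', 'A', 'A', 'B'],
    'direction_id': [0, 0, 0, 1, 0],
    'arrival_time': [0.0, 60.0, 600.0, 300.0, 360.0],
})
result = wait_times_per_row(example)
```

Each list in `result` has one entry per row of `example`, so the parent method can assign the values back without losing rows.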

codecov[bot] commented 6 years ago

Codecov Report

Merging #88 into master will increase coverage by 0.01%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
+ Coverage   91.62%   91.64%   +0.01%     
==========================================
  Files          12       12              
  Lines         860      862       +2     
==========================================
+ Hits          788      790       +2     
  Misses         72       72

Impacted Files          Coverage Δ
peartree/parallel.py    97.03% <100%> (+0.04%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update dc7bc3b...6287b97.