propershark / timetable_cpp

Purveyor of schedule information for transit agencies via GTFS feeds and WAMP procedures.
1 stars 0 forks source link

Improve `stop_time` indexing. #16

Closed faultyserver closed 7 years ago

faultyserver commented 7 years ago

As I'm thinking about how #14 could be implemented efficiently, I'm realizing that a single-index solution as is currently implemented will be rather limiting in terms of either functionality or performance.

For example: the current key of (station, departure_time, route, trip) is great for iterating all visits to a station, but less so for iterating only visits for a particular route, where (route, station, departure_time) would be far more suitable. Another key - (route, trip, station, departure_time) - is more efficient for queries like those in #14, where

I think a nice implementation of this would be having some Timetable::index type template that takes a set of parameters to use for building an index. This way, adding and modifying indexes is as simple as adding/modifying a member definition on Timetable::Timetable. Off the top of my head, I'd like something like this:

template<typename key_type, typename value_type>
class index {
  using indexer = std::function<key_type(value_type)>;
  indexer make_index;
  // Store pointers to avoid duplicating objects. This requires that
  // values be stored elsewhere, but the method of storage is irrelevant.
  std::map<key_type, value_type*> index_map;

  public: 
    index(indexer func) : make_index(func) {};

    void add_entry(value_type& value) {
      index_map[make_index(value)] = &value;
    };
};

class Timetable {
  index<std::string, gtfs::stop_time> visits_by_trip([](auto st) {
    return st.trip_id + st.departure_time.to_string();
  });

  std::vector<gtfs::stop_time> stop_times;

  void build_indices() {
    for(auto st : stop_times) {
      visits_by_trip.add_entry(st);
    }
  };
};

Using this implementation would then be as simple as looking up a lower and upper bound in the index and iterating between them as desired. A potential use case could look something like this:

Timetable tt;
auto lower_bound = tt.visits_by_trip.lower_bound(some_trip);
auto upper_bound = tt.visits_by_trip.upper_bound(some_trip);
for(auto it = lower_bound; it != upper_bound; it++) {
  // Since the indices only keep pointers, dereference the value to get
  // the actual visit.
  auto visit = *it;
  // do work...
}

I'm not entirely sure this will work as written above, but I'd like to get some sort of generic indexing system so that we can easily add new indexes as necessary without too much difficulty.

faultyserver commented 7 years ago

This has been (roughly) implemented by #12. I'm not super happy with how it's used, but the existing implementation resolves this issue. Closing as implemented.