Feature/nav 91 create raptor cache to prevent building raptor everytime for same date queries

munterfi commented 4 months ago

Cache raptor instances per day using LRU strategy.

@clukas1 considering your description in the Jira ticket:

Store built raptors so that they can be re-used again if a second query for the same date is requested.

Ideally add logic, that raptor only get’s rebuilt if the service calendar type is different. I.e. if Wednesday and Thursday schedule is identical, use the same raptor instance for both.

The second point won't work, since there are special calendar date in the GTFS, e.g. Christmas 1. May, ... So we have to cache raptor instances per date not weekday.

But: I think I got a solution to mask trips inside Raptor. This will also eradicate the need for the stop validations (except departure time), since we have all stops from the GTFS in the raptor, but some stops will just have no active trips.

I can prepare everything around it, but would be nice to build this together into the mega loop of the Raptor, maybe Friday evening or at the weekend?

Unfortunately we still have some strange errors when stops are not present in the Raptor:

Screenshot 2024-06-05 at 17 31 07

clukas1 commented 4 months ago

Cache raptor instances per day using LRU strategy.

@clukas1 considering your description in the Jira ticket:

Store built raptors so that they can be re-used again if a second query for the same date is requested.

Ideally add logic, that raptor only get’s rebuilt if the service calendar type is different. I.e. if Wednesday and Thursday schedule is identical, use the same raptor instance for both.

The second point won't work, since there are special calendar date in the GTFS, e.g. Christmas 1. May, ... So we have to cache raptor instances per date not weekday.

But: I think I got a solution to mask trips inside Raptor. This will also eradicate the need for the stop validations (except departure time), since we have all stops from the GTFS in the raptor, but some stops will just have no active trips.

I can prepare everything around it, but would be nice to build this together into the mega loop of the Raptor, maybe Friday evening or at the weekend?

Unfortunately we still have some strange errors when stops are not present in the Raptor:

Regarding point two in the Jira Issue, I'm aware not every Thursday etc. is the same. But I think if we combine all active service ids, we can generate a meaningful cache key. Because I expect most of the regular weekdays etc will have the same service ids. Might require adding a function to create this key.

Regarding mask, sure we can huddle up and integrate it into the raptor algorithm. I'm busy on Friday, but should have some time on Saturday.

munterfi commented 4 months ago

Regarding point two in the Jira Issue, I'm aware not every Thursday etc. is the same. But I think if we combine all active service ids, we can generate a meaningful cache key. Because I expect most of the regular weekdays etc will have the same service ids. Might require adding a function to create this key.

Yes you are right, in theorie this should work, and it is now implemented in abacfbe. On the integration test data sample this works well, but on the GTFS of Switzerland, almost every day has its own active trip set. So even if we query normal weekdays, most of the time a new raptor will be created.

Here also an example how the equals on Sets works in Java. I was not sure, if the set instance itself matters or not: It does not, just its content is considered in equals:

package ch.naviqore.example;

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.RequiredArgsConstructor;

import java.util.HashSet;
import java.util.Set;

public class Test {

    @RequiredArgsConstructor
    @Getter
    @EqualsAndHashCode
    static class Instance {
        private final String id;
    }

    public static void main(String[] args) {
        Instance instanceA = new Instance("a");
        Instance instanceA2 = new Instance("a");
        Instance instanceB = new Instance("b");
        Instance instanceC = new Instance("c");

        Set<Instance> set1 = new HashSet<>();
        Set<Instance> set2 = new HashSet<>();

        set1.add(instanceA);
        set1.add(instanceA2);
        set1.add(instanceB);
        set1.add(instanceC);

        set2.add(instanceA);
        set2.add(instanceB);
        set2.add(instanceC);

        System.out.println(set1.equals(set2));
        // true

        set2.add(instanceA);
        System.out.println(set1.equals(set2));
        // true

        set1.remove(instanceA);
        System.out.println(set1.equals(set2));
        // false
    }

}

clukas1 commented 4 months ago

Review already done :) See my comments before you requested the review. Only thing I would consider to reduce the set size, is using sets of calendars instead of trips to identify similar days.

clukas1 commented 4 months ago

Review already done :) See my comments before you requested the review. Only thing I would consider to reduce the set size, is using sets of calendars instead of trips to identify similar days.

But sadly, you're right for a large dataset (like switzerland) every day will be a special case.

naviqore / public-transit-service

Feature/nav 91 create raptor cache to prevent building raptor everytime for same date queries #52