naviqore / public-transit-service

Public transit schedule information and connection routing service based on GTFS data and the RAPTOR algorithm.
MIT License
4 stars 1 forks source link

Nav 47 refactor raptor algorithm #68

Closed munterfi closed 3 months ago

munterfi commented 3 months ago

Hi @clukas1 and @Brunner246,

A PR for the refactoring of the Raptor algorithm we started together.

Summary:

This PR goes across the project, so avoid starting new branches at the moment. Let's finish the refactoring first. I think it would make sense to discuss the changes in a short meeting.

munterfi commented 3 months ago

Eventually we could move the marked stops also into the Objective class?

munterfi commented 3 months ago

Very nice work, now I can follow my own code better :)

@clukas1, thanks for the review and feedback!

Although, I don't see any deal breaking issues with this PR, which prevent approval. Some comments to discuss.

  • Connection - Comparable. I would not implement comparable for connections, I believe comparing them by arrival time does not make any sense, or at least comparisons by departure time, travel time or number of transfers are just as likely.

My rationale behind using Comparable is to enforce a consistent order, rather than relying on potentially non-deterministic or algorithm-specific internals. However, I agree that comparing by arrival time might not be the most logical choice. We could consider alternatives like comparing by departure time, travel time, or number of transfers. But I think enforcing comparable in the interface is good, to ensure those toughts are made.

  • Raptor / Objective: I like having this objective container for handling query specific attributes. And I agree that the markedStops should be managed by the spawnFromStop method. However, maybe we should rename the Raptor class to Router or RaptorRouter (which implements the RaptorAlgorithm) and the Objective to Request (because in my opinion it's not only a objective container). The idea behind is, that we can accept requests via the interface through the raptor algorithm and keep all constant raptor relevant data in this class, however move all of the request processing to the request class instance (which will only keep data relevant to the request), this could reduce the argument passing around even more. As a result, spawnFromStops would also be moved to Request.

Agree to renaming Raptor to RaptorRouter, see 91b0772.

However I vote against renaming Objective to Request since the Objective is something internal, which is not even used outside of the spawnFromStops method. In naming it, I have borrowed this term from the field of algorithmics. Because the objective (function) should be minimized. In our case, the arrival time and the number of transfers. I think this fits the terminology quite well, but perhaps there is a better term? From my point of view, a Request is something that is sent to the RaptorRouter from outside and answered with a Response object.

I may not fully understand the suggestion to move the spawnFromStops method. But I like the separation of the main loop in the method spawnFromStops and the objective which is optimized. So the main logic of the process is still in the RaptorRaptor router. This allows us to separate the control flow cleanly from the complex details, which are then handled by the RouteScanner, FootpathRelaxer and the Objective. To implement the range query, we can just reuse the same Objective and spawn from the stops again at the next later departures at the given stops.

  • Impl: I now know why I like having interface names starting with i (or some other convention). Can we rename the raptor Interfaces to RaptorLeg / RaptorConnection and the implementations to Leg/Connection?

Can you further elaborate on this? LegImpl should never be visible outside of the Raptor package. So only ch.naviqore.raptor.Leg should be visible from the outside, which is quite clear that it belongs to the Raptor. The user does not need to know if it is an interface or a concrete class. If I use a class with Impl postfix outside the package, then it triggers a weird feeling. That's why I like it somehow 😄

But you are also right about the Impl postfix (removed in 5dccb64), see this artice: https://www.baeldung.com/java-interface-naming-conventions

Another widespread pattern in enterprise Java applications comes from the Hungarian Notation. In this section, we’ll briefly talk about the preceding I and the Impl suffix patterns and why we should avoid them.

The preceding I pattern suggests that any interface name starts with the capital letter I, an abbreviation for Interface. This is more common in C# code, where we don’t have a keyword to differentiate interface implementation from inheritance. However, this wouldn’t be necessary in Java since we can differentiate implementation from inheritance by looking at the keywords implements and extends.

The Impl suffix pattern suggests naming the interface’s implementations instead of the interface itself. Hence, all implementation names end with Impl. This usually appears when we create an interface with a single implementation and we can’t find a name that truly represents its specialization. However, the word Impl doesn’t add anything since the class signature shows it’s implementing something.

Therefore, we must avoid naming interfaces and classes using the I or Impl patterns, for example, UserImpl, IUser, IIdentifiable, and IdentifiableImpl.

We should discuss these points in more detail at our meeting on Wednesday.

clukas1 commented 3 months ago

My rationale behind using Comparable is to enforce a consistent order, rather than relying on potentially non-deterministic or algorithm-specific internals. However, I agree that comparing by arrival time might not be the most logical choice. We could consider alternatives like comparing by departure time, travel time, or number of transfers. But I think enforcing comparable in the interface is good, to ensure those toughts are made.

I would still prefer not to implement it, the problem is that by implementing it, in my opinion it has to be 100% clear why and how it is intended or at least easy to explain. In this case here, I think there are many logical ways to implement it and none over weighs the other, hence I think it's cleaner not to implement it to make it explicit that one has to define how one wants to sort / compare the connections.

Agree to renaming Raptor to RaptorRouter, see 91b0772.

Thanks

However I vote against renaming Objective to Request since the Objective is something internal, which is not even used outside of the spawnFromStops method. In naming it, I have borrowed this term from the field of algorithmics. Because the objective (function) should be minimized. In our case, the arrival time and the number of transfers. I think this fits the terminology quite well, but perhaps there is a better term? From my point of view, a Request is something that is sent to the RaptorRouter from outside and answered with a Response object.

I may not fully understand the suggestion to move the spawnFromStops method. But I like the separation of the main loop in the method spawnFromStops and the objective which is optimized. So the main logic of the process is still in the RaptorRaptor router. This allows us to separate the control flow cleanly from the complex details, which are then handled by the RouteScanner, FootpathRelaxer and the Objective. To implement the range query, we can just reuse the same Objective and spawn from the stops again at the next later departures at the given stops.

I get why you want to name it Objective, however I disagree since the objective is implemented in the algorithm (minimize arrivalTime/maximize departureTime) and can not really be changed by replacing the Objective, except for the TimeType (which by this logic would have to carry the name objective). Therefore, I would name the object containing all the route request specific details something like Request or Query since it will hold all details relevant to the specific request and this is also the reason why I would move spawnFromStops to this object. I regard the RaptorRouter something like a service, that holds all schedule relevant data to query and accept requests. The processing of the request can then be done by a temporary processing instance (Query, Request, or at the moment Objective).

Can you further elaborate on this? LegImpl should never be visible outside of the Raptor package. So only ch.naviqore.raptor.Leg should be visible from the outside, which is quite clear that it belongs to the Raptor. The user does not need to know if it is an interface or a concrete class. If I use a class with Impl postfix outside the package, then it triggers a weird feeling. That's why I like it somehow 😄

I agree, that I wouldn't want to see a Impl postfix outside of the package. But I also get a cold shiver down my back if I see Impl inside the package, so thanks for renaming. The reason I would also like to name the Interfaces RaptorLeg/RaptorConnection instead of Leg/Connection is that we would prevent working with two implementations of Leg/Connection on the service level, resulting in Type Annotations specifying the full definition (e.g. ch.naviqore.raptor.Leg instead of simply RaptorLeg. Java should introduce alias imports, that would also resolve this issue...

munterfi commented 3 months ago

@clukas1, thanks for the feedback! I tried to address most of it, feel free to continue. I will not work on the refactoring until our meeting on Wednesday to avoid conflicts.

I get why you want to name it Objective, however I disagree since the objective is implemented in the algorithm (minimize arrivalTime/maximize departureTime) and can not really be changed by replacing the Objective, except for the TimeType (which by this logic would have to carry the name objective). Therefore, I would name the object containing all the route request specific details something like Request or Query since it will hold all details relevant to the specific request and this is also the reason why I would move spawnFromStops to this object. I regard the RaptorRouter something like a service, that holds all schedule relevant data to query and accept requests. The processing of the request can then be done by a temporary processing instance (Query, Request, or at the moment Objective).

I did my best to move the main routing logic from the RaptorRouter to the Query object and encapsulate access to the Labels in the Objective object (you can rename this if you don't like the name, important for me is that the access to the labels and best times is encapsulated, which makes it a lot easier to track the routing algo in the debugger). Also introduced an internal RaptorData interface for access to the raptor data structures (interface segregation).

Additionally, I have discovered a strange issue in the getBestTime method: https://github.com/naviqore/public-transit-service/blob/92e2ab091ca5ed4d66126e6c5df16a92e303e420/src/main/java/ch/naviqore/raptor/router/Objective.java#L65, where I suspect that we are not updating the global best time when relaxing footpaths. To reproduce the issue, remove the _REMOVE prefix here: https://github.com/naviqore/public-transit-service/blob/92e2ab091ca5ed4d66126e6c5df16a92e303e420/src/main/java/ch/naviqore/raptor/router/Query.java#L189 and run all tests.

My interpretation is that we can remove this method and directly use getBestTime if we track the global best times correctly?

clukas1 commented 3 months ago

@munterfi some final touches from my side. Good to merge in my opinion.

Summary of changes:

My interpretation is that we can remove this method and directly use getBestTime if we track the global best times correctly?

Actually both methods are needed (one of the reasons I introduced the transfer only test). The problem if you use the comparableBestTime as cut-off value all transfer legs will be removed at the end of the round. While writing this now, I just realized that the best way would have been to make add/subtract the same transfer time to route arrivals/departures instead of subtracting/adding them to transfers. Which would have solved this (improvement for a later issue)... However, I've renamed the methods and added a docstring to clarify when which one should be used in e2ce8f4.

This is exactly the problem that I try to explain, we don't use it, but we should. We should sort all our returned connections in RaptorAlgorithm / Router, by calling sort before return in the LabelProcessor: return connections.stream().sorted().toList(); Which will then implicitly call the compareTo required by the Comparable Interface. If we do this now, most of our tests fail. This is not because the Raptor is not working, but simply due to sorting issues. The problem is that the way we implement the Raptor currently influences the order of the results, which is not very stable. Additionally, if we implement further versions (native or mcRaptor), this instability will cause further issues. In my opinion, the necessity to sort connections is an essential property. Otherwise, the order is not transparent for the user.

Actually, they are sorted by default by the number of rounds (route trip legs) and arrival/departure time in descending/ascending order. I've reworded the docstring return value in the RaptorAlgorithm interface to be more explicit about this in c17df10

I did my best to move the main routing logic from the RaptorRouter to the Query object and encapsulate access to the Labels in the Objective object (you can rename this if you don't like the name, important for me is that the access to the labels and best times is encapsulated, which makes it a lot easier to track the routing algo in the debugger). Also introduced an internal RaptorData interface for access to the raptor data structures (interface segregation).

As you might have guessed, I disagreed with the Objective name. I've renamed it to StopLabelsAndTimes to be explicit about what it contains (44c6a2f). Hope you can also live with that name. Like the RaptorData to end with a positive comment.