Implement public transport routing

organicmaps / organicmaps

🍃 Organic Maps is a free Android & iOS offline maps app for travelers, tourists, hikers, and cyclists. It uses crowd-sourced OpenStreetMap data and is developed with love by MapsWithMe (MapsMe) founders and our community. No ads, no tracking, no data collection, no crapware. Please donate to support the development!

https://organicmaps.app

Apache License 2.0

9.98k stars 957 forks source link

Implement public transport routing #5331

Open dikkechill opened 1 year ago

dikkechill commented 1 year ago

Please collect all relevant tasks to implement public transport routing here:

Public Transport Routing Vision

The current OrganicMaps architecture vision (from comment)

Server collects schedules from different sources. Efficient implementation in golang, rust, or C++ is preferred for easier scaling and maintenance.
Different formats are converted to the universal one that can be read by Organic Maps and other clients to allow the following functionality:
- Clients are periodically updating schedule info, that is cached on them and can be used offline, if there is no connection. With the connection real-time info is available.
- Clients build routes from this schedule info offline.
- To scale efficiently, this format should be compact and easily cacheable on HTTP CDN.

The main goal is to make a worldwide public transport routing better than Google. Public routing that is open-source and available to everyone. Development and support can be sponsored by cities and countries, by funds, and by the community.

Old PT approach

The previous approach of PT implementation was to embed GTFS feeds into offline maps data. It is not suitable for realtime updates and thus is not at the focus of this task.

Related issues

[ ] Collect all public sources of schedules (realtime and not realtime) that will be collected by the PT server, and document data formats that they are using
[ ] Design real-time PT data architecture https://github.com/organicmaps/organicmaps/issues/5332
[ ] Design or reuse an efficient server => clients format for real-time schedules info, that is easily cacheable on CDN, clients and can be used even when clients are offline
[ ] Design PT routing algorithm for clients that will use real-time data if it's available, and use already downloaded maps data if real-time data is not available.

Improve UI

Other

[ ] gtfs_converter issue experimental feature https://github.com/organicmaps/organicmaps/issues/3416
[ ] https://github.com/organicmaps/organicmaps/issues/4495
[ ] https://github.com/organicmaps/organicmaps/issues/3160
[x] https://github.com/organicmaps/organicmaps/issues/4935

Upvote & Fund

We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.

biodranik commented 1 year ago

@dikkechill I have updated the description. Feel free to create missing issues to track each subtask. Thank you for your help!

dikkechill commented 1 year ago

@dikkechill I have updated the description. Feel free to create missing issues to track each subtask. Thank you for your help!

Thank you for the update biodranik.

Do I understand it correctly you want to disregard further development on the experimental import of GTFS data into the .mwm files?

I actually saw implementing this experimental feature as a relatively straightforward first step to get public transport data in. Then OrganicMaps users can get something in the short term that's good, but not perfect.

In my mind the second step would then be to rethink the architecture for real-time data and make it closer to perfect. Something I expect to be a lot more challenging, both in implementation effort and in infrastructure maintenance. Which takes a lot longer to deliver.

By still implementing the experimental feature, the perfect doesn't become the enemy of the good. Even if this means duplicating some implementation effort. What are your thoughts?

biodranik commented 1 year ago

Our resources are very limited to work on something that will be dropped later. Embedding GTFS data in mwm files requires a lot of effort and introduces many potential support issues, while not solving the issue of PT schedule fluctuations.

How much client-side work will be required for the experimental approach?

Mapping GTFS data to OSM data is another big issue.

The common part in these approaches is collecting feeds from different sources. Btw, are there any other existing projects that already collect real-time feeds? Can we start from there?

In general, I don't mind if someone implements it. But IMO it won't be very usable with our monthly/weekly map data update cycles. People won't switch to Organic Maps from other, better aps for Public Transport planning.

jumelles commented 1 year ago

Btw, are there any other existing projects that already collect real-time feeds? Can we start from there?

https://database.mobilitydata.org/ -- This looks promising!

LaPingvino commented 1 year ago

I actually saw implementing this experimental feature as a relatively straightforward first step to get public transport data in. Then OrganicMaps users can get something in the short term that's good, but not perfect.

@dikkechill as another issue shows, preparing this demands a LOT of data (that issue said more than 35GB, and I think that was the data available by that user). If you ask me, a better approach would be supporting e.g. loading any specific gtfs file by needs of the user. Implement this manually first for a small and simple gtfs file, fix issues and add features, then map specific maps to relevant gtfs files for that area so they can be loaded in only when needed.

I think it's nice that there are huge collections of existing sources, but selecting which gtfs you want to use will give a huge added value, especially also for open source gtfs development for testing etc.

I can imagine that gtfs is very heavy to load in because it's a zip based format with text files. Another possible approach then is to design a format that is easy to work with and enable adding transport providers through this format. It shouldn't be too hard to write a converter between gtfs and such a format and if this reduces load on the app or e.g. enables selective loading where needed, that's probably worth it.

Happy to write any of these conversion tools etc. I already played a bit with tools dealing with gtfs and will probably write a good bunch more over time.

mil commented 9 months ago

I've been working on Mobroute, which is a FOSS GTFS router based in Go that is designed to be embeddable (as a library); and the public transit routing algorithm runs directly on phone (example app here). Mobroute ingests GTFS data from agencies (sourced from the Mobility Database); and the project aims to be an efficient & end-to-end solution (both GTFS ETL & algorithm implementation) for offline/privacy-respecting public transit routing. The codebase is designed to be minimalistic (~2k LOC ETL / ~2k LOC routing algorithm/api). I believe that Mobroute could solve many of OM's public transit design requirements (re: Clients build routes from this schedule info offline. & also being compatible with a good number of metros).

I am curious how open Organic Maps would be to a potential collaboration / integration with Mobroute? As a user of OM, I would very much be open to taking OM in as a supported target for Mobroute & taking a potential integration into design considerations for the library which I am working on stabilizing the API for over the next few months.

If OM contributors or maintainers have experience with OM's current routing system(s) & UI, I'd be curious to hear your thoughts (re: integration) & would be happy to field questions from my side.

Fale commented 9 months ago

There is a lot of movement on those kind of themes lately. An interesting one is also this.

biodranik commented 9 months ago

@mil that sounds interesting. Some questions:

Are GTFS-based routes matched on the OSM data?
What is the capacity of the GTFS (MobilityData?) server(s)? Will it survive requests from millions of users?
Did you run it on iOS and Android?
What algo is used?
Does it support different types of transport?

omentic commented 9 months ago

@mil That looks like a really cool library. You might consider reaching out to the Transportr people (https://github.com/grote/Transportr/), development recently restarted after a couple of years, and most of their cities depend on now-defunct providers. Ah, looks like they're mentioned in the FOSDEM presentation linked above too. This specifically was what happened to them in the period of inactive development:

Navitia drops many coverage areas from their free service and SNCB discontinues their Hafas API, leaving the FOSS transport apps with several large gaps in the regions they can serve.

mil commented 9 months ago

@biodranik thanks for the questions - feedback below:

@mil that sounds interesting. Some questions:

1. Are GTFS-based routes matched on the OSM data?

2. What is the capacity of the GTFS (MobilityData?) server(s)? Will it survive requests from millions of users?

3. Did you run it on iOS and Android?

4. What algo is used?

5. Does it support different types of transport?

1) OSM data is not used at all. Mobroute is solely a GTFS router.

2) So the way Mobroute works is that GTFS feeds are fetched directly from agencies themselves. MobilityData distributes a CSV file listing thousands of upstream agencies GTFS URLs (and associated bboxes & metadata). Mobroute can ingest from these source agencies as indexed by MDBID (the ID # they indicate per GTFS/agency source) or from raw GTFS URLs. See here and the raw CSV link on that page for details on Mobility Database. Added a few more details below after questions section. There is no 'middleman' server with using MobilityData's CSV.

3) Testing was done on Linux & Android. Nothing in specific is platform-dependent (and go-mobile supports alot including iOS - binding is per-platform). The only real hard dependency is SQLite >=3.28.0 (this means Android >=11).

4) Algorithm used is an implementation of the Connection Scan Algorithm (CSA) - more details on underlying routing API around CSA is here. Among the popular public transit routing algorithms (RAPTOR, etc.) CSA is mainly known for being among the most efficient & simple.

5) Yup - GTFS supports different route_types in route.txt (tram, rail, bus).. there are many different types in the Mobility Database feeds.

The way I imagine an OM/Mobroute integration working would be we would select a few well-supported metros/MDBIDs to start.. and have Mobroute bake the GTFS (directly from agencies) into Mobroute's routing-optimized SQLite format (in OM's CI) to be bundled with MWM. Doing this in OM's CI would make it so source agencies GTFS archives are only requested once (per update cycle). Part of the reason why an integration with OM appeals to me is that the 'datamodel' with OM is similar to Mobroute in that the UI already has a 'download' step for the user (which is a necessary component for Mobroute).

@omentic thanks for mentioning - I've also been considering Transportr as well. I think there's a difference between the approach of 'routing ETL and algorithm as a library' (Mobroute) versus 'routing algorithm as an external HTTP server' (Navitia, OTP, GraphHopper, MOTIS). OM seems like a good fit for something like Mobroute / library approach with minimal UI changes (just bundling GTFS-derived data with MWM and running CSA algorithm on mobile); versus Transportr which may require heavier UI changes (as they rely on 'online' routing).

biodranik commented 9 months ago

Thanks for the answers.

How will users see routes on the map without matching with OSM data? Won't it look ugly and unusable?
Embedding frequently (even in real-time) changing schedules info into mwms may not be the best idea, and it doesn't allow other FOSS apps to use the same PT service.
OM works on Android 5 and above. Jumping to Android 11 is too radical.
Can you please provide some route examples produced by your lib with different transport types, from different GTFS sources? E.g. in Switzerland, including bus, tram, train, ferry, Zürich Polybahn, and Seilbahn Rigiblick? Or in some other country.
Are there any other useful sources/feeds except GTFS? Any plans to support them?

mil commented 9 months ago

Thanks for the answers.

1. How will users see routes on the map without matching with OSM data? Won't it look ugly and unusable?

2. Embedding frequently (even in real-time) changing schedules info into mwms may not be the best idea, and it doesn't allow other FOSS apps to use the same PT service.

3. OM works on Android 5 and above. Jumping to Android 11 is too radical.

4. Can you please provide some route examples produced by your lib with different transport types, from different GTFS sources? E.g. in Switzerland, including bus, tram, train, ferry, Zürich Polybahn, and Seilbahn Rigiblick? Or in some other country.

5. Are there any other useful sources/feeds except GTFS? Any plans to support them?

Providing OSM-to-GTFS stop matching etc. is a bit of a separate problem. Are you just wanting to address showing OSM metadata in the individual routepoints (stops) or something more? Having the route points rendered on the map wouldn't necessarily be unusable or ugly (imo) but it'd depend on UI implementation.
Good point. The SQLite data for MR (per region) as a standalone (created in CI) works as well (not embedded in MWM); as a standalone other projects could utilize the same data.
I suppose a recent SQLite version binary could ship with MR itself rather then using system SQLite (though tooling may be tricky).
Sure - sample routes generated from tests (based on actual GTFS data from different providers & transport types) are at: http://ci.lrdu.org/tests_results/
GTFS is the most widely used & accepted standard for PT data. I do plan for GTFS-RT support (realtime updates); GTFS-RT is protobuf that builds on GTFS static data & very small in comparison with GTFS. Other formats could be added in the future but are not on my roadmap currently.

biodranik commented 9 months ago

It looks ugly without binding to OSM data. Are you aware of any project that tries to solve GTFS <=> OSM matching globally?
Why SQLite? Which functionality of it is required for your implementation?

dikkechill commented 9 months ago

It looks ugly without binding to OSM data. Are you aware of any project that tries to solve GTFS <=> OSM matching globally?

I'm aware of pfaedle for OSM - GTFS matching.

XioNoX commented 9 months ago

What about using the existing PT route relations present in OSM ? https://www.openstreetmap.org/#layers=T In many places they're more accurate than trying to route from stop to stop, and it would be an incentive (and reward) for people to add that data to OpenStreetMap.

michaelblyons commented 9 months ago

@XioNoX isn't that what people have been talking about with respect to matching? The link in the comment right above yours uses the OSM relationships linked to GTFS.

Triton171 commented 9 months ago

@mil How large is the SQLite database that one needs for routing typically? If I understand it correctly, it stores an entry for each direct connection (i.e. a service going at some point in time from one stop to the next on its route). For something like a frequent bus/metro line with a few dozen stops, it seems like this could easily add up to a few thousand entries in the DB. Optimizing the size of the timetable is probably pretty important considering that it'll be stored on a phone (with often very limited space) and ideally updated frequently.

mil commented 8 months ago

@biodranik GTFS spec provides shapes.txt which may address some aesthetic concerns. It's an optional in GTFS spec. Currently my routing system doesn't utilize shapes.txt at all but support wouldn't be hard to add in the future. I'm not well versed in the OSM-GTFS matching problem but pfaedle mentioned above does look interesting.

SQLite is used for storing the GTFS data & querying; SQLite is a natural fit as GTFS is modeled as relational tables. More detailed information on implementation & its functioning is available in docs.

@Triton171 the database size varies and is proportional to GTFS archive input size. Optimizing the size is indeed important; I hope to address estimations of required size constraints (per-region etc.) in future updates.

charlie2clarke commented 8 months ago

I'm interested to understand how I can be most helpful in contributing to the public transport functionality as I'd like to identify which work aspect would be best to focus on for a potential GSoC project. I am primarily interested in the server-side work, and having read related issues and the docs, my initial questions/thoughts are:

Given that there is the intention to move away from embedding public transport data into .mwm files, would fetching this data over an API be a suitable option? This API could fetch GTFS Schedule and Realtime data, and if the option is specified, convert and return the data into OSM format.
The previously mentioned pfaedle looks to be a good option for GTFS -> OSM mapping.
The main projects using the GTFS Realtime data I can find are Google's Live Transit Update's and OpenTripPlanner who seem to be using a pull based model to get GTFS data and they are also exposing some of this data via a GraphQL API, though I'm not 100% sure what they're doing with this.

biodranik commented 8 months ago

@charlie2clarke thanks for your questions! Here is an updated overview of the PT task:

There are separate parts of this complex PT feature. One is using OSM routes/stops data to show routes to users. Another is to prepare and fetch frequent schedule updates, match them with the existing OSM data, and use them for display in the Place Page and route calculations.
Embedding frequently (often in real time!) updated schedule data into offline maps doesn't make much sense. That data should be generated/downloaded/cached/stored separately. An additional benefit of that approach is that any other app/project then can reuse it. This may motivate other contributors/app developers to help.
The API to fetch schedules should be a simple HTTP GET for a file, so it can be automatically cached on any CDN to scale indefinitely.
That means we need our (fast and efficient!) server code that will collect GTFS feeds from many sources, and pre-process them, preparing for clients to digest.
While parsing GTFS directly on clients sounds "cool", that doesn't scale well, creates a lot of overhead, and may be very complex to match with the existing OSM data (considering it was already processed by OM map generator). Using a lightweight, simple, scalable and cacheable data format for schedules on clients looks like a better solution.
That means we can efficiently process GTFS on a server. To avoid spending a lot of money on it, and on writing a horizontally scalable code, an efficient implementation in Rust or golang may handle all the job.
The final goal is to process thousands of feeds from all over the world, as soon as they have been updated, to deliver the (as real-time as possible) updates to hundreds of millions of clients.
As our mission is to build maps that are better than other maps, there are many other sub/side projects around PT: matching GTFS with OSM, properly updating OSM data, providing an easy way for businesses/transport companies or even for end users to update schedules, etc.

Please let me know what you (and anyone else) think about this approach.

charlie2clarke commented 8 months ago

Thanks for the clarification @biodranik, I think I would be best suited to looking into the preparing and fetching of schedule updates for the static and realtime PT data. Regarding the potential GTFS API:

I see two main aspects which this API needs to serve (perhaps would be best split into two?) - serving static PT data and realtime updates.
The GTFS Schedule data is text based which must be extracted via zips, and the GTFS Realtime data is in protobuf format served via HTTP that in some cases can use protocols such as Websockets, MQTT, or Server-Sent Events. I'm not sure if you are implying that we should change these data formats, but given this is a standardised format (although Google made their own variation) is it not best to stick to this as it may make engagement from other contributors more likely?
I don't think the GTFS format is ideal and understand that the GTFS -> OSM matching poses problems, but this can be averted for displaying schedules as this doesn't depend on being plotted on the map. However, trip updates that involve changes to routes would still require this mapping. Could the previously mentioned pfaedle not be used server-side if the client requests for data in OSM format?
Do you have any CDN provider in mind?
A big design decision that would need to be made for this potential API would be if PT data should be fetched with a push or pull model. The static GTFS Schedule data lends itself nicely to using a pull model, and caching could be included to minimise the requests made by the client. However, the realtime API solution isn't so obvious, from my knowledge the difference is:
- Pull model:
  - Simpler
  - Won't scale as well
  - 'Half realtime' - would need to understand at what frequency is suitable to fetch data updates, if the intended poll frequency is low then this might be ok.
- Push model:
  - More complex
  - Realtime (or as close as we could get it)
  - More scalable
  - Would most likely have to handle state reconnection logic (problem of WebSockets). Given OM's offline focus this could be a problem. There is something like Server-Side Events which automatically handles reconnection logic, but this only officially supports web applications.

I'm honestly not sure what the difference in cost would be as on the one hand the pull based model is simpler and would have fewer components, however, there would be many more API calls. The push model is more complex, but resource usage would scale well.

The cost of fetching realtime updates could be optimised slightly by only establishing connections when a stop is clicked on, a navigation is in progress, or even in the future, a user has a saved route so they could for example receive traffic updates for their commute.

In either case, given that performance is a high priority, maybe a gRPC API would be best?

dikkechill commented 7 months ago

@charlie2clarke Thanks for your thoughts on this! I opened #5332 specifically for the real time data architecture a while ago. The project I found most interesting is travic by @patrickbr (University of Freiburg). He's also the author of pfaedle. Maybe we can get more information on the travic operations, current setup (some details are in the 2014 paper) and lessons learned. Perhaps some form of cooperation is possible. @biodranik do you want to reach out 'officially' from organicmaps? I could also do this.

biodranik commented 7 months ago

Our strategic goal is to build a feature that will work comparable to or better than Google and Apple Maps public transport features. Let's design the feature and architecture from this point of view. In particular: building and displaying mixed routes on the map, including stop changes/transfers, and schedule info (if available).

Our use case should support both: routes from OSM with some kind of static schedules (to help people where GTFS feeds are not available), and with real-time info in other places.

There are different ways to implement it while keeping the offline-first in mind, for example:

Clients download GTFS and process it locally, including matching with OSM data.
GTFS and other formats are collected on the server and matched with OSM data there. Clients work with an efficient and specialized format (compact, easy to decode, stores only necessary data, easy to cache and use in offline).
The same as (2), but clients work with GTFS.

IMO (2) is better suited for the strategic goal, but I'm open to discussion.

In the future, if the data is collected/processed on the server, any online API services are also possible.

Note that to reach the strategic goal, the architecture should be easily scalable and cheap for millions of users.

Which CDN to use should not matter at all. HTTP content is easily cacheable on edge servers.

Using pfaedle or any other tools to match/process raw data with OSM may be a good start. Doing it on the server is way easier than on clients.

Push or pull is a good question. AFAIK, to properly implement push, apps should work with Apple or Google notification systems. The pull model looks easier and more scalable for OM clients <=> OM PT Server. The push model is great for GTFS sources <=> OM PT Server interaction.

Regarding static GTFS data, does it make sense to build a validator for it, so this data can be directly stored/updated in OSM? Or would it be better to enrich OSM data with external sources?

charlie2clarke commented 7 months ago

Thanks @dikkechill, I'll have a read of that paper.

@biodranik Thinking more generally about differentiating OM from Google/Apple PT features:

I think leaning into OM's offline first approach can always help. I remember being on an underground train and even though I had no signal, OM helped me find an alternative route following a cancellation. Given this, I think it will be important to balance the frequency of PT data updates to maintain up-to-date routes/schedules but not impact performance.
A not so related thought, but maybe a useful PT feature idea that I haven't seen before could be a 'time to leave' prompt. When you use PT and you aren't already on the move, you inevitably need to choose what time train/bus/tram/ferry you want to get and then you have to back-track yourself to work out what time you should leave (depending on how you're getting there) - shouldn't your maps app tell you when you need to?

IMO (2) is better suited for the strategic goal, but I'm open to discussion.

I agree that 2 is the best option. Regarding the data format for client-side (bear in mind I'm not experienced with geospatial data):

From initial reading, a vector format is suitable for this sort of thing, though many vector based specifications depend on a small db like SQLite. Am I right in saying that this approach would be too expensive to have on the client?
I've also read that pbf is very compact with quick read and write. It seems that osmium allows spatiotemporal queries on pbf data, but I haven't done it myself.
When you say 'stores only necessary data', am I missing something, but wouldn't you want all static PT data relevant to each map downloaded to ensure offline functioning? But for realtime data you could store only necessary data.

Regarding push/pull: I agree that the pull model looks a lot easier, my only concern would be that if there are a million users polling OM PT servers every 5 seconds for example, then this would cost more than OM PT servers pushing an update to online OM clients only once there is an update to send. I don't have experience at this scale though so I'm happy to be told otherwise. I don't imagine all GTFS source will support push based models so the GTFS sources <-> OM PT servers would have to use a mix.

Not 100% clear about the validation aspect, is my understanding correct: so the OM PT API would fetch GTFS feeds, these could then be validated using existing tools like gtfs-validator, the OM PT API could then do GTFS -> OSM matching, and then the OSM data could be validated again (or the decided on format suitable for clients) - does it need special validation before doing GTFS -> OSM matching? I notice that the subways-preprocessor seems to handle GTFS data - is this not in use at the moment?

I am quickly learning of the size and complexity of this work… I think it's good to talk about the big picture to get a better understanding of all the work aspects, but I am aware that in the short-term, I would like to scope my work so it's suitable for the GSoC period. However, I know that I would be continuing development on this feature beyond the GSoC end date.

Misalf-git commented 7 months ago

A not so related thought, but maybe a useful PT feature idea that I haven't seen before could be a 'time to leave' prompt. When you use PT and you aren't already on the move, you inevitably need to choose what time train/bus/tram/ferry you want to get and then you have to back-track yourself to work out what time you should leave (depending on how you're getting there) - shouldn't your maps app tell you when you need to?

Öffi does something similar. One can set a reminder for a selected departure time using the system's calendar app. IIRC it doesn't take into account the actual time required to get to the station, but an arbitrary "15 Minutes earlier" or something. Still a nice feature.

Triton171 commented 7 months ago

I'd also be interested in helping move forward this feature. It seems to me that the first step has to be deciding on a client-side data format and routing algorithm (since the data format depends on the algorithm). For the algorithm, I'm aware of a few alternatives:

Connection Scan Algorithm: This is the algorithm that Mobroute (mentioned earlier in this thread) uses. The algorithm requires a list of all (direct) connections in the network, sorted by departure time. It then traverses the list in order and checks for each connection whether using it improves the earliest possible arrival time at the destination of the connection. Footpaths / Transfers have to be precomputed and included in the input data. This can lead to very fast query times, but also has some downsides: The client-side data format cannot be very compact since it has to explicitly contain all connections & transfers. Also, it does not scale very well if the routing region is not restricted a priori (it scans all connections, regardless of how far away they are from the start and destination). Due to needing precomputed transfers, the length of such footpaths would probably need to be limited by some small constant. Also, I'm not sure if this method can efficiently be extended to handle real-time data because it relies on having the pre-sorted list of connections (which will no longer be sorted once some trips are delayed).
Raptor: The algorithm is divided into a number of rounds, where the k-th round computes all shortest paths using at most k-1 transfers by scanning all (reachable) routes ones. The query time is significantly larger than for CSA (still at most a few dozen milliseconds, so this might not really matter) and it has the same downsides regarding precomputed transfers. However, the overall storage requirements are probably not as large and it could reasonably support very large routing regions and real-time data.
Trip-Based: Essentially just Dijkstra's algorithm but on the graph whose vertices are trips and whose edges are transfers between trips. Has the same downsides as Raptor and additionally requires precomputed transfers not just between stops but between trips, which increases storage size and complicates integrating real-time data. It outperforms CSA in terms of query time (at least according to their own results).

All of these algorithms have the disadvantage of only working with precomputed transfers. This not only increases the size of the client-side data format but also makes hybrid routing handling (long) footpaths as well as public transit very difficult. To properly support this, I only know of the "naive" approach:

Simple Dijkstra / A: Essentially just use the existing pedestrian Dijkstra implementation but add (time-dependent) edges for public transit connections. This should be easier to implement than the more specialized algorithms above, but query performance might be a problem: Having time-dependent edges could drastically slow down the algorithm because computing the earliest arrival time at the endpoint of the edge is now longer a simple addition but instead requires a binary search for the earliest trip that can be reached. This might be mitigated by optimizing for the case of a regular frequency & only storing points in time when the frequency changes (which is also used in the GTFS format). Another reason for deteriorating performance could be worse Future Costs: A uses lower bounds for the time needed from each node to the destination for a goal-oriented search. The better these lower bounds, the fewer "irrelevant" vertices are visited by the algorithm. The easiest approach (which is also the one currently used by Organic Maps) is to just use flight-line distance multiplied by the maximum possible speed for these lower bounds. With public transit, the maximum possible speed might be something like 300 km/h for high-speed rail, which can make the lower bounds almost useless (even in less extreme cases, the lower bounds will still be a lot worse than for pedestrian routing).

In my opinion, the simple Dijkstra approach is the most promising. It's more flexible and easier to implement than the alternatives. The larger query time compared to the other algorithms might not be a huge deal, since single-digit millisecond query times are not necessary anyways. The data format could also be pretty simple:

Store a list of routes, each of which contains:
- A list of stops (i.e. references to vertices on the map) together with their relative arrival & departure times compared to the start of each trip
- A list of starting times for this route, ideally in a format that efficiently encodes intervals during which the route is serviced with a fixed frequency.
- Some metadata for the route for display to the user, including the route name, shorthand name, information on how to draw the trip on the map, etc.
Store a mapping from vertices of the map to a (usually empty) list of routes that stop at this vertex together with the index of the vertex in the stop sequence of the route.

Sending real-time information for some routes should be fairly straightforward assuming that client & server have some shared route IDs. It would probably be nice to write down a specification for the data format (subject to change of course) somewhere, because this would allow us to implement the client & server independently of each other. I'm not too familiar with the Organic Maps codebase, so there are probably many things that I've overlooked and that more competent people can point out, so I'm happy about any remarks & corrections.

biodranik commented 7 months ago

Speaking about Google and Apple Maps, they are still not always great when building the fastest PT route with several transport type changes, even in areas with great data like Zürich.

Isn't "time to leave" already embedded in Apple and Google calendars? That would be a handy feature indeed.

AFAIK SQLite is used in mobile apps for two purposes:

Save disk space when storing many small files (we likely need it too for Satellite imagery support).
Querying DB/data in a way easier for developers. As you see, we are often more focused on "easier/better for users" than "easier for developers" in Organic Maps :) That leads to implementing our own data formats or algorithms that allow us to get a better performance or functionality.

I heard that pbf for GTFS is developed by @Zverik

"Storing only necessary data" means removing anything unnecessary to build the right routes.

We'll come back to the pull/push model later when other components are ready for integration.

does it need special validation before doing GTFS -> OSM matching

Maybe someone can already answer this question?

Regarding the subways-preprocessor seems to handle GTFS data, @alexey-zakharenkov how is it used now/planned to use? As a side question, it would be great to extend your subway validator to support buses and trams, is there a list of tasks/guidelines on how to achieve that?

The PT feature is indeed complex and may take a lot of time to implement. Doing a part of it would already be better than doing nothing ;-)

@Triton171 thanks for your detailed post! Would it be feasible to reuse our current modified A* algo on the PT data with minimum modifications, so that when we add alternate route support #1183 they will also work for the PT out of the box?

alexey-zakharenkov commented 7 months ago

Regarding the subways-preprocessor seems to handle GTFS data, @alexey-zakharenkov how is it used now/planned to use?

If --output-gtfs option is given, the preprocessor generates GTFS with input metro networks. It cannot read GTFS.

felixguendling commented 7 months ago

Hi everyone! :wave:

I'm maintaining MOTIS and public transit routing is my main topic since more than ten years now (including real-time, special cases like park&ride, etc.). In MOTIS, we have CSA, RAPTOR and TripBased and a time-dependent Multi-Criteria Dijkstra (MCD) implemented so we can compare them.

One thing that's missing in this discussion and making the comparison between Dijkstra/A and CSA/RAPTOR/TB unfair is "multi-criteria". In public transport (as opposed to street routing) we're usually not only interested in the fastest journey. Other criteria like the number of transfers are very important, too. Consider this example: connection A takes 1h with 2 transfers, connection B takes 1h 10min with 1 transfer. My guess is that most people would prefer connection B as you save one transfer with only 10min extra travel time. However, since we don't know the exact weighting of a transfer for every single user, the best approach is to compute the Pareto front (or Pareto set) of all optimal trade-offs. RAPTOR and TripBased produce a Pareto set but a Dijkstra/A would not. So comparing performance of single-criteria algorithms to multi-criteria algorithms would be an apple to oranges comparison as they don't solve the same problem. With the goal to produce high quality results comparable to Google/Apple maps, some kind of multi-criteria is a must in my opinion.

Another point I would like to bring into the discussion is the range-query topic and earlier+later search. In research as well as in practice there are two types of queries: earliest arrival query (or giving an arrival time: latest departure, so you need to search backward in time from the destination) and range-query (also forward+backward). A range query gives you an overview of the best connections in an departure/arrival interval. All mentioned algorithms (CSA, RAPTOR, TB) have also range/profile variants. To produce a similar user experience to what other systems (like Google Maps, HAFAS, etc.) offer, the range/profile variant with earlier+later search functionality would be required (in both directions, so the user can specify departure or arrival time).

Regarding RAPTOR performance:

The query time is significantly larger than for CSA

I don't think this statement is true in general. CSA looks at all connections as it scans through time, no matter where the connection departs. With our implementation in MOTIS, we have the experience that CSA becomes slow as the area grows. RAPTOR only looks at connections departing at stations that were marked in the previous round. CSA will still scan the connections in London, even if you are routing between two locations in Madrid.

Regarding TripBased:

Essentially just Dijkstra's algorithm but on the graph whose vertices are trips and whose edges are transfers between trips

I would say it's more like round-based BFS and vertices are trip segments.

makes hybrid routing handling (long) footpaths as well as public transit very difficult

It depends: state of the art routing engines like MOTIS do not have any issues handling long footpaths as first or last leg as this can be handled outside the RAPTOR/CSA/TB algorithm. If long footpaths between two public transport legs are required, you can look at the ULTRA approach [^1][^2].

In MOTIS, we have a separate routing engine for public transport: https://github.com/motis-project/nigiri (basic algorithm is the RAPTOR but with some adjustments, timetable uses bitfield encoding for traffic days to be more compact - to the best of my knowledge, nigiri is currently the fastest publicly available multi-criteria transit routing engine with the most compact serializable data model)

nigiri can be used independent of MOTIS and could be linked as library.

The approach to do intermodal door-to-door routing is to compute offsets by routing between the start location and close-by public transport stations for all modes in an external street routing (same for the destination). Then those start offsets are used to initialize the search. The destination offsets can be seen as additional ad-hoc edges considered only for this particular routing query.

nigiri supports all of the aforementioned routing features such as multi-criteria routing, backwards-search, range query with earlier/later search functionality, etc. (except ULTRA). It also has real-time update support and has a very compact data model that can be dumped to disk (or sent via network). It can parse GTFS ZIP files and process GTFS-RT protobuf updates.

What's currently missing in nigiri is the handling of GTFS shapes.txt. This can either be done external or it can be added to nigiri.

[^1]: Sauer, Jonas, Dorothea Wagner, and Tobias Zündorf. "Integrating ULTRA and Trip-Based Routing." 20th Symposium on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems (ATMOS 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. [PDF] [^2]: Baum, Moritz, et al. "UnLimited TRAnsfers for Multi-Modal Route Planning: An Efficient Solution." [PDF]

charlie2clarke commented 7 months ago

@biodranik When I have the time, I shall test out some potential client-side PT formats: some vector based SQLite implementation, and PBF. I think the trade-off OM makes of increasing performance at the sacrifice of developer ease is worthwhile, but its useful to keep in mind that the satellite support may require SQLite as the PT stuff could lean into this.

For myself, I think for the scope of GSoC I should focus on the PT API(s). One of elements to this which I would like to workout more is how the API can be queried/called. My initial thought on this is that I believe it makes a difference for the API if the OM client is expecting to render live PT vehicle positions - do you know if this is anticipated to be supported? IMHO, I don't think OM needs this as it will clutter a phone's screen, not to mention the high memory consumption. However, I think supporting route progress would be good, i.e., showing where the train/bus/tram is relative to each stop on its route.

@felixguendling Really cool project to see! I think fine tuning the PT routing algorithms would certainly be desirable, though as I understand it, there are a few dependencies to resolve before looking at the routing algorithms, like the data model. Does nigri have any specific data requirements for integration? Is there anywhere I can find some more info on the data model?

biodranik commented 7 months ago

The task of efficient collecting and (pre)processing of GTFS data on a server can be started independently.
At the same time, the most efficient data format for clients can be designed. Ideally, it should be compact, cacheable on CDN, and easily readable by algorithms on the client.
If we speak about many small files on clients, then some workarounds like SQLite may make sense, even for binary blobs. There is a limit to max opened files at the same time, 256 on iOS and 1024 on Android, and it is already taken by opened map files. On the other hand, if each downloaded file is very compact, can be opened, loaded into the memory/decoded, and then closed, SQLite workarounds may not be necessary.

Regarding routing algo, yes, it should not limit the expected user features. Let's list them all?

felixguendling commented 7 months ago

Does nigri have any specific data requirements for integration?

I'm not sure if I understand your question. As mentioned, nigiri uses GTFS and optionally GTFS-RT as input. It can load multiple GTFS datasets and update them with multiple GTFS-RT feeds.

An example can be seen at transitous.org or europe.motis-project.de.

Is there anywhere I can find some more info on the data model?

The basic data model is the RAPTOR data model with some adjustments to enable encoding traffic days with bitfields. So the RAPTOR paper is probably the best entry point to understand the data model.

Edit: data is stored in cista data structures to make it easily (de-)serializable.

focus on the PT API(s)

Maybe the transitous API would be a good starting point? I think the goals (privacy friendly, user oriented, etc.) overlap a lot. https://github.com/public-transport/transitous

If we speak about many small files on clients

In case of nigiri, it creates one timetable.bin file from all input timetables loading one after another. This could be done on the server or the client. So open file descriptor limits should not be an issue.

Zverik commented 7 months ago

I'll add one missing piece of information that's crucial for OM's public transport support. It's the gtfs-proto format implemented in https://github.com/Zverik/gtfs-proto/

The idea was, we set up a CDN for those files, just like we do for offline maps rn. Some server-side processing converts GTFS feeds daily, and also calculates deltas (diffs) between the latest feed and a selection of old feeds. So, the servier-side is akin to the current OSM data processing.

On the client side, we will have an internal sqlite storage for PT data, and a downloading panel that looks basically the same as the current map downloads. (Or we could package PT data inside the map files?). And there should be an update mechanism for PT data from gtfs-proto deltas, which downloads like ~100kb of data and updates the internal sqlite database.

The gtfs-proto format is missing the fare information, but already packages the entire network complete with the geometry. And it's 10x smaller than original GTFS feeds, which goes in line with OM core principles, keeping downloads small.

felixguendling commented 7 months ago

nigiri already has an architecture to support multiple data formats (currently, we have implementations for GTFS and HAFAS Rohdaten): https://github.com/motis-project/nigiri/tree/master/src/loader

Adding parser code for another format would probably only be a few hundred lines of code more as most of the code could be shared with the existing GTFS implementation.

biodranik commented 7 months ago

Does GTFS already support routes that may be different at different time of the day?

Does OpenStreetMap.org mapping scheme support it too?

a letter on a screenshot means a slightly different route.

charlie2clarke commented 7 months ago

Does GTFS already support routes that may be different at different time of the day?

@biodranik Yes it does. As far as I understand it, there are a few notable points about GTFS Schedules relevant here:

It distinguishes between a route and a trip, where a route is made up of a number of trips, e.g., you might have the London Underground Central Line route, and this route is made up of a number of trips like if it is east or westbound, where it terminates etc.
GTFS Schedules include at least one of calendar.txt or calendar_dates.txt. It is down to the service provider's discretion to determine how they use these, but best practice would dictate that if trips don't vary that much, then they would be included in calendar.txt and any small exceptions to that route like school schedules would be noted in calendar_dates.txt. But, if there isn't a consistent schedule then all schedule info might be included in calendar_dates.txt.

You can see more info on GTFS Schedule data here if interested.

Does OpenStreetMap.org mapping scheme support it too?

It doesn't appear to no. You can see a few OSM proposals below regarding public transport schedules, each of which were rejected on the basis that this would be unmaintainable given the variability of services. https://wiki.openstreetmap.org/wiki/Proposal:Public_transport_schedules/Timetable_relations https://wiki.openstreetmap.org/wiki/Proposal:Public_transport_schedules/Departures

Triton171 commented 7 months ago

Thanks a lot for the clarifications regarding different PT routing algorithms @felixguendling. In my opinion, using nigiri as a routing library for Organic Maps seems like a really good idea, since it does pretty much exactly what we need. Of course the decision whether it's worth the extra dependencies is up to the maintainers. There are a few questions & issues we probably need to resolve before being able to integrate nigiri as a routing engine into Organic Maps:

@felixguendling, you said that the client-side format of nigiri is compact. Do you know roughly how large the timetables tend to be (for example how large the timetable for a metro region like Berlin or London is in this compact format)?
As far as I can see, nigiri is currently packaged as a single library. Would it make sense to split this up, for example into a server-side and a client-side library (with some shared code between the two)? I'd assume that gtfsrt is not a necessary dependency for a client that never directly interacts with GTFS feeds & maybe this is also true for some of the other dependencies. Going one step further, one could also have a routing-core library, that only includes the Raptor implementation & necessary data structures (since Organic Maps doesn't need the Dijkstra implementation for footpaths) but I'm not sure if that is worth the effort.
Is there a nice way to handle overlapping timetables? One example might be routing in Germany using both Deutsche Bahn & one or more regional transport providers. Merging the timetables server-side has the drawback that clients can no longer decide exactly which timetables they want to download. Merging them client-side seems difficult to me, since one would need to add transfers between the different timetables. This is probably not a priority, but it would be nice to know whether we could support such use cases in the future.
As mentioned above, nigiri doesn't support route drawing (i.e. the data from GTFS shapes.txt) yet. In my opinion, adding support for this upstream makes the most sense (and shouldn't be too difficult).

felixguendling commented 7 months ago

client-side format of nigiri

There's not really a client side and server side format as nigiri is currently only used inside MOTIS which is used as a server-side application. However, nigiri is supposed to be embeddable as a library in any other application (or scripting language via bindings).

metro region like Berlin

For 365 days from now it has 114M, for the next two days it has 55M. You can try it out with this setup script:

#!/bin/sh

# TARGET="macos-x86_64"
# TARGET="macos-x86_64-noavx2"
# TARGET="macos-arm64"
TARGET="linux-amd64"
# TARGET="linux-amd64-noavx"
# TARGET="linux-arm64"

wget https://github.com/motis-project/motis/releases/latest/download/motis-${TARGET}.tar.bz2
tar xf motis-${TARGET}.tar.bz2

wget https://www.vbb.de/vbbgtfs -O vbb.zip

cat <<EOT >> config.ini
modules=intermodal
modules=nigiri

intermodal.router=nigiri
server.static_path=motis/web
dataset.no_schedule=true
import.paths=schedule-vbb:vbb.zip
tiles.profile=motis/tiles-profiles/background.lua
nigiri.num_days=365
EOT

./motis/motis

The files are placed in data/nigiri folder.

You can also try it with different other GTFS download links.

This is a stripped down version of the setup and installation guide for MOTIS. If you want to use the UI, you need to enable more modules.

Would it make sense to split this up

I guess that's easy at least for the GTFS-RT updating part.

Since the routing algorithm itself handles rt_timetable and timetable or only timetable based on template arguments and is header-only, the client (in this case OM) would just not invoke it with the Rt template flag set and therefore, no code for this scenario would be generated.

However, I could imagine scenarios where it makes sense to have real-time updates directly in the client.

Is there a nice way to handle overlapping timetables?

Yes, this is already done frequently. One example is the transitous project where the long distance trains are contained in several of the loaded datasets (which is a good thing! better than not having them at all).

For this case, nigiri

considers two stations from different datasets equivalent if they are less than 300m apart (link to code) and
considers trains that have less or equal 1 minute per stop difference in departure/arrival times as duplicates (link to code).

When duplicates are detected, they are merged together. This means that there's only one instance left that is linked to both trip_ids from both datasets, so real-time updates can be handled from both datasets. In theory, this can also be used to load the German Delfi dataset covering whole Germany (with no real-time updates so far) as well as regional GTFS-static datasets where GTFS-RT updates are available (but it hasn't been tried in practice yet).

adding support for this upstream makes the most sense

I agree. My preferred option would be to have them on-disk (to keep the memory footprint low as only routing data needs to be held in RAM) and memory map.

biodranik commented 7 months ago

For 365 days from now it has 114M, for the next two days it has 55M.

Wow, why is it so big?

It would be great to do all "dirty matching work" and preprocessing on the server, so clients can cache small regional data, and always be able to quickly download any additional regions if building longer routes there.

Note that ideal routes should be drawn on top of existing OSM roads/paths. Not ideal solution will be only a temporary goal.

felixguendling commented 7 months ago

Wow, why is it so big?

I would say this is very small (at least if you compare it to other solutions that have similar capabilities, such as OTP2). It's a buffer of memory that could be memory-mapped and is directly routeable.

The source data contains local times, so timezones as well as daylight saving time need to be converted into a consistent timezone (here: minutes since schedule begin as unixtime). This means that every trip will basically be duplicated (one for winter time, one for summer time) as doing timezone conversion during routing is not a good idea. Departure and arrival times are ~31MB.
Location information (name, id, timezone, coordinates, footpaths, etc.) including a serialized memory-mappable hash map for station ID lookup is ~20MB.
Routing algorithms like CSA, RAPTOR, or TB need the transitive hull of all footpaths, so you need to do component wise all-pairs-shortest-paths to generate the actual footpaths.
And then there are lots of smaller things you need for real-time updates (trip lookup data structures), outputting human readable trip names, route colors, provider information, departure tables, lookup data structures for the routing algorithm, etc. etc.

Details here:

 timetable::locations &: 19.34 MB [total=19.3456 MB]
 interval<std::chrono::time_point<std::chrono::system_clock, std::chrono::duration<int, std::ratio<86400>>>> &: 0 b [total=19.3456 MB]
 vector<pair<strong<unsigned int, _trip_id_str_idx>, strong<unsigned int, _trip_idx>>, ptr> &: 1.94064 MB [total=21.2862 MB]
 dynamic_fws_multimap_base<strong<unsigned int, _trip_id_str_idx>, strong<unsigned int, _trip_idx>, mutable_multimap_helper<strong<unsigned int, _trip_idx>, strong<unsigned int, _trip_id_str_idx>>::vec> &: 3.88129 MB [total=25.1675 MB]
 vecvec<strong<unsigned int, _trip_id_str_idx>, vector<char, ptr>, vector<unsigned int, ptr>> &: 3.15355 MB [total=28.3211 MB]
 vector<strong<unsigned short, _source_idx>, ptr, false, strong<unsigned int, _trip_id_str_idx>> &: 496.805 kb [total=28.8062 MB]
 vector<unsigned int, ptr, false, strong<unsigned int, _trip_id_str_idx>> &: 993.609 kb [total=29.7766 MB]
 vecvec<strong<unsigned int, _trip_idx>, vector<pair<strong<unsigned int, _transport_idx>, interval<unsigned short>>, ptr>, vector<unsigned int, ptr>> &: 3.7044 MB [total=33.481 MB]
 vecvec<strong<unsigned int, _trip_idx>, vector<unsigned short, ptr>, vector<unsigned int, ptr>> &: 993.613 kb [total=34.4513 MB]
 dynamic_fws_multimap_base<trip_debug, strong<unsigned int, _trip_idx>, mutable_multimap_helper<strong<unsigned int, _trip_idx>, trip_debug>::vec> &: 5.82193 MB [total=40.2732 MB]
 vecvec<strong<unsigned short, _source_file_idx>, vector<char, ptr>, vector<unsigned short, ptr>> &: 26 b [total=40.2732 MB]
 vecvec<strong<unsigned int, _trip_idx>, vector<char, ptr>, vector<unsigned int, ptr>> &: 2.50873 MB [total=42.782 MB]
 vector<interval<strong<unsigned int, _transport_idx>>, ptr, false, strong<unsigned int, _route_idx>> &: 148.25 kb [total=42.9267 MB]
 vecvec<strong<unsigned int, _route_idx>, vector<unsigned int, ptr>, vector<unsigned int, ptr>> &: 1.88833 MB [total=44.8151 MB]
 vector<clasz, ptr, false, strong<unsigned int, _route_idx>> &: 18.5312 kb [total=44.8332 MB]
 vecvec<strong<unsigned int, _route_idx>, vector<clasz, ptr>, vector<unsigned int, ptr>> &: 92.6602 kb [total=44.9237 MB]
 vecvec<strong<unsigned int, _location_idx>, vector<strong<unsigned int, _route_idx>, ptr>, vector<unsigned int, ptr>> &: 1.70512 MB [total=46.6288 MB]
 vector<interval<unsigned int>, ptr, false, strong<unsigned int, _route_idx>> &: 148.25 kb [total=46.7736 MB]
 vector<delta, ptr> &: 30.6648 MB [total=77.4383 MB]
 vector<std::chrono::duration<short, std::ratio<60>>, ptr, false, strong<unsigned int, _transport_idx>> &: 656.889 kb [total=78.0798 MB]
 vector<unsigned char, ptr, false, strong<unsigned int, _transport_idx>> &: 0 b [total=78.0798 MB]
 vector<strong<unsigned int, _bitfield_idx>, ptr, false, strong<unsigned int, _transport_idx>> &: 1.28299 MB [total=79.3628 MB]
 vector<bitset<512>, ptr, false, strong<unsigned int, _bitfield_idx>> &: 156.938 kb [total=79.5161 MB]
 vector<strong<unsigned int, _route_idx>, ptr, false, strong<unsigned int, _transport_idx>> &: 1.28299 MB [total=80.799 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _merged_trips_idx>, ptr>, vector<unsigned int, ptr>> &: 5.13854 MB [total=85.9376 MB]
 vecvec<strong<unsigned int, _merged_trips_idx>, vector<strong<unsigned int, _trip_idx>, ptr>, vector<unsigned int, ptr>> &: 2.73407 MB [total=88.6717 MB]
 vector<attribute, ptr, false, strong<unsigned int, _attribute_idx>> &: 0 b [total=88.6717 MB]
 vecvec<strong<unsigned int, _attribute_combination>, vector<strong<unsigned int, _attribute_idx>, ptr>, vector<unsigned int, ptr>> &: 0 b [total=88.6717 MB]
 vector<provider, ptr, false, strong<unsigned int, _provider_idx>> &: 2.29395 kb [total=88.6739 MB]
 vecvec<strong<unsigned int, _trip_direction_string>, vector<char, ptr>, vector<unsigned int, ptr>> &: 100.351 kb [total=88.7719 MB]
 vector<variant<strong<unsigned int, _location_idx>, strong<unsigned int, _trip_direction_string>>, ptr, false, strong<unsigned int, _trip_direction_idx>> &: 30.6016 kb [total=88.8018 MB]
 vecvec<strong<unsigned int, _trip_line_idx>, vector<char, ptr>, vector<unsigned int, ptr>> &: 6.21875 kb [total=88.8078 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _attribute_combination>, ptr>, vector<unsigned int, ptr>> &: 1.28299 MB [total=90.0908 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _provider_idx>, ptr>, vector<unsigned int, ptr>> &: 2.56598 MB [total=92.6568 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _trip_direction_idx>, ptr>, vector<unsigned int, ptr>> &: 5.13854 MB [total=97.7954 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _trip_line_idx>, ptr>, vector<unsigned int, ptr>> &: 5.13854 MB [total=102.934 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<route_color, ptr>, vector<unsigned int, ptr>> &: 8.99409 MB [total=111.928 MB]
 vecvec<strong<unsigned int, _location_idx>, vector<footpath, ptr>, vector<unsigned int, ptr>> &: 536.754 kb [total=112.452 MB]
 vecvec<strong<unsigned int, _location_idx>, vector<footpath, ptr>, vector<unsigned int, ptr>> &: 536.754 kb [total=112.976 MB]
 serializable_table<string< char *>, unsigned char, hash_all, equals_all, vector<std::pair<string< char *>, unsigned char>, ptr>, ankerl::unordered_dense::bucket_type::standard, false> &: 0 b [total=112.976 MB]
total: 112.976 MB

so clients can cache small regional data, and always be able to quickly download any additional regions if building longer routes there

It looks like you have very specific requirements that are not covered by nigiri. If you mean that the routing algorithm itself should trigger downloading new data on-the-fly as it touches new regions, you probably need to invent new algorithms for that. This requirement is definitely out-of-scope for nigiri.

Zverik commented 7 months ago

Regarding Nigiri, I think it packages footpath geometry alongside timetables, hence the big file size. In OM we already have pedestrian routing, so it would make sense to use it instead and save on storage. To compare, the original Berlin GTFS feed is 66 MB, and protobuf-compressed it's 5.5 MB.

It seems currently we have three options:

Use nigiri, which is written in C++ and has most of what we need. Maybe. It's not documented, so I don't know. The file format is a problem, but maybe it isn't — maybe we could feed it a compressed gtfs like with protobufs, and it'd expand them on device. Also not sure it could play well with pedestrian routing OM has.
This and the next one — download GTFS on a server and compress it to a CDN, from which OM would download feeds or updates. Would require implementing all the transit routing by ourselves, as opposed to Nigiri.

First option is gtfs-proto: 10x compression and support for deltas, meaning download sizes are low, but data can be kept up-to-date even when main maps are updated rarely. This would require a large internal database on devices that GTFS would be unpacked into. Other than that, this should play well with OM pedestrian routing and other things.
The same, but file format is packed and indexed in a way that doesn't require unpacking and keeping a local database — much like OM map files are built. I imagine indexes taking a good part of the file, plus navigating them might be slower than doing a database request, but not by much. Expecting a Berlin file to take up like 10 MB. Updates would be a problem: do we replaces parts of files? But again, we probably have partial updates for map files, why not apply the same to transit feed files.

And I agree with Felix that cross-feed routing is a bit out of scope for now, although probably can be done later using the same algorithms we use for subway routing.

felixguendling commented 7 months ago

I think it packages footpath geometry alongside timetables

For fast routing, there is no other option. If you switch to a dedicated foot routing engine to compute shortest paths inside the public transport routing, you will get routing times in the minutes or even hours on phones. As mentioned, the only alternative is the ULTRA approach mentioned here: https://github.com/organicmaps/organicmaps/issues/5331#issuecomment-2026893849 Otherwise, you have per round per station touched in this round in RAPTOR or per arrival one "one-to-all" routing query for the foot routing. These are at least thousands, sometimes millions of foot routing queries that cannot be sped up with goal direction (A*, etc.). I would definitely use preprocessed footpaths. Whether they come from your own foot routing or from the GTFS timetable itself doesn't matter.

biodranik commented 7 months ago

There is already some basic implementation of the mixed Public Transport routing in OM which uses foot paths + subway lines. Did anyone check it? A logical step could be to extend it with additional PT routes, in addition to subways, and then add support for time schedules.

An efficient OM-specific implementation is preferred over 3party, but less efficient implementation (more disk space, slower, restricted in functionality, etc.).

Triton171 commented 7 months ago

There is already some basic implementation of the mixed Public Transport routing in OM which uses foot paths + subway lines. Did anyone check it? A logical step could be to extend it with additional PT routes, in addition to subways, and then add support for time schedules.

The current implementation is essentially just the pedestrian router but with additional "fake" edges that model subway lines. In theory, one could generalize this to a time-dependent A*, but this has the issues that were mentioned above:

Dijkstra / A* only optimize arrival time, so they can't consider the number of transfers (other than indirectly if a large minimum change time is required). There are multi-criteria versions of Dijkstra but they're probably to inefficient for this use case.
Similarly, I don't know if there is an efficient way to support range queries using Dijkstra. This is a huge limitation, since most people would probably want to decide on an exact departure time based on the available connections.
Integrating time tables will come with a large performance penalty, since determining the weight of an edge is now a complicated operation where we have to find the first trip that can be reached given the current arrival time. I don't have any experience with public transit routing, so I don't know how bad the performance would be, but I think it would definitely be a problem.

Considering all of this, I'm not sure if this approach could provide a great user experience (and it almost surely wouldn't be competitive with Google Maps and the likes).

Regarding the size of the data format that nigiri uses, I have 2 points:

It looks like 30 MB of the timetable for Berlin are used for storing the arrival / departure times of the trips at each stop. Maybe this could be stored in a more compact way (this might make the routing algorithm less efficient though, so I'm not sure if it's worth the trade-off)? These suggestions are similar to what the GTFS format supports:
- Usually, the travel & stopping times are the same for all trips of one route (or there are only a few different variants, e.g. longer stopping times during rush hour). Therefore, it could save storage to store a list of departure times at the first stop and for all subsequent stops just store the duration it takes from the first stop.
- If I understand it correctly, the data format explicitly lists all stop times. For routes which are served with a regular frequency for some large interval, a more compact representation might make sense.
A simple solution for Organic Maps could also be to just compress the time tables (for downloading them and/or for storing them on the client). The timetable for Berlin takes ~112 MB uncompressed but just 23 MB after being compressed with ZSTD. After compression, this is significantly less than the size of the map (78 MB), so it's not a huge issue anymore in my opinion.

vng commented 7 months ago

Let's make the first version without timetables. Just to minimize time (distance) + minimize change penalty. Current A* implementation is quite good for that, because it works with dynamic weights.

biodranik commented 7 months ago

Plus alternate routes, of course. The next step would be to add schedules on top of that (if it is feasible), or in a completely different way.

Let's discuss any potential blockers for the schedule implementation on top of the current code in details.

Zverik commented 7 months ago

The timetable for Berlin takes ~112 MB uncompressed but just 23 MB after being compressed with ZSTD. After compression, this is significantly less than the size of the map (78 MB), so it's not a huge issue anymore in my opinion.

I should remind everyone that feeds generally work for a week in advance (it's mentioned somewhere in the spec). Files are often updated daily. So we're talking not of one-time download of 23 MB, like with maps, that are updated like once a month. It's 23 MB at least once a week, maybe automatically because users ignore red dots.

DavidKarlas commented 5 months ago

Just a random thought... Might be useful, might be noise... Maybe format of data that is downloaded could be size optimized(protobuf?) but only decompressed for needs of engine as needed, so for example when start/end points bounding box overlaps with bounding box of public transportation network? With speed of todays phone CPUs this decompression might be fast enough to not be noticeable...

Zverik commented 5 months ago

@DavidKarlas it was mentioned above: https://github.com/Zverik/gtfs-proto/

Korb commented 5 months ago

In the city where I currently live (St Petersburg, ruzzia), Organic Maps does not support routing using ground public transport. As I understand from other topics, in general the application is still almost devoid of this functionality. This is sad, because after Google left this country, it stopped accepting edits to update maps and ceased to correspond to reality. And local navigation providers (Yandex and 2Gis) cannot be considered safe even for law-abiding citizens. I think @biodranik understands what I mean.

I opened the online OSM editor, and I can confirm that public transport routes are present on the map: 2024-06-15_19-54-33

In the table "Every public transport system in the world (CC0)" on the sheet "Overground" the column "Tram Lines" is "1", and the remaining columns (Trolleybus Lines, "Bus Lines" and "Other types") are empty meanings.

I am not a programmer. Do I just need to subscribe to news from the development team, expecting that one day I will be able to plan routes around the city in real time again, or can I be of some help to you?