organicmaps / organicmaps

🍃 Organic Maps is a free Android & iOS offline maps app for travelers, tourists, hikers, and cyclists. It uses crowd-sourced OpenStreetMap data and is developed with love by MapsWithMe (MapsMe) founders and our community. No ads, no tracking, no data collection, no crapware. Please donate to support the development!
https://organicmaps.app
Apache License 2.0

Implement public transport routing #5331

Open dikkechill opened 1 year ago

dikkechill commented 1 year ago

Please collect all relevant tasks to implement public transport routing here:

Public Transport Routing Vision

The current OrganicMaps architecture vision (from comment)

The main goal is to make a worldwide public transport routing better than Google. Public routing that is open-source and available to everyone. Development and support can be sponsored by cities and countries, by funds, and by the community.

Old PT approach

The previous approach to PT was to embed GTFS feeds into the offline map data. It is not suitable for real-time updates and thus is not the focus of this task.

Related issues

Improve UI

Other

biodranik commented 1 year ago

@dikkechill I have updated the description. Feel free to create missing issues to track each subtask. Thank you for your help!

dikkechill commented 1 year ago

@dikkechill I have updated the description. Feel free to create missing issues to track each subtask. Thank you for your help!

Thank you for the update biodranik.

Do I understand correctly that you want to abandon further development of the experimental import of GTFS data into the .mwm files?

I actually saw implementing this experimental feature as a relatively straightforward first step to get public transport data in. Then OrganicMaps users can get something in the short term that's good, but not perfect.

In my mind the second step would then be to rethink the architecture for real-time data and make it closer to perfect. I expect that to be a lot more challenging, both in implementation effort and in infrastructure maintenance, and to take a lot longer to deliver.

By still implementing the experimental feature, the perfect doesn't become the enemy of the good. Even if this means duplicating some implementation effort. What are your thoughts?

biodranik commented 1 year ago

Our resources are very limited to work on something that will be dropped later. Embedding GTFS data in mwm files requires a lot of effort and introduces many potential support issues, while not solving the issue of PT schedule fluctuations.

How much client-side work will be required for the experimental approach?

Mapping GTFS data to OSM data is another big issue.

The common part in these approaches is collecting feeds from different sources. Btw, are there any other existing projects that already collect real-time feeds? Can we start from there?

In general, I don't mind if someone implements it. But IMO it won't be very usable with our monthly/weekly map data update cycles. People won't switch to Organic Maps from other, better apps for Public Transport planning.

jumelles commented 1 year ago

Btw, are there any other existing projects that already collect real-time feeds? Can we start from there?

https://database.mobilitydata.org/ -- This looks promising!

LaPingvino commented 1 year ago

I actually saw implementing this experimental feature as a relatively straightforward first step to get public transport data in. Then OrganicMaps users can get something in the short term that's good, but not perfect.

@dikkechill as another issue shows, preparing this demands a LOT of data (that issue mentioned more than 35 GB, and I think that was only the data available to that user). If you ask me, a better approach would be to support loading a specific GTFS file based on the user's needs. Implement this manually first for a small and simple GTFS file, fix issues and add features, then map specific maps to the relevant GTFS files for that area so they can be loaded only when needed.

I think it's nice that there are huge collections of existing sources, but letting the user select which GTFS feeds to use would add a lot of value, especially for open-source GTFS development, testing, etc.

I can imagine that GTFS is very heavy to load because it's a ZIP-based format with text files. Another possible approach, then, is to design a format that is easy to work with and to add transport providers through that format. It shouldn't be too hard to write a converter between GTFS and such a format, and if this reduces load on the app or, for example, enables selective loading where needed, that's probably worth it.

Happy to write any of these conversion tools etc. I already played a bit with tools dealing with gtfs and will probably write a good bunch more over time.

mil commented 9 months ago

I've been working on Mobroute, which is a FOSS GTFS router written in Go and designed to be embeddable (as a library); the public transit routing algorithm runs directly on the phone (example app here). Mobroute ingests GTFS data from agencies (sourced from the Mobility Database), and the project aims to be an efficient, end-to-end solution (both GTFS ETL & algorithm implementation) for offline/privacy-respecting public transit routing. The codebase is designed to be minimalistic (~2k LOC ETL / ~2k LOC routing algorithm/api). I believe that Mobroute could solve many of OM's public transit design requirements (re: "Clients build routes from this schedule info offline." & also being compatible with a good number of metros).

I am curious how open Organic Maps would be to a potential collaboration / integration with Mobroute? As a user of OM, I would very much be open to taking OM on as a supported target for Mobroute and to factoring a potential integration into the design of the library, whose API I am working on stabilizing over the next few months.

If OM contributors or maintainers have experience with OM's current routing system(s) & UI, I'd be curious to hear your thoughts (re: integration) & would be happy to field questions from my side.

Fale commented 9 months ago

There is a lot of movement around these kinds of topics lately. This is another interesting one.

biodranik commented 9 months ago

@mil that sounds interesting. Some questions:

  1. Are GTFS-based routes matched on the OSM data?
  2. What is the capacity of the GTFS (MobilityData?) server(s)? Will it survive requests from millions of users?
  3. Did you run it on iOS and Android?
  4. What algo is used?
  5. Does it support different types of transport?
omentic commented 9 months ago

@mil That looks like a really cool library. You might consider reaching out to the Transportr people (https://github.com/grote/Transportr/), development recently restarted after a couple of years, and most of their cities depend on now-defunct providers. Ah, looks like they're mentioned in the FOSDEM presentation linked above too. This specifically was what happened to them in the period of inactive development:

Navitia drops many coverage areas from their free service and SNCB discontinues their Hafas API, leaving the FOSS transport apps with several large gaps in the regions they can serve.

mil commented 9 months ago

@biodranik thanks for the questions - feedback below:

@mil that sounds interesting. Some questions:

1. Are GTFS-based routes matched on the OSM data?

2. What is the capacity of the GTFS (MobilityData?) server(s)? Will it survive requests from millions of users?

3. Did you run it on iOS and Android?

4. What algo is used?

5. Does it support different types of transport?

1) OSM data is not used at all. Mobroute is solely a GTFS router.

2) So the way Mobroute works is that GTFS feeds are fetched directly from the agencies themselves. MobilityData distributes a CSV file listing thousands of upstream agencies' GTFS URLs (and associated bboxes & metadata). Mobroute can ingest from these source agencies as indexed by MDBID (the ID # they indicate per GTFS/agency source) or from raw GTFS URLs. See here and the raw CSV link on that page for details on the Mobility Database. Added a few more details below after the questions section. There is no 'middleman' server when using MobilityData's CSV.

3) Testing was done on Linux & Android. Nothing specific is platform-dependent (and go-mobile supports a lot, including iOS; binding is per-platform). The only real hard dependency is SQLite >=3.28.0 (this means Android >=11).

4) The algorithm used is an implementation of the Connection Scan Algorithm (CSA) - more details on the underlying routing API around CSA are here; a rough sketch of the core scan loop follows this list. Among the popular public transit routing algorithms (RAPTOR, etc.), CSA is mainly known for being among the most efficient & simple.

5) Yup - GTFS supports different route_types in routes.txt (tram, rail, bus). There are many different types in the Mobility Database feeds.
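
A rough sketch of the CSA scan loop mentioned in (4) (my own stripped-down illustration in Go, not Mobroute's actual code; footpaths, transfer buffers and journey reconstruction are omitted):

```go
package main

import "fmt"

// Connection is one scheduled hop between two consecutive stops.
// Times are seconds since midnight; the slice must be sorted by Dep.
type Connection struct {
    From, To string
    Dep, Arr int
}

// earliestArrivals is the basic Connection Scan Algorithm: scan every
// connection in departure order and keep the best known arrival per stop.
func earliestArrivals(conns []Connection, source string, depTime int) map[string]int {
    best := map[string]int{source: depTime}
    for _, c := range conns {
        // The connection is usable only if we can be at its departure stop in time.
        if t, ok := best[c.From]; ok && t <= c.Dep {
            if cur, seen := best[c.To]; !seen || c.Arr < cur {
                best[c.To] = c.Arr
            }
        }
    }
    return best
}

func main() {
    conns := []Connection{
        {From: "A", To: "B", Dep: 8 * 3600, Arr: 8*3600 + 600},
        {From: "B", To: "C", Dep: 8*3600 + 900, Arr: 8*3600 + 1500},
    }
    fmt.Println(earliestArrivals(conns, "A", 8*3600))
}
```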

The way I imagine an OM/Mobroute integration working would be: we select a few well-supported metros/MDBIDs to start, and have Mobroute bake the GTFS (directly from agencies) into Mobroute's routing-optimized SQLite format (in OM's CI) to be bundled with MWM. Doing this in OM's CI would make it so source agencies' GTFS archives are only requested once (per update cycle). Part of the reason why an integration with OM appeals to me is that the 'data model' in OM is similar to Mobroute's, in that the UI already has a 'download' step for the user (which is a necessary component for Mobroute).

@omentic thanks for mentioning - I've also been considering Transportr as well. I think there's a difference between the approach of 'routing ETL and algorithm as a library' (Mobroute) versus 'routing algorithm as an external HTTP server' (Navitia, OTP, GraphHopper, MOTIS). OM seems like a good fit for something like Mobroute / library approach with minimal UI changes (just bundling GTFS-derived data with MWM and running CSA algorithm on mobile); versus Transportr which may require heavier UI changes (as they rely on 'online' routing).

biodranik commented 9 months ago

Thanks for the answers.

  1. How will users see routes on the map without matching with OSM data? Won't it look ugly and unusable?
  2. Embedding frequently (even in real-time) changing schedules info into mwms may not be the best idea, and it doesn't allow other FOSS apps to use the same PT service.
  3. OM works on Android 5 and above. Jumping to Android 11 is too radical.
  4. Can you please provide some route examples produced by your lib with different transport types, from different GTFS sources? E.g. in Switzerland, including bus, tram, train, ferry, Zürich Polybahn, and Seilbahn Rigiblick? Or in some other country.
  5. Are there any other useful sources/feeds except GTFS? Any plans to support them?
mil commented 9 months ago

Thanks for the answers.

1. How will users see routes on the map without matching with OSM data? Won't it look ugly and unusable?

2. Embedding frequently (even in real-time) changing schedules info into mwms may not be the best idea, and it doesn't allow other FOSS apps to use the same PT service.

3. OM works on Android 5 and above. Jumping to Android 11 is too radical.

4. Can you please provide some route examples produced by your lib with different transport types, from different GTFS sources? E.g. in Switzerland, including bus, tram, train, ferry, Zürich Polybahn, and Seilbahn Rigiblick? Or in some other country.

5. Are there any other useful sources/feeds except GTFS? Any plans to support them?
  1. Providing OSM-to-GTFS stop matching etc. is a bit of a separate problem. Do you just want to show OSM metadata for the individual route points (stops), or something more? Having the route points rendered on the map wouldn't necessarily be unusable or ugly (imo), but it'd depend on the UI implementation.

  2. Good point. The per-region SQLite data for MR could be built as a standalone artifact (created in CI) as well, not embedded in MWM; as a standalone, other projects could utilize the same data.

  3. I suppose a recent SQLite binary could ship with MR itself rather than using the system SQLite (though tooling may be tricky).

  4. Sure - sample routes generated from tests (based on actual GTFS data from different providers & transport types) are at: http://ci.lrdu.org/tests_results/

  5. GTFS is the most widely used & accepted standard for PT data. I do plan for GTFS-RT support (realtime updates); GTFS-RT is a protobuf format that builds on static GTFS data and is very small in comparison with GTFS. Other formats could be added in the future but are not on my roadmap currently.

biodranik commented 9 months ago
  1. It looks ugly without binding to OSM data. Are you aware of any project that tries to solve GTFS <=> OSM matching globally?
  2. Why SQLite? Which functionality of it is required for your implementation?
dikkechill commented 9 months ago
  1. It looks ugly without binding to OSM data. Are you aware of any project that tries to solve GTFS <=> OSM matching globally?

I'm aware of pfaedle for OSM - GTFS matching.

XioNoX commented 9 months ago

What about using the existing PT route relations present in OSM? https://www.openstreetmap.org/#layers=T In many places they're more accurate than trying to route from stop to stop, and it would be an incentive (and reward) for people to add that data to OpenStreetMap.

michaelblyons commented 9 months ago

@XioNoX isn't that what people have been talking about with respect to matching? The link in the comment right above yours uses the OSM relationships linked to GTFS.

Triton171 commented 9 months ago

@mil How large is the SQLite database that one needs for routing typically? If I understand it correctly, it stores an entry for each direct connection (i.e. a service going at some point in time from one stop to the next on its route). For something like a frequent bus/metro line with a few dozen stops, it seems like this could easily add up to a few thousand entries in the DB. Optimizing the size of the timetable is probably pretty important considering that it'll be stored on a phone (with often very limited space) and ideally updated frequently.
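
As a rough back-of-envelope with made-up numbers: a single 30-stop line running every 10 minutes from 06:00 to 24:00 gives 29 stop pairs × 108 trips ≈ 3,100 connection entries per day, so a network with a few dozen such lines easily passes a hundred thousand entries for a single day.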

mil commented 8 months ago

@biodranik The GTFS spec provides shapes.txt, which may address some aesthetic concerns; it's optional in the spec. Currently my routing system doesn't utilize shapes.txt at all, but support wouldn't be hard to add in the future. I'm not well versed in the OSM-GTFS matching problem, but pfaedle mentioned above does look interesting.

SQLite is used for storing the GTFS data & querying; SQLite is a natural fit as GTFS is modeled as relational tables. More detailed information on implementation & its functioning is available in docs.

@Triton171 the database size varies and is proportional to GTFS archive input size. Optimizing the size is indeed important; I hope to address estimations of required size constraints (per-region etc.) in future updates.

charlie2clarke commented 8 months ago

I'm interested in understanding how I can be most helpful in contributing to the public transport functionality, as I'd like to identify which aspect of the work would be best to focus on for a potential GSoC project. I am primarily interested in the server-side work, and having read the related issues and the docs, my initial questions/thoughts are:

biodranik commented 8 months ago

@charlie2clarke thanks for your questions! Here is an updated overview of the PT task:

  1. There are separate parts of this complex PT feature. One is using OSM routes/stops data to show routes to users. Another is to prepare and fetch frequent schedule updates, match them with the existing OSM data, and use them for display in the Place Page and route calculations.
  2. Embedding frequently (often in real time!) updated schedule data into offline maps doesn't make much sense. That data should be generated/downloaded/cached/stored separately. An additional benefit of that approach is that any other app/project then can reuse it. This may motivate other contributors/app developers to help.
  3. The API to fetch schedules should be a simple HTTP GET for a file, so it can be automatically cached on any CDN to scale indefinitely (see the illustrative sketch after this list).
  4. That means we need our (fast and efficient!) server code that will collect GTFS feeds from many sources, and pre-process them, preparing for clients to digest.
  5. While parsing GTFS directly on clients sounds "cool", that doesn't scale well, creates a lot of overhead, and may be very complex to match with the existing OSM data (considering it was already processed by OM map generator). Using a lightweight, simple, scalable and cacheable data format for schedules on clients looks like a better solution.
  6. That means we can efficiently process GTFS on a server. To avoid spending a lot of money on it, and on writing horizontally scalable code, an efficient implementation in Rust or golang may handle the whole job.
  7. The final goal is to process thousands of feeds from all over the world, as soon as they have been updated, to deliver the (as real-time as possible) updates to hundreds of millions of clients.
  8. As our mission is to build maps that are better than other maps, there are many other sub/side projects around PT: matching GTFS with OSM, properly updating OSM data, providing an easy way for businesses/transport companies or even for end users to update schedules, etc.
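
To illustrate point 3, a minimal client-side fetch could look roughly like this (a sketch only; the host, URL layout and file name are hypothetical, and only Go's standard net/http is used):

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

// fetchSchedule downloads one per-region schedule file via plain HTTP GET.
// The ETag from the previous download is sent back so CDN edge caches can
// answer with 304 Not Modified and no body.
func fetchSchedule(region, etag, dest string) (newETag string, updated bool, err error) {
    // Hypothetical URL layout; the real one would be decided later.
    url := "https://cdn.example.org/pt/" + region + "/schedule.bin"
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return "", false, err
    }
    if etag != "" {
        req.Header.Set("If-None-Match", etag)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", false, err
    }
    defer resp.Body.Close()

    if resp.StatusCode == http.StatusNotModified {
        return etag, false, nil // cached copy is still current
    }
    if resp.StatusCode != http.StatusOK {
        return "", false, fmt.Errorf("unexpected status %s", resp.Status)
    }
    f, err := os.Create(dest)
    if err != nil {
        return "", false, err
    }
    defer f.Close()
    if _, err := io.Copy(f, resp.Body); err != nil {
        return "", false, err
    }
    return resp.Header.Get("ETag"), true, nil
}

func main() {
    etag, updated, err := fetchSchedule("berlin", "", "berlin_schedule.bin")
    fmt.Println(etag, updated, err)
}
```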

Please let me know what you (and anyone else) think about this approach.

charlie2clarke commented 8 months ago

Thanks for the clarification @biodranik, I think I would be best suited to looking into the preparing and fetching of schedule updates for the static and realtime PT data. Regarding the potential GTFS API:

I'm honestly not sure what the difference in cost would be: on the one hand, the pull-based model is simpler and would have fewer components; however, there would be many more API calls. The push model is more complex, but resource usage would scale well.

The cost of fetching realtime updates could be optimised slightly by only establishing connections when a stop is clicked on, when a navigation is in progress, or, in the future, when a user has a saved route, so they could, for example, receive traffic updates for their commute.

In either case, given that performance is a high priority, maybe a gRPC API would be best?

dikkechill commented 7 months ago

@charlie2clarke Thanks for your thoughts on this! I opened #5332 specifically for the real time data architecture a while ago. The project I found most interesting is travic by @patrickbr (University of Freiburg). He's also the author of pfaedle. Maybe we can get more information on the travic operations, current setup (some details are in the 2014 paper) and lessons learned. Perhaps some form of cooperation is possible. @biodranik do you want to reach out 'officially' from organicmaps? I could also do this.

biodranik commented 7 months ago

Our strategic goal is to build a feature that works comparably to, or better than, the Google and Apple Maps public transport features. Let's design the feature and architecture from this point of view. In particular: building and displaying mixed routes on the map, including stop changes/transfers, and schedule info (if available).

Our use case should support both routes from OSM with some kind of static schedule (to help people where GTFS feeds are not available) and real-time info in other places.

There are different ways to implement it while keeping the offline-first in mind, for example:

  1. Clients download GTFS and process it locally, including matching with OSM data.
  2. GTFS and other formats are collected on the server and matched with OSM data there. Clients work with an efficient and specialized format (compact, easy to decode, stores only necessary data, easy to cache and use in offline).
  3. The same as (2), but clients work with GTFS.

IMO (2) is better suited for the strategic goal, but I'm open to discussion.

In the future, if the data is collected/processed on the server, any online API services are also possible.

Note that to reach the strategic goal, the architecture should be easily scalable and cheap for millions of users.

Which CDN to use should not matter at all. HTTP content is easily cacheable on edge servers.

Using pfaedle or any other tools to match/process raw data with OSM may be a good start. Doing it on the server is way easier than on clients.

Push or pull is a good question. AFAIK, to properly implement push, apps should work with Apple or Google notification systems. The pull model looks easier and more scalable for OM clients <=> OM PT Server. The push model is great for GTFS sources <=> OM PT Server interaction.

Regarding static GTFS data, does it make sense to build a validator for it, so this data can be directly stored/updated in OSM? Or would it be better to enrich OSM data with external sources?

charlie2clarke commented 7 months ago

Thanks @dikkechill, I'll have a read of that paper.

@biodranik Thinking more generally about differentiating OM from Google/Apple PT features:

IMO (2) is better suited for the strategic goal, but I'm open to discussion.

I agree that 2 is the best option. Regarding the data format for client-side (bear in mind I'm not experienced with geospatial data):

Regarding push/pull: I agree that the pull model looks a lot easier; my only concern is that if there are a million users polling OM PT servers every 5 seconds, for example, this would cost more than the OM PT servers pushing an update to online OM clients only when there is an update to send. I don't have experience at this scale though, so I'm happy to be told otherwise. I don't imagine all GTFS sources will support push-based models, so the GTFS sources <-> OM PT servers side would have to use a mix.
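
(For scale, with those example numbers: 1,000,000 clients polling every 5 seconds is about 200,000 requests per second hitting the origin before any CDN or caching layer absorbs them.)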

I'm not 100% clear about the validation aspect. Is my understanding correct: the OM PT API would fetch GTFS feeds, these could then be validated using existing tools like gtfs-validator, the OM PT API could then do GTFS -> OSM matching, and then the OSM data (or whatever format we decide is suitable for clients) could be validated again - does it need special validation before doing GTFS -> OSM matching? I notice that the subways-preprocessor seems to handle GTFS data - is this not in use at the moment?

I am quickly learning of the size and complexity of this work… I think it's good to talk about the big picture to get a better understanding of all the work aspects, but I am aware that in the short-term, I would like to scope my work so it's suitable for the GSoC period. However, I know that I would be continuing development on this feature beyond the GSoC end date.

Misalf-git commented 7 months ago
  • A not so related thought, but maybe a useful PT feature idea that I haven't seen before could be a 'time to leave' prompt. When you use PT and you aren't already on the move, you inevitably need to choose what time train/bus/tram/ferry you want to get and then you have to back-track yourself to work out what time you should leave (depending on how you're getting there) - shouldn't your maps app tell you when you need to?

Öffi does something similar. One can set a reminder for a selected departure time using the system's calendar app. IIRC it doesn't take into account the actual time required to get to the station, but an arbitrary "15 Minutes earlier" or something. Still a nice feature.

Triton171 commented 7 months ago

I'd also be interested in helping move forward this feature. It seems to me that the first step has to be deciding on a client-side data format and routing algorithm (since the data format depends on the algorithm). For the algorithm, I'm aware of a few alternatives:

All of these algorithms have the disadvantage of only working with precomputed transfers. This not only increases the size of the client-side data format but also makes hybrid routing handling (long) footpaths as well as public transit very difficult. To properly support this, I only know of the "naive" approach:

In my opinion, the simple Dijkstra approach is the most promising. It's more flexible and easier to implement than the alternatives. The larger query time compared to the other algorithms might not be a huge deal, since single-digit millisecond query times are not necessary anyways. The data format could also be pretty simple:

Sending real-time information for some routes should be fairly straightforward assuming that client & server have some shared route IDs. It would probably be nice to write down a specification for the data format (subject to change of course) somewhere, because this would allow us to implement the client & server independently of each other. I'm not too familiar with the Organic Maps codebase, so there are probably many things that I've overlooked and that more competent people can point out, so I'm happy about any remarks & corrections.
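
To make the time-dependent part concrete, the relaxation of a PT edge could look roughly like this (my own sketch, not a proposal for the actual data format; departure times are assumed sorted):

```go
package main

import (
    "fmt"
    "sort"
)

// ptEdge is one public-transport hop from one stop to the next, with all
// scheduled departure times of the line at the "from" stop (sorted, in
// seconds since midnight) and a constant ride time to the next stop.
type ptEdge struct {
    departures []int
    rideTime   int
}

// arrivalVia returns the earliest arrival at the far end of the edge when we
// reach the near end at time t: wait for the next departure, then ride.
// The second return value is false when there are no more departures today.
func arrivalVia(e ptEdge, t int) (int, bool) {
    i := sort.SearchInts(e.departures, t) // first departure >= t
    if i == len(e.departures) {
        return 0, false
    }
    return e.departures[i] + e.rideTime, true
}

func main() {
    e := ptEdge{departures: []int{8 * 3600, 8*3600 + 600, 8*3600 + 1200}, rideTime: 300}
    fmt.Println(arrivalVia(e, 8*3600+100)) // waits for the 08:10 departure
}
```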

biodranik commented 7 months ago

Speaking about Google and Apple Maps, they are still not always great when building the fastest PT route with several transport type changes, even in areas with great data like Zürich.

Isn't "time to leave" already embedded in Apple and Google calendars? That would be a handy feature indeed.

AFAIK SQLite is used in mobile apps for two purposes:

  1. Save disk space when storing many small files (we likely need it too for Satellite imagery support).
  2. Querying DB/data in a way easier for developers. As you see, we are often more focused on "easier/better for users" than "easier for developers" in Organic Maps :) That leads to implementing our own data formats or algorithms that allow us to get a better performance or functionality.

I heard that pbf for GTFS is developed by @Zverik

"Storing only necessary data" means removing anything unnecessary to build the right routes.

We'll come back to the pull/push model later when other components are ready for integration.

does it need special validation before doing GTFS -> OSM matching

Maybe someone can already answer this question?

Regarding "the subways-preprocessor seems to handle GTFS data": @alexey-zakharenkov, how is it used now / planned to be used? As a side question, it would be great to extend your subway validator to support buses and trams; is there a list of tasks/guidelines on how to achieve that?

The PT feature is indeed complex and may take a lot of time to implement. Doing a part of it would already be better than doing nothing ;-)


@Triton171 thanks for your detailed post! Would it be feasible to reuse our current modified A* algo on the PT data with minimal modifications, so that when we add alternate route support #1183, alternate routes will also work for PT out of the box?

alexey-zakharenkov commented 7 months ago

Regarding "the subways-preprocessor seems to handle GTFS data": @alexey-zakharenkov, how is it used now / planned to be used?

If the --output-gtfs option is given, the preprocessor generates GTFS from the input metro networks. It cannot read GTFS.

felixguendling commented 7 months ago

Hi everyone! :wave:

I'm maintaining MOTIS, and public transit routing has been my main topic for more than ten years now (including real-time, special cases like park&ride, etc.). In MOTIS, we have CSA, RAPTOR, TripBased and a time-dependent Multi-Criteria Dijkstra (MCD) implemented, so we can compare them.

One thing that's missing in this discussion, and that makes the comparison between Dijkstra/A* and CSA/RAPTOR/TB unfair, is "multi-criteria". In public transport (as opposed to street routing) we're usually not only interested in the fastest journey. Other criteria like the number of transfers are very important, too. Consider this example: connection A takes 1h with 2 transfers, connection B takes 1h 10min with 1 transfer. My guess is that most people would prefer connection B, as you save one transfer with only 10min extra travel time. However, since we don't know the exact weighting of a transfer for every single user, the best approach is to compute the Pareto front (or Pareto set) of all optimal trade-offs. RAPTOR and TripBased produce a Pareto set, but a Dijkstra/A* would not. So comparing the performance of single-criteria algorithms to multi-criteria algorithms would be an apples-to-oranges comparison, as they don't solve the same problem. With the goal of producing high-quality results comparable to Google/Apple Maps, some kind of multi-criteria is a must in my opinion.
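
For illustration, the Pareto idea with just the two criteria from the example above (arrival time and number of transfers) boils down to a domination check like this (a sketch, not MOTIS/nigiri code):

```go
package main

import "fmt"

// journey is one candidate result with the two criteria discussed above.
type journey struct {
    arrival   int // seconds since midnight
    transfers int
}

// dominates reports whether a is at least as good as b in every criterion
// and strictly better in at least one.
func dominates(a, b journey) bool {
    if a.arrival > b.arrival || a.transfers > b.transfers {
        return false
    }
    return a.arrival < b.arrival || a.transfers < b.transfers
}

// addToParetoSet keeps only non-dominated journeys.
func addToParetoSet(set []journey, j journey) []journey {
    for _, s := range set {
        if dominates(s, j) || s == j {
            return set // j adds nothing new
        }
    }
    out := []journey{j}
    for _, s := range set {
        if !dominates(j, s) {
            out = append(out, s)
        }
    }
    return out
}

func main() {
    set := []journey{}
    set = addToParetoSet(set, journey{arrival: 10 * 3600, transfers: 2})      // 10:00, 2 transfers
    set = addToParetoSet(set, journey{arrival: 10*3600 + 600, transfers: 1})  // 10:10, 1 transfer
    fmt.Println(set) // both survive: neither dominates the other
}
```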

Another point I would like to bring into the discussion is the range-query topic and earlier+later search. In research as well as in practice there are two types of queries: the earliest-arrival query (or, given an arrival time, the latest departure, so you need to search backward in time from the destination) and the range query (also forward+backward). A range query gives you an overview of the best connections in a departure/arrival interval. All mentioned algorithms (CSA, RAPTOR, TB) also have range/profile variants. To produce a user experience similar to what other systems (like Google Maps, HAFAS, etc.) offer, the range/profile variant with earlier+later search functionality would be required (in both directions, so the user can specify a departure or arrival time).

Regarding RAPTOR performance:

The query time is significantly larger than for CSA

I don't think this statement is true in general. CSA looks at all connections as it scans through time, no matter where the connection departs. With our implementation in MOTIS, we have the experience that CSA becomes slow as the area grows. RAPTOR only looks at connections departing at stations that were marked in the previous round. CSA will still scan the connections in London, even if you are routing between two locations in Madrid.

Regarding TripBased:

Essentially just Dijkstra's algorithm but on the graph whose vertices are trips and whose edges are transfers between trips

I would say it's more like round-based BFS and vertices are trip segments.

makes hybrid routing handling (long) footpaths as well as public transit very difficult

It depends: state-of-the-art routing engines like MOTIS do not have any issues handling long footpaths as the first or last leg, as this can be handled outside the RAPTOR/CSA/TB algorithm. If long footpaths between two public transport legs are required, you can look at the ULTRA approach [^1][^2].


In MOTIS, we have a separate routing engine for public transport: https://github.com/motis-project/nigiri (basic algorithm is the RAPTOR but with some adjustments, timetable uses bitfield encoding for traffic days to be more compact - to the best of my knowledge, nigiri is currently the fastest publicly available multi-criteria transit routing engine with the most compact serializable data model)

nigiri can be used independently of MOTIS and could be linked as a library.

The approach to intermodal door-to-door routing is to compute offsets by routing between the start location and close-by public transport stations for all modes in an external street router (same for the destination). Those start offsets are then used to initialize the search. The destination offsets can be seen as additional ad-hoc edges considered only for this particular routing query.
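
In code, the offset idea amounts to seeding the public transport search with one label per nearby stop, something like this (a rough sketch, not nigiri's actual API):

```go
package transit

// startOffset is the result of the external street router: how long it takes
// to walk/cycle/drive from the query origin to a nearby public-transport stop.
type startOffset struct {
    stop     string
    duration int // seconds
}

// seedLabels initializes the earliest-arrival labels for a query departing at
// depTime: every nearby stop becomes reachable at depTime + street duration.
// Destination offsets would be handled symmetrically, as ad-hoc final legs.
func seedLabels(depTime int, offsets []startOffset) map[string]int {
    labels := make(map[string]int, len(offsets))
    for _, o := range offsets {
        t := depTime + o.duration
        if cur, ok := labels[o.stop]; !ok || t < cur {
            labels[o.stop] = t
        }
    }
    return labels
}
```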

nigiri supports all of the aforementioned routing features such as multi-criteria routing, backwards-search, range query with earlier/later search functionality, etc. (except ULTRA). It also has real-time update support and has a very compact data model that can be dumped to disk (or sent via network). It can parse GTFS ZIP files and process GTFS-RT protobuf updates.

What's currently missing in nigiri is the handling of GTFS shapes.txt. This can either be done externally or it can be added to nigiri.

[^1]: Sauer, Jonas, Dorothea Wagner, and Tobias Zündorf. "Integrating ULTRA and Trip-Based Routing." 20th Symposium on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems (ATMOS 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. [PDF] [^2]: Baum, Moritz, et al. "UnLimited TRAnsfers for Multi-Modal Route Planning: An Efficient Solution." [PDF]

charlie2clarke commented 7 months ago

@biodranik When I have the time, I shall test out some potential client-side PT formats: a vector-based SQLite implementation, and PBF. I think the trade-off OM makes of increasing performance at the expense of developer ease is worthwhile, but it's useful to keep in mind that satellite support may require SQLite, and the PT work could lean into that.

For myself, I think for the scope of GSoC I should focus on the PT API(s). One of the elements I would like to work out more is how the API can be queried/called. My initial thought is that it makes a difference for the API whether the OM client is expected to render live PT vehicle positions - do you know if this is anticipated to be supported? IMHO, I don't think OM needs this, as it would clutter a phone's screen, not to mention the high memory consumption. However, I think supporting route progress would be good, i.e., showing where the train/bus/tram is relative to each stop on its route.


@felixguendling Really cool project to see! I think fine-tuning the PT routing algorithms would certainly be desirable, though as I understand it, there are a few dependencies to resolve before looking at the routing algorithms, like the data model. Does nigiri have any specific data requirements for integration? Is there anywhere I can find some more info on the data model?

biodranik commented 7 months ago
  1. The task of efficient collecting and (pre)processing of GTFS data on a server can be started independently.
  2. At the same time, the most efficient data format for clients can be designed. Ideally, it should be compact, cacheable on CDN, and easily readable by algorithms on the client (a rough illustration follows this list).
  3. If we speak about many small files on clients, then some workarounds like SQLite may make sense, even for binary blobs. There is a limit on the number of files that can be open at the same time, 256 on iOS and 1024 on Android, and it is already taken up by opened map files. On the other hand, if each downloaded file is very compact and can be opened, loaded into memory/decoded, and then closed, SQLite workarounds may not be necessary.
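
As a purely illustrative example of point 2, a compact per-region schedule record could look something like this (field choices and encoding are made up, not a proposal):

```go
package transit

// stopTime is one scheduled call of a trip at a stop, with times stored as
// minute deltas from the previous stop so they stay small and compress well.
type stopTime struct {
    StopIndex uint32 // index into the region's stop table (matched to OSM ids server-side)
    DepDelta  uint16 // minutes since the previous stop's departure
}

// trip groups the stop times of one vehicle run; service days are a bitmask
// over the feed's validity period, similar to GTFS calendar bitfields.
type trip struct {
    RouteIndex  uint32
    FirstDep    uint16 // minutes after midnight
    ServiceDays uint64
    Stops       []stopTime
}
```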

Regarding routing algo, yes, it should not limit the expected user features. Let's list them all?

felixguendling commented 7 months ago

Does nigiri have any specific data requirements for integration?

I'm not sure if I understand your question. As mentioned, nigiri uses GTFS and optionally GTFS-RT as input. It can load multiple GTFS datasets and update them with multiple GTFS-RT feeds.

An example can be seen at transitous.org or europe.motis-project.de.

Is there anywhere I can find some more info on the data model?

The basic data model is the RAPTOR data model with some adjustments to enable encoding traffic days with bitfields. So the RAPTOR paper is probably the best entry point to understand the data model.

Edit: data is stored in cista data structures to make it easily (de-)serializable.

focus on the PT API(s)

Maybe the transitous API would be a good starting point? I think the goals (privacy friendly, user oriented, etc.) overlap a lot. https://github.com/public-transport/transitous

If we speak about many small files on clients

In the case of nigiri, it creates one timetable.bin file from all input timetables, loading them one after another. This could be done on the server or the client. So open file descriptor limits should not be an issue.

Zverik commented 7 months ago

I'll add one missing piece of information that's crucial for OM's public transport support. It's the gtfs-proto format implemented in https://github.com/Zverik/gtfs-proto/

The idea was: we set up a CDN for those files, just like we do for offline maps right now. Some server-side processing converts GTFS feeds daily, and also calculates deltas (diffs) between the latest feed and a selection of old feeds. So the server side is akin to the current OSM data processing.

On the client side, we will have an internal sqlite storage for PT data, and a downloading panel that looks basically the same as the current map downloads. (Or we could package PT data inside the map files?). And there should be an update mechanism for PT data from gtfs-proto deltas, which downloads like ~100kb of data and updates the internal sqlite database.

The gtfs-proto format is missing the fare information, but it already packages the entire network complete with the geometry. And it's 10x smaller than the original GTFS feeds, which is in line with OM's core principle of keeping downloads small.
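
The client-side update loop this implies could look roughly like this (a sketch; the delta contents and all names are hypothetical, and gtfs-proto's real format may differ):

```go
package transit

import "fmt"

// feedDelta is a placeholder for whatever a downloaded diff contains:
// rows to upsert and ids to delete, per GTFS table.
type feedDelta struct {
    FromVersion, ToVersion int
    // ... upserts and deletions per table would go here.
}

// updateFeed brings the local copy of one feed up to date by applying diffs
// one version at a time; apply is expected to wrap each delta in a single
// transaction against the internal database. It assumes every delta advances
// ToVersion past FromVersion.
func updateFeed(localVersion, latestVersion int,
    download func(from int) (feedDelta, error),
    apply func(feedDelta) error) error {
    for v := localVersion; v < latestVersion; {
        d, err := download(v)
        if err != nil {
            return err
        }
        if err := apply(d); err != nil {
            return fmt.Errorf("applying delta %d->%d: %w", d.FromVersion, d.ToVersion, err)
        }
        v = d.ToVersion
    }
    return nil
}
```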

felixguendling commented 7 months ago

nigiri already has an architecture to support multiple data formats (currently, we have implementations for GTFS and HAFAS Rohdaten): https://github.com/motis-project/nigiri/tree/master/src/loader

Adding parser code for another format would probably only be a few hundred lines of code more as most of the code could be shared with the existing GTFS implementation.

biodranik commented 7 months ago

Does GTFS already support routes that may be different at different times of the day?

Does OpenStreetMap.org mapping scheme support it too?

a letter on a screenshot means a slightly different route.

[screenshot]

charlie2clarke commented 7 months ago

Does GTFS already support routes that may be different at different times of the day?

@biodranik Yes it does. As far as I understand it, there are a few notable points about GTFS Schedules relevant here:

You can see more info on GTFS Schedule data here if interested.

Does OpenStreetMap.org mapping scheme support it too?

It doesn't appear to, no. You can see a few OSM proposals below regarding public transport schedules, each of which was rejected on the basis that this would be unmaintainable given the variability of services. https://wiki.openstreetmap.org/wiki/Proposal:Public_transport_schedules/Timetable_relations https://wiki.openstreetmap.org/wiki/Proposal:Public_transport_schedules/Departures

Triton171 commented 7 months ago

Thanks a lot for the clarifications regarding different PT routing algorithms @felixguendling. In my opinion, using nigiri as a routing library for Organic Maps seems like a really good idea, since it does pretty much exactly what we need. Of course the decision whether it's worth the extra dependencies is up to the maintainers. There are a few questions & issues we probably need to resolve before being able to integrate nigiri as a routing engine into Organic Maps:

felixguendling commented 7 months ago

client-side format of nigiri

There's not really a client side and server side format as nigiri is currently only used inside MOTIS which is used as a server-side application. However, nigiri is supposed to be embeddable as a library in any other application (or scripting language via bindings).

metro region like Berlin

For 365 days from now it has 114M, for the next two days it has 55M. You can try it out with this setup script:

#!/bin/sh

# TARGET="macos-x86_64"
# TARGET="macos-x86_64-noavx2"
# TARGET="macos-arm64"
TARGET="linux-amd64"
# TARGET="linux-amd64-noavx"
# TARGET="linux-arm64"

wget https://github.com/motis-project/motis/releases/latest/download/motis-${TARGET}.tar.bz2
tar xf motis-${TARGET}.tar.bz2

wget https://www.vbb.de/vbbgtfs -O vbb.zip

cat <<EOT >> config.ini
modules=intermodal
modules=nigiri

intermodal.router=nigiri
server.static_path=motis/web
dataset.no_schedule=true
import.paths=schedule-vbb:vbb.zip
tiles.profile=motis/tiles-profiles/background.lua
nigiri.num_days=365
EOT

./motis/motis

The files are placed in data/nigiri folder.

You can also try it with different other GTFS download links.

This is a stripped down version of the setup and installation guide for MOTIS. If you want to use the UI, you need to enable more modules.

Would it make sense to split this up

I guess that's easy at least for the GTFS-RT updating part.

Since the routing algorithm itself handles either rt_timetable and timetable, or only timetable, based on template arguments, and is header-only, the client (in this case OM) would just not invoke it with the Rt template flag set, and therefore no code for this scenario would be generated.

However, I could imagine scenarios where it makes sense to have real-time updates directly in the client.

Is there a nice way to handle overlapping timetables?

Yes, this is already done frequently. One example is the transitous project where the long distance trains are contained in several of the loaded datasets (which is a good thing! better than not having them at all).

For this case, nigiri

When duplicates are detected, they are merged together. This means that there's only one instance left that is linked to both trip_ids from both datasets, so real-time updates can be handled from both datasets. In theory, this can also be used to load the German Delfi dataset covering whole Germany (with no real-time updates so far) as well as regional GTFS-static datasets where GTFS-RT updates are available (but it hasn't been tried in practice yet).

adding support for this upstream makes the most sense

I agree. My preferred option would be to have them on disk (to keep the memory footprint low, as only routing data needs to be held in RAM) and memory-map them.

biodranik commented 7 months ago

For 365 days from now it has 114M, for the next two days it has 55M.

Wow, why is it so big?

It would be great to do all "dirty matching work" and preprocessing on the server, so clients can cache small regional data, and always be able to quickly download any additional regions if building longer routes there.

Note that ideal routes should be drawn on top of existing OSM roads/paths. A non-ideal solution should only be a temporary goal.

felixguendling commented 7 months ago

Wow, why is it so big?

I would say this is very small (at least if you compare it to other solutions that have similar capabilities, such as OTP2). It's a buffer of memory that could be memory-mapped and is directly routeable.

Details here:

 timetable::locations &: 19.34 MB [total=19.3456 MB]
 interval<std::chrono::time_point<std::chrono::system_clock, std::chrono::duration<int, std::ratio<86400>>>> &: 0 b [total=19.3456 MB]
 vector<pair<strong<unsigned int, _trip_id_str_idx>, strong<unsigned int, _trip_idx>>, ptr> &: 1.94064 MB [total=21.2862 MB]
 dynamic_fws_multimap_base<strong<unsigned int, _trip_id_str_idx>, strong<unsigned int, _trip_idx>, mutable_multimap_helper<strong<unsigned int, _trip_idx>, strong<unsigned int, _trip_id_str_idx>>::vec> &: 3.88129 MB [total=25.1675 MB]
 vecvec<strong<unsigned int, _trip_id_str_idx>, vector<char, ptr>, vector<unsigned int, ptr>> &: 3.15355 MB [total=28.3211 MB]
 vector<strong<unsigned short, _source_idx>, ptr, false, strong<unsigned int, _trip_id_str_idx>> &: 496.805 kb [total=28.8062 MB]
 vector<unsigned int, ptr, false, strong<unsigned int, _trip_id_str_idx>> &: 993.609 kb [total=29.7766 MB]
 vecvec<strong<unsigned int, _trip_idx>, vector<pair<strong<unsigned int, _transport_idx>, interval<unsigned short>>, ptr>, vector<unsigned int, ptr>> &: 3.7044 MB [total=33.481 MB]
 vecvec<strong<unsigned int, _trip_idx>, vector<unsigned short, ptr>, vector<unsigned int, ptr>> &: 993.613 kb [total=34.4513 MB]
 dynamic_fws_multimap_base<trip_debug, strong<unsigned int, _trip_idx>, mutable_multimap_helper<strong<unsigned int, _trip_idx>, trip_debug>::vec> &: 5.82193 MB [total=40.2732 MB]
 vecvec<strong<unsigned short, _source_file_idx>, vector<char, ptr>, vector<unsigned short, ptr>> &: 26 b [total=40.2732 MB]
 vecvec<strong<unsigned int, _trip_idx>, vector<char, ptr>, vector<unsigned int, ptr>> &: 2.50873 MB [total=42.782 MB]
 vector<interval<strong<unsigned int, _transport_idx>>, ptr, false, strong<unsigned int, _route_idx>> &: 148.25 kb [total=42.9267 MB]
 vecvec<strong<unsigned int, _route_idx>, vector<unsigned int, ptr>, vector<unsigned int, ptr>> &: 1.88833 MB [total=44.8151 MB]
 vector<clasz, ptr, false, strong<unsigned int, _route_idx>> &: 18.5312 kb [total=44.8332 MB]
 vecvec<strong<unsigned int, _route_idx>, vector<clasz, ptr>, vector<unsigned int, ptr>> &: 92.6602 kb [total=44.9237 MB]
 vecvec<strong<unsigned int, _location_idx>, vector<strong<unsigned int, _route_idx>, ptr>, vector<unsigned int, ptr>> &: 1.70512 MB [total=46.6288 MB]
 vector<interval<unsigned int>, ptr, false, strong<unsigned int, _route_idx>> &: 148.25 kb [total=46.7736 MB]
 vector<delta, ptr> &: 30.6648 MB [total=77.4383 MB]
 vector<std::chrono::duration<short, std::ratio<60>>, ptr, false, strong<unsigned int, _transport_idx>> &: 656.889 kb [total=78.0798 MB]
 vector<unsigned char, ptr, false, strong<unsigned int, _transport_idx>> &: 0 b [total=78.0798 MB]
 vector<strong<unsigned int, _bitfield_idx>, ptr, false, strong<unsigned int, _transport_idx>> &: 1.28299 MB [total=79.3628 MB]
 vector<bitset<512>, ptr, false, strong<unsigned int, _bitfield_idx>> &: 156.938 kb [total=79.5161 MB]
 vector<strong<unsigned int, _route_idx>, ptr, false, strong<unsigned int, _transport_idx>> &: 1.28299 MB [total=80.799 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _merged_trips_idx>, ptr>, vector<unsigned int, ptr>> &: 5.13854 MB [total=85.9376 MB]
 vecvec<strong<unsigned int, _merged_trips_idx>, vector<strong<unsigned int, _trip_idx>, ptr>, vector<unsigned int, ptr>> &: 2.73407 MB [total=88.6717 MB]
 vector<attribute, ptr, false, strong<unsigned int, _attribute_idx>> &: 0 b [total=88.6717 MB]
 vecvec<strong<unsigned int, _attribute_combination>, vector<strong<unsigned int, _attribute_idx>, ptr>, vector<unsigned int, ptr>> &: 0 b [total=88.6717 MB]
 vector<provider, ptr, false, strong<unsigned int, _provider_idx>> &: 2.29395 kb [total=88.6739 MB]
 vecvec<strong<unsigned int, _trip_direction_string>, vector<char, ptr>, vector<unsigned int, ptr>> &: 100.351 kb [total=88.7719 MB]
 vector<variant<strong<unsigned int, _location_idx>, strong<unsigned int, _trip_direction_string>>, ptr, false, strong<unsigned int, _trip_direction_idx>> &: 30.6016 kb [total=88.8018 MB]
 vecvec<strong<unsigned int, _trip_line_idx>, vector<char, ptr>, vector<unsigned int, ptr>> &: 6.21875 kb [total=88.8078 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _attribute_combination>, ptr>, vector<unsigned int, ptr>> &: 1.28299 MB [total=90.0908 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _provider_idx>, ptr>, vector<unsigned int, ptr>> &: 2.56598 MB [total=92.6568 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _trip_direction_idx>, ptr>, vector<unsigned int, ptr>> &: 5.13854 MB [total=97.7954 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<strong<unsigned int, _trip_line_idx>, ptr>, vector<unsigned int, ptr>> &: 5.13854 MB [total=102.934 MB]
 vecvec<strong<unsigned int, _transport_idx>, vector<route_color, ptr>, vector<unsigned int, ptr>> &: 8.99409 MB [total=111.928 MB]
 vecvec<strong<unsigned int, _location_idx>, vector<footpath, ptr>, vector<unsigned int, ptr>> &: 536.754 kb [total=112.452 MB]
 vecvec<strong<unsigned int, _location_idx>, vector<footpath, ptr>, vector<unsigned int, ptr>> &: 536.754 kb [total=112.976 MB]
 serializable_table<string< char *>, unsigned char, hash_all, equals_all, vector<std::pair<string< char *>, unsigned char>, ptr>, ankerl::unordered_dense::bucket_type::standard, false> &: 0 b [total=112.976 MB]
total: 112.976 MB

so clients can cache small regional data, and always be able to quickly download any additional regions if building longer routes there

It looks like you have very specific requirements that are not covered by nigiri. If you mean that the routing algorithm itself should trigger downloading new data on-the-fly as it touches new regions, you probably need to invent new algorithms for that. This requirement is definitely out-of-scope for nigiri.

Zverik commented 7 months ago

Regarding Nigiri, I think it packages footpath geometry alongside timetables, hence the big file size. In OM we already have pedestrian routing, so it would make sense to use it instead and save on storage. To compare, the original Berlin GTFS feed is 66 MB, and protobuf-compressed it's 5.5 MB.

It seems currently we have three options:

  1. Use nigiri, which is written in C++ and has most of what we need. Maybe. It's not documented, so I don't know. The file format is a problem, but maybe it isn't — maybe we could feed it a compressed gtfs like with protobufs, and it'd expand them on device. Also not sure it could play well with pedestrian routing OM has.
  2. This and the next one — download GTFS on a server and compress it to a CDN, from which OM would download feeds or updates. Would require implementing all the transit routing by ourselves, as opposed to Nigiri.

    First option is gtfs-proto: 10x compression and support for deltas, meaning download sizes are low, but data can be kept up-to-date even when main maps are updated rarely. This would require a large internal database on devices that GTFS would be unpacked into. Other than that, this should play well with OM pedestrian routing and other things.

  3. The same, but the file format is packed and indexed in a way that doesn't require unpacking into a local database and keeping it around — much like OM map files are built. I imagine indexes taking up a good part of the file, plus navigating them might be slower than doing a database request, but not by much. I expect a Berlin file to take up something like 10 MB. Updates would be a problem: do we replace parts of files? But then again, we probably have partial updates for map files, so why not apply the same to transit feed files.

And I agree with Felix that cross-feed routing is a bit out of scope for now, although probably can be done later using the same algorithms we use for subway routing.

felixguendling commented 7 months ago

I think it packages footpath geometry alongside timetables

For fast routing, there is no other option. If you switch to a dedicated foot routing engine to compute shortest paths inside the public transport routing, you will get routing times in the minutes or even hours on phones. As mentioned, the only alternative is the ULTRA approach mentioned here: https://github.com/organicmaps/organicmaps/issues/5331#issuecomment-2026893849 Otherwise, you need one "one-to-all" foot routing query per station touched per round in RAPTOR, or one per arrival. These are at least thousands, sometimes millions of foot routing queries that cannot be sped up with goal direction (A*, etc.). I would definitely use preprocessed footpaths. Whether they come from your own foot routing or from the GTFS timetable itself doesn't matter.

biodranik commented 7 months ago

There is already some basic implementation of the mixed Public Transport routing in OM which uses foot paths + subway lines. Did anyone check it? A logical step could be to extend it with additional PT routes, in addition to subways, and then add support for time schedules.

An efficient OM-specific implementation is preferred over a third-party but less efficient one (more disk space, slower, restricted in functionality, etc.).

Triton171 commented 7 months ago

There is already some basic implementation of the mixed Public Transport routing in OM which uses foot paths + subway lines. Did anyone check it? A logical step could be to extend it with additional PT routes, in addition to subways, and then add support for time schedules.

The current implementation is essentially just the pedestrian router but with additional "fake" edges that model subway lines. In theory, one could generalize this to a time-dependent A*, but this has the issues that were mentioned above:

Considering all of this, I'm not sure if this approach could provide a great user experience (and it almost surely wouldn't be competitive with Google Maps and the likes).

Regarding the size of the data format that nigiri uses, I have 2 points:

vng commented 7 months ago

Let's make the first version without timetables. Just to minimize time (distance) + minimize change penalty. Current A* implementation is quite good for that, because it works with dynamic weights.

biodranik commented 7 months ago

Plus alternate routes, of course. The next step would be to add schedules on top of that (if it is feasible), or in a completely different way.

Let's discuss in detail any potential blockers for implementing schedules on top of the current code.

Zverik commented 7 months ago

The timetable for Berlin takes ~112 MB uncompressed but just 23 MB after being compressed with ZSTD. After compression, this is significantly less than the size of the map (78 MB), so it's not a huge issue anymore in my opinion.

I should remind everyone that feeds generally are only valid for about a week in advance (it's mentioned somewhere in the spec). Files are often updated daily. So we're not talking about a one-time download of 23 MB, like with maps that are updated roughly once a month. It's 23 MB at least once a week, maybe automatically, because users ignore red dots.
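
(For scale: 23 MB per weekly update is roughly 90-115 MB of transfer per region per month, compared to ~78 MB once a month for the Berlin map itself.)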

DavidKarlas commented 5 months ago

Just a random thought... might be useful, might be noise... Maybe the format of the downloaded data could be size-optimized (protobuf?) but only decompressed for the engine as needed, for example when the start/end points' bounding box overlaps with the bounding box of a public transportation network? With the speed of today's phone CPUs this decompression might be fast enough not to be noticeable...

Zverik commented 5 months ago

@DavidKarlas it was mentioned above: https://github.com/Zverik/gtfs-proto/

Korb commented 5 months ago

In the city where I currently live (St Petersburg, ruzzia), Organic Maps does not support routing using ground public transport. As I understand from other topics, in general the application is still almost devoid of this functionality. This is sad, because after Google left this country, it stopped accepting edits to update maps and ceased to correspond to reality. And local navigation providers (Yandex and 2Gis) cannot be considered safe even for law-abiding citizens. I think @biodranik understands what I mean.

I opened the online OSM editor, and I can confirm that public transport routes are present on the map: [screenshot]

In the table "Every public transport system in the world (CC0)", on the sheet "Overground", the column "Tram Lines" is "1", and the remaining columns ("Trolleybus Lines", "Bus Lines" and "Other types") are empty.

I am not a programmer. Do I just need to subscribe to news from the development team, expecting that one day I will be able to plan routes around the city in real time again, or can I be of some help to you?