kalon33 closed this 1 month ago.
before: `match_duplicates` matches duplicates from different GTFS sources (but not within the same one)
now:
* `merge_dupes_intra_src` = deduplicate within the same GTFS source (not possible before)
* `merge_dupes_inter_src` = same as `match_duplicates` (deduplicate trips from different GTFS sources)

Additionally, we now have OpenTelemetry (OTEL) support (MOTIS issue 541) and a metrics endpoint for Prometheus, so it's now possible to create Grafana dashboards to monitor how well real-time feeds are matched to GTFS static sources, plot query runtimes, etc.

Be aware that if you use OTEL, routing queries will be logged (which enables us to identify problematic queries, but might not be what you want regarding privacy).
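A minimal Python sketch of the difference between the two options, assuming a simplified model in which two trips count as duplicates when their stop sequences and departure times are identical. The `Trip` type, the `find_duplicates` function, and this matching criterion are hypothetical illustrations, not MOTIS's actual data model or matching logic:

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Trip:
    source: str        # GTFS feed the trip was loaded from
    trip_id: str
    stop_times: tuple  # hypothetical: ((stop_id, departure_seconds), ...)


def find_duplicates(trips, intra_src: bool, inter_src: bool):
    """Return (trip_id, trip_id) pairs that would be merged.

    Hypothetical duplicate criterion: identical stop sequence and times.
    The real MOTIS/nigiri matching logic is not reproduced here.
    """
    # Bucket trips by their stop sequence so only candidates are compared.
    groups = {}
    for trip in trips:
        groups.setdefault(trip.stop_times, []).append(trip)

    pairs = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            same_src = a.source == b.source
            # intra_src allows merging within one feed, inter_src across feeds.
            if (same_src and intra_src) or (not same_src and inter_src):
                pairs.append((a.trip_id, b.trip_id))
    return pairs
```

With two identical trips in one feed and a third copy in another feed, `inter_src=True` alone (the old `match_duplicates` behaviour) only pairs each copy with the trip from the other feed; the two copies inside the same feed are merged only when `intra_src=True`.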
Edit: To really make use of `merge_dupes_intra_src`, you would probably need to turn off fixing feeds with `gtfs-tidy`. This makes sense especially for GTFS static feeds that also have one or more real-time feeds. With `gtfs-tidy`, trips are removed completely, while MOTIS will only deactivate one trip and point its `trip_id` to the remaining trip. With `gtfs-tidy`, you never know whether the trip that will receive real-time updates is the one that gets removed.
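The practical difference for real-time matching can be sketched in Python under a deliberately simplified model: trips are a `dict` keyed by `trip_id`, and a GTFS-RT update is matched purely by `trip_id`. The function names below are hypothetical and only mirror the behaviour described in the comment; neither tool's real implementation is shown:

```python
def tidy_remove(trips: dict, duplicates):
    """Simplified model of gtfs-tidy: the duplicate trip is deleted outright."""
    remaining = dict(trips)
    for _kept, dropped in duplicates:
        remaining.pop(dropped, None)
    return remaining


def deactivate_and_alias(trips: dict, duplicates):
    """Simplified model of the MOTIS behaviour described above: one trip is
    deactivated and its trip_id is pointed at the remaining trip."""
    active = dict(trips)
    alias = {}
    for kept, dropped in duplicates:
        active.pop(dropped, None)
        alias[dropped] = kept
    return active, alias


def resolve_rt_update(trip_id: str, active: dict, alias=None):
    """Return the trip a GTFS-RT update for trip_id would apply to, or None.
    Alias chains are not followed in this simplified sketch."""
    if trip_id in active:
        return trip_id
    if alias is not None and trip_id in alias:
        return alias[trip_id]
    return None
```

If the real-time feed happens to reference the `trip_id` that was dropped, the removal model has nothing left to attach the update to, while the alias model still routes it to the surviving trip.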
Thanks for the details, @felixguendling.
@felixguendling I think we can remove some of the gtfsclean options to make this work; it should be simple.
If MOTIS doesn't handle something like missing files or other issues (which were handled by `gtfs-tidy`), we can also make it more robust. For example, the case of missing agencies is already handled (it creates an "UNKNOWN" agency), so we could make the agency file optional.
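Making the agency file optional could look roughly like this minimal Python sketch, assuming a feed unpacked into a directory. `load_agencies`, `agency_for_route`, and the exact shape of the "UNKNOWN" fallback record are hypothetical; MOTIS itself is written in C++ and its actual handling may differ:

```python
import csv
from pathlib import Path

# Hypothetical fallback record, mirroring the "UNKNOWN" agency mentioned above.
UNKNOWN_AGENCY = {"agency_id": "UNKNOWN", "agency_name": "UNKNOWN"}


def load_agencies(feed_dir: Path) -> dict:
    """Load agency.txt if present; otherwise fall back to one UNKNOWN agency."""
    path = feed_dir / "agency.txt"
    if not path.exists():
        return {"UNKNOWN": UNKNOWN_AGENCY}
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with path.open(newline="", encoding="utf-8-sig") as f:
        return {row.get("agency_id", ""): row for row in csv.DictReader(f)}


def agency_for_route(route: dict, agencies: dict) -> dict:
    """Resolve a route's agency, falling back to UNKNOWN if it is missing."""
    agency_id = route.get("agency_id", "")
    if agency_id in agencies:
        return agencies[agency_id]
    # GTFS allows omitting agency_id when the feed has a single agency.
    if not agency_id and len(agencies) == 1:
        return next(iter(agencies.values()))
    return UNKNOWN_AGENCY
```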
I don't think there is a need to get rid of gtfsclean. It's nice for debugging to have known correct files that you can inspect in a text editor.
Activating `merge_dupes_inter_src` would be nice, as several of the feeds we use overlap, which sometimes prevents GTFS-RT data from being used properly. So yes, it's nice that you added this, @jbruechert, thanks :)
I think I already did that. Is there anything missing?
It doesn't seem to be activated in the configuration file:
The default is that it's turned off: https://github.com/motis-project/motis/blob/0ba35bedb7f33db4bf250bdc417af94f99cc19f2/modules/nigiri/include/motis/nigiri/nigiri.h#L35-L36
I made the change here: #507
I haven't had time to deploy it to our test instance on europe.motis-project.de, so if you don't want any surprises, I can give you the green light after I've tested it. If you're adventurous, you can just merge it and see what happens. It should be fine, I hope.
Thanks, but it's already activated in this merge request, and I already tested it with a small subset of the feeds.
It has duplicate-merging parameters that should (to my understanding) fix real-time use when GTFS feeds overlap.
We could enable them in our configuration once this PR is merged.