public-transport / transport.rest

Information about the *.transport.rest APIs.
https://transport.rest/

Validation of HVV GTFS-RT reveals missing id in VehicleDescriptor #5

Open dancesWithCycles opened 3 years ago

dancesWithCycles commented 3 years ago

Hi folks, Thank you so much for providing and maintaining this repository. Please accept my appreciation! I am evaluating the GTFS-RT feed for HVV with my own JavaScript command line tool. In the following example output you can see that the 'id' field in the 'VehicleDescriptor' field of the 'VehiclePosition' entity is not set.

  gtfs-rt-get2mongo vehicle position +1ms
  gtfs-rt-get2mongo TripDescriptor available +0ms
  gtfs-rt-get2mongo trip_id:25128492 +0ms
  gtfs-rt-get2mongo route_id:8850_3 +0ms
  gtfs-rt-get2mongo direction_id unavailable +0ms
  gtfs-rt-get2mongo start_time:15:17:00 +0ms
  gtfs-rt-get2mongo start_date:20210330 +0ms
  gtfs-rt-get2mongo VehicleDescriptor available +0ms
  gtfs-rt-get2mongo id unavailable +0ms
  gtfs-rt-get2mongo label: U Wandsbek Markt [Ankunft] +0ms
  gtfs-rt-get2mongo licensePlate unavailable +0ms
  gtfs-rt-get2mongo latitude: 53.57102584838867 +0ms
  gtfs-rt-get2mongo longitude: 10.116154670715332 +0ms
  gtfs-rt-get2mongo timestamp: 0 +0ms
  gtfs-rt-get2mongo vehicle position +0ms

Would you say this is normal and state-of-the-art? It is the first time I am consuming a real-time feed with this field being unavailable. I am also wondering how I can tell vehicles apart when I store them in a database that does not allow duplicates. I appreciate any hints pointing me in the right direction. Cheers!

dancesWithCycles commented 3 years ago

BTW, I observed the same for the VBB feed, as you can see in the following example output.

  gtfs-rt-get2mongo vehicle position +0ms
  gtfs-rt-get2mongo TripDescriptor available +0ms
  gtfs-rt-get2mongo trip_id:153437345 +0ms
  gtfs-rt-get2mongo route_id:17456_700 +0ms
  gtfs-rt-get2mongo direction_id unavailable +0ms
  gtfs-rt-get2mongo start_time:10:51:00 +0ms
  gtfs-rt-get2mongo start_date:20210330 +0ms
  gtfs-rt-get2mongo VehicleDescriptor available +0ms
  gtfs-rt-get2mongo id unavailable +0ms
  gtfs-rt-get2mongo label: S+U Zoologischer Garten +0ms
  gtfs-rt-get2mongo licensePlate unavailable +0ms
  gtfs-rt-get2mongo latitude: 52.482269287109375 +0ms
  gtfs-rt-get2mongo longitude: 13.347678184509277 +0ms
  gtfs-rt-get2mongo timestamp: 0 +0ms

derhuerst commented 3 years ago

Thank you so much for providing and maintaining this repository. Please accept my appreciation!

Thank you!

I am evaluating the GTFS-RT feed for HVV [...]. [...] that the 'id' field in the 'VehicleDescriptor' field of the 'VehiclePosition' entity is not set.

Yes, unfortunately that's the case for now. gtfs-rt-inspector will show you that a lot of trips & vehicles are missing data.
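
A rough way to check this yourself is to filter the decoded feed for VehiclePosition entities whose VehicleDescriptor has no id. The sketch below is only an illustration: it assumes the feed has already been decoded into a plain object with camelCase field names (the exact shape depends on your decoder), and the entity ids and the second vehicle's id are made up.

```javascript
// Assumed shape: a decoded GTFS-RT FeedMessage as a plain object with
// camelCase field names. Entity ids and 'hypothetical-vehicle-1' are made up.
const sampleFeed = {
  entity: [
    {
      id: 'e1',
      vehicle: {
        trip: {tripId: '25128492', routeId: '8850_3', startTime: '15:17:00', startDate: '20210330'},
        vehicle: {label: 'U Wandsbek Markt [Ankunft]'}, // no id, as in the report
      },
    },
    {
      id: 'e2',
      vehicle: {
        trip: {tripId: '153437345', routeId: '17456_700', startTime: '10:51:00', startDate: '20210330'},
        vehicle: {id: 'hypothetical-vehicle-1', label: 'S+U Zoologischer Garten'},
      },
    },
    // A TripUpdate entity; it is skipped because it has no `vehicle` field.
    {id: 'e3', tripUpdate: {trip: {tripId: '25128492', startDate: '20210330'}}},
  ],
}

// Returns the VehiclePosition entities whose VehicleDescriptor has no usable id.
const findPositionsWithoutVehicleId = (feed) => {
  return feed.entity
    .filter((entity) => entity.vehicle) // keep VehiclePosition entities only
    .filter((entity) => {
      const descriptor = entity.vehicle.vehicle
      return !descriptor || !descriptor.id
    })
}

const positionsWithoutId = findPositionsWithoutVehicleId(sampleFeed)
console.log(positionsWithoutId.map((entity) => entity.id))
```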

Why some trips/vehicles have IDs that don't match the GTFS:

v0.hamburg-gtfs-rt.transport.rest uses hafas-gtfs-rt-feed underneath, which in turn uses match-gtfs-rt-to-gtfs. Fuzzy-matching HAFAS data against GTFS fails in some cases, because the two data sources use different naming for stops, lines, etc. Also, hafas-gtfs-rt-feed only tries to match HAFAS data against GTFS data for a limited time; after that it gives up and adds the unmatched HAFAS IDs to the GTFS-RT feed.

Why VehiclePosition.vehicle.id never has an ID:

Because it uses hafas-client's line.vehicleId field, which apparently is missing. Haven't checked why yet.

Would you say this is normal and state-of-the-art?

No, definitely not. These feeds are experimental, and serve two purposes:

  1. They intend to make a best-effort, half-broken GTFS-RT feed available where otherwise only significantly less usable or comfortable options are available.
  2. They are a proof of concept showing the transit operators that a GTFS-RT feed can be built anyway by polling their HAFAS, and that an open & unrestricted feed is out there now, so they might as well help everyone by providing a proper GTFS-RT feed themselves.

dancesWithCycles commented 3 years ago

Hi @derhuerst , Thank you very much for the clarification. I had the idea to use the API and consume it with the Dede real-time map. However, the underlying database accepts only unique entries at the moment. Would you say I have to adapt the database to use this API or do you have any idea how to get a unique identifier out of the feed data? Cheers!

derhuerst commented 3 years ago

However, the underlying database accepts only unique entries at the moment. Would you say I have to adapt the database to use this API or do you have any idea how to get a unique identifier out of the feed data? Cheers!

Unique identifier for what? A physical vehicle (e.g. a specific bus with a specific license plate)? A trip (a specific vehicle running along a route at a specific point in time)? An arrival/departure at a stop?

dancesWithCycles commented 3 years ago

Unique identifier for what?

Good question. Indeed, I do not need an identifier to present it on the Dede real-time map. However, after I have consumed any GTFS-RT feed on my server, I store an entity for every vehicle in a database. The front end for the Dede real-time map uses this database to fetch vehicle entities that are presented on the map.

Here is the deal: With the current implementation of the database, only unique entities are allowed. Therefore, I am using the VehiclePosition -> VehicleDescriptor -> id field as a unique identifier for vehicle entities in my database.

I could theoretically change my database. However, I like to have unique entities and would rather figure out how to extract some unique piece of information from the real-time feed that I can use to make the vehicle entities in my database unique.

Do you know what I mean? How do you tell vehicles apart when they do not have some sort of unique information?

The front-end technology that I have worked with so far (e.g. React) always requires that any field/array/list data has a unique identifier like an index. That means, I did not find a way so far to present vehicles without them having unique identifiers. What is your experience?

Cheers!

derhuerst commented 3 years ago

Unique identifier for what?

Good question. Indeed, I do not need an identifier to present it on the Dede real-time map.

What is "it" here? A vehicle?

If you want to show vehicles instead of trips (keep in mind that often a vehicle will finish the trip and start the next right after) you would have to wait until VehiclePosition -> VehicleDescriptor -> id is in the feed, yes.

BTW: With the Berlin/VBB & Hamburg/HVV GTFS-RT feeds we're talking about, the underlying HAFAS data source does not provide a true physical vehicle ID anyway. It does provide a "virtual"/made-up vehicle ID though (AFAIK a vehicle ID will not show up twice at the same time, but once a physical vehicle starts its next trip, it gets a new "virtual" vehicle ID), which is used in TripUpdate -> VehicleDescriptor -> id; I should add this "virtual" ID to VehiclePosition -> VehicleDescriptor -> id as well. I have created https://github.com/derhuerst/hafas-gtfs-rt-feed/issues/5 for that.

Do you know what I mean?

I think I do, yes.

How do you tell vehicles apart when they do not have some sort of unique information?

For now, automatically, you can't. For humans, there's trip.tripId + vehicle.label.
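
If a database needs some unique key in the meantime, one possible stopgap is a composite key built from the trip fields plus the label. This is only a sketch under assumptions: the input is a decoded VehiclePosition as a plain object with camelCase fields, and the resulting key identifies a trip run on a given day, not a physical vehicle. `fallbackKeyFor` is a hypothetical helper, not part of any library mentioned here.

```javascript
// Hypothetical helper: derive a database key for a decoded VehiclePosition.
// Assumes camelCase fields (tripId, startDate); adjust to your decoder.
const fallbackKeyFor = (vehiclePosition) => {
  const trip = vehiclePosition.trip || {}
  const descriptor = vehiclePosition.vehicle || {}

  // Prefer the real vehicle id whenever the feed provides one.
  if (descriptor.id) return `vehicle:${descriptor.id}`

  // Without a trip id there is nothing stable to build a key from.
  if (!trip.tripId) return null

  // startDate is included because the same trip id may recur on other days.
  return ['trip', trip.tripId, trip.startDate || '', descriptor.label || ''].join('|')
}

const exampleKey = fallbackKeyFor({
  trip: {tripId: '25128492', routeId: '8850_3', startTime: '15:17:00', startDate: '20210330'},
  vehicle: {label: 'U Wandsbek Markt [Ankunft]'},
})
console.log(exampleKey)
```

Because the key changes whenever a vehicle starts its next trip, the same physical vehicle will show up under a new key over time, so stale rows would have to be expired separately.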

That means, I did not find a way so far to present vehicles without them having unique identifiers.

It definitely makes sense for your app to assume that the data sources provide unique IDs. But if you want to use vehicle IDs, that is a design question: Is your tool rather infrastructure-focused (as in "which public transport vehicles are out there right now?") or traveller-focused (as in "when will my bus arrive?")?

dancesWithCycles commented 3 years ago

What is "it" here? A vehicle?

Yes, I mean vehicles. In the first place, Dede shall show the vehicles out there in real-time. Adding helpful and convenient details like bus/route number, name of transit agency and direction can come later.

I should add this "virtual" ID to VehiclePosition -> VehicleDescriptor -> id as well.

That sounds like what I am looking for. As long as the id is unique, my database does not get messed up, and I do not care about the physical id right now. Please let me know if I can be of any help. I could do some tests after Easter.

In my eyes it makes sense to link TripUpdate to VehiclePosition entities using the mentioned virtual id, as those two feed entities are supposed to match, right? Someone is always looking for a TripUpdate that matches a VehiclePosition or vice versa, right?

Cheers!

derhuerst commented 3 years ago

In my eyes it makes sense to link TripUpdate to VehiclePosition entities using the mentioned virtual id, as those two feed entities are supposed to match, right? Someone is always looking for a TripUpdate that matches a VehiclePosition or vice versa, right?

Not always: Some feeds only have one of the two or they don't have the notion of trips.

But in our case, GTFS-RT built from HAFAS or HAFAS-like data sources, it totally makes sense. Currently, the architecture of hafas-gtfs-rt-feed makes this hard though, as it processes trips and movements (basically data about a vehicle and its upcoming arrival) independently.
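
Until the feed carries the id in both places, a consumer could try to do this linking on its own side. The sketch below is an untested idea, not how hafas-gtfs-rt-feed works: it assumes a decoded feed with camelCase fields, and it assumes that a TripUpdate and a VehiclePosition for the same run carry the same tripId and startDate. `borrowVehicleIds` and the id `virtual-1` are hypothetical.

```javascript
// Builds a lookup key from a TripDescriptor, or null if there is no trip id.
const tripKeyOf = (trip) => {
  return trip && trip.tripId ? `${trip.tripId}|${trip.startDate || ''}` : null
}

// Hypothetical helper: fill VehiclePosition.vehicle.id from the matching
// TripUpdate.vehicle.id, matching both by tripId + startDate (an assumption).
const borrowVehicleIds = (entities) => {
  // First pass: remember the vehicle id of every TripUpdate that has one.
  const idsByTrip = new Map()
  for (const entity of entities) {
    const tripUpdate = entity.tripUpdate
    const key = tripUpdate ? tripKeyOf(tripUpdate.trip) : null
    if (key && tripUpdate.vehicle && tripUpdate.vehicle.id) {
      idsByTrip.set(key, tripUpdate.vehicle.id)
    }
  }

  // Second pass: return copies of the positions, with the id filled in if found.
  return entities
    .filter((entity) => entity.vehicle)
    .map((entity) => {
      const position = entity.vehicle
      const descriptor = position.vehicle || {}
      const id = descriptor.id || idsByTrip.get(tripKeyOf(position.trip)) || null
      return {...position, vehicle: {...descriptor, id}}
    })
}

const linkedPositions = borrowVehicleIds([
  {id: 'e1', tripUpdate: {trip: {tripId: '25128492', startDate: '20210330'}, vehicle: {id: 'virtual-1'}}},
  {id: 'e2', vehicle: {trip: {tripId: '25128492', startDate: '20210330'}, vehicle: {label: 'U Wandsbek Markt [Ankunft]'}}},
])
console.log(linkedPositions[0].vehicle)
```

Positions without a matching TripUpdate keep `id: null`, so the caller still has to decide how to store them.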