vingerha / gtfs2

Support GTFS in Home Assistant GUI-only
https://github.com/vingerha/gtfs2
MIT License
65 stars 4 forks source link

Update gtfs_rt_helper.py #86

Closed Lunatixz closed 1 week ago

Lunatixz commented 4 weeks ago

HASSIO Geojson integration requires a unique "Title" parameter in-order to load as individual entries.

Refactored string to include vehicle id as part of the "Title".

vingerha commented 4 weeks ago

Not sure about the uniqueness that you mention, never ran into issues with the existing setup which bases itself on trip With met the trip-id are always unique before an old entity gets orphaned and removed form HA .... and only because they are 'long' at times, I take the last 3 characters. With that, the uniqueness of route+direction+trip is covered till next week (for my provider). For me, changing it to a vehicle_id will make it less unique as the bus drives over route A in direction 0 with id 123, next it will return so A-1-123 and next it will again take to same route A-0-123 meaning that within a period of 1 hour I get twice the same data or a different trip. I am not sure what difference this makes for the map-usage ( I think not much?) but it is already a lot less unique for me.

Lunatixz commented 3 weeks ago

I'm having this issue currently with NYC MTA GTFS Realtime, in the original code multiple bus lines are under one GeoJSON entry... the last parsed replacing the previous entries. https://www.home-assistant.io/integrations/geo_json_events/ States each "title" in the json needs to be unique to create induvial sensors. The org. code was taking the last 3 digits which in this case is least likely to be unique, if you don't like the schema maybe try taking to first [:5]? I would not rely on the last three. Either way each API will pose its own challenges...

Perhaps it would be simpler to add a index reference to the "Title"? This will gurantee a unique entry based on the meta parsed.

 for idx, entity in enumrate(feed_entities):
    geojson_element["properties"]["id"] = str(idx) + ":" + str(self._route_id) + "_" + str(vehicle["vehicle"]["id"]) + "_" + str(vehicle["trip"]["direction_id"])

With the PR changes all Bus lines are added as sensor by GeoJson and now work well, I setup a auto entries card to load them and track them.

vingerha commented 3 weeks ago

I am ok with 5 and you could easily adapt that locally if that works for you. The whole trip-numbering thing is amazingly different between providers, where 3 would mean 999 on a day ...why need more?

I just tried to keep it small with randomly chosing 3... having no other reference than my own provider

EDIT on the index, probably fine too but will need to see how this works, it should re-start after a specific #, i.e. not grow forever

EDIT2: the doc does not say it needs to be unique, the question is more : do you expect two vehicles to be active at the same time (or during the same time the entity is not orhpaned) and for the same route/direction and trip? So the trip, being part of this key must be 'unique' within a certain timespan, for my provider this is a week before the same ttip pop's up and this is sufficient make them go stale and disappear.

Lunatixz commented 3 weeks ago

Whatever works, I can keep my changes local... I only wanted to contribute where it may help others.

All entries in the GeoJSON feed must define a geometry which typically is a point or polygon with geo coordinates. In addition, this platform will look for a title key in the entry’s properties and use that as the entity’s name.

In my use case if I used the org. code prior to my PR; Despite GTFS having more than one bus it only created one GeoJson location. When I evaluated why, it was because the "Title" for all entries were the same ignoring the bus numbers which were different. Looking further it was discovered that all the vehicle-ids share the same last three digits. So despite the json having multiple entries they were all "Title" the same. Leading to a single GeoJson entry, and ignoring all the others (LILO).

vingerha commented 3 weeks ago

Donot get me wrong, I do want to improve to0 and this topic hardly got any attention in the past as no one reported anything. Still..what you copied above does not say it needs to be unique, it only needs to be unique if you think/assume that the entities will overlap, i.e. that you have more vehicles on the same route/direction/trip(3digit) at the same time ... which I thought to be a very low risk. At the very base... I do not care about that text, I care about reality. So if you have (!) an issue with multiple vehicles on the same entity....then indeed try to find something that makes them more unique but do understand that this needs to apply to a whole lot of other users too and trip-id are 'invented' by providers as they come/go so last 3 digits may work for me, apparently not for you

Lunatixz commented 3 weeks ago

IMO using the last 3 digits has to go, it's drawing conclusions on the naming schema for all GTFS Real Time API's. In my case; all the vehicle id's on my route all end with the same last three characters. While the GeoJson doc doesn't explicitly say "unique" you can conclude multiple entries all using the same "Title" it will only create a single entry overwriting through iteration. ie. In my case it will lump multiple buses on the same route as one entry.

I believe using the array index might be the safest option?

vingerha commented 3 weeks ago

can you send me the links to both your static and rt data gtfs please? Just need to look at it. I am not at all surprised that this might happen but I aim to keep the title as clean as can-be :)

Lunatixz commented 3 weeks ago

https://bustime.mta.info/wiki/Developers/GTFSRt

vingerha commented 3 weeks ago

LOL...key...... I did ask for this, might you have a file instead (json if you use the service call)

vingerha commented 3 weeks ago

Forget that... I have the data.. and it is very clear that this is not working as the trip id ending with "D..N" are numoerous...

vingerha commented 3 weeks ago

Maybe I can convert the trip id to something 'small' and unique, i.e. I cannot use the tripid as a whole, just one provider example with me "chouette:TimeTable:9b30aa3d-2967-4b31-b12d-122ff7da6906:LOC"

EDIT: this provider does not (!) have vehicle locations, hence I would have found the same problem as well with the last 3 chars

EDIT2: but I would prefer to have the name non-cryptic...e.g. adding a converted-non-readable hex(?) string will not help at all

vingerha commented 3 weeks ago

Tomorrow mre time, closing off as need to spend time on my diner :) ...btw, for chatty stuff, same name on Discord

vingerha commented 3 weeks ago

Gave it a bit more thought and what could be done is 'hash' the trip_id e.g. via CRC32 .. and possibly take the last 3digits still :) As the current last 3 digits of a trip id make no sense at all, then a 3 digit part of the hash is of the same level. ... what do you think?

Lunatixz commented 3 weeks ago

IMO it's adding more complexity then required; and obfuscating labels. If each element of the feed_entities array is meant to be a induvial entity index labeling is probably your best bet.

vingerha commented 3 weeks ago

I donot understand what your concern is... a last-3-digit of a trip_id is already non-informative so changing that to other non-informative but more unique characters is similar. Then, it is actually rather simple whereas adding an index...not sure how to control this, when do you want to reset it? On another note, the current setup actually allows re-use of the entity if the same trip happens again, i.e. in cases where the trip repeats daily, no new entity needs to be created.

Either way, if you have a good proposal with proper case then fine, for now I see this as just forming ideas :)

EDIT... note that the title needs to remain the same with new updates for the current trip, i.e. when the vehicle moves and a new set of lat/lon flows in , there should be no new index, i.e. the same entity needs to be reused

Lunatixz commented 3 weeks ago

Looking at it from my limited perspective; Trying to keep a static persistent label across GeoJson updates is pointless; Almost all the information is dynamic; users can use a auto entities card for picking up dynamic entities. IMO the only important information is the lat/lon and bus# as long as the title is unique to each entity it doesn't matter.

I didn't see any logic in the code to evaluate each item in feed_entities, so the need to keep track of each entity's label using a crc32 is not needed since it won't ever come into play.

A index would have the same effect without the need to create gibberish crc32 hash.

If however you created a more robust parser that evaluated each item in feed_entities for changes then sure crc32 would be a good way to label things.

vingerha commented 3 weeks ago

I wholehartidly disagree, you come with a problem where the title is not unique and this is because it uses the last 3 digits. By taking the whole trip_id and making a hash out of it ...it will become unique and a lot smaller in #characters. An index may (!) have the same effect as long as it does not change the title for the vehicle in progress, i.e. every minute when it asks for an update, it should verify if if it has already been used and is active and imo this is quite a challenge .. or: code that is not needed because: using a stable basis as trip_id does not need to be checked, neither would its crc-representation.

What I created works but I agree it only works if the title is unique-ish for the trip (!) ... and if the last 3 digits are not making it unique then something else needs to be done, again: for the trip, not the vehicle, the route etc.

Lunatixz commented 3 weeks ago

It's your project, you would know best... I have a limited understanding of GTFS schema and have never dived deep into the code of this project.

I guess its a matter of how large the feed_entities array can be, if there is a chance it can return a large number of results; then maybe a hash would be more efficient.

vingerha commented 3 weeks ago

If you have a good/working proof of concept which is not based solely on your own requirements than I am fine. For now, I believe that with the crc it should fix it.... will make a dev version in the next week(s) so you can test it...which will also help me to possibly understand it better

vingerha commented 3 weeks ago

It was indeed not that difficult, can you please test release 0.4.6.3d6 ... it states donot use but that is beacuse I donot want to have others use it :)

vingerha commented 3 weeks ago

Finally had some time to set it up as map, at this very moment, 16:16, 4 buses are on the same route going north, all of them have the new last-3-digit-of-crc32-tripid and are thus unique as in this (my situation) the trip_id is unique for each of these buses (i.e. the crc did not change thing for this sit.) but for you this should be an improvement .... I hope

image

Lunatixz commented 3 weeks ago

Thanks I'll give it a try... and let you know.

vingerha commented 2 weeks ago

Any feedback? If OK now then please close this PR (or I will do it?)

vingerha commented 1 week ago

No more feedback so assume resolved