vingerha / gtfs2

Support GTFS in Home Assistant GUI-only
https://github.com/vingerha/gtfs2
MIT License
65 stars · 4 forks

[BUG] Sensors for Stops do not reliably become unavailable when there are no Departures; others in return not becoming available #57

Closed · dafunkydan closed this issue 2 months ago

dafunkydan commented 2 months ago

Describe the Bug

Normal behaviour: I have set up static zones, each representing a stop. If a particular stop_id no longer has departures in scope, its sensor normally becomes unavailable. The other way around: if there are departures for a stop_id and there is no sensor for it yet, the sensor gets created by the integration.

Bug behaviour: However, creation and deletion of sensors for these zones doesn't always happen reliably. For empty stop sensors that don't go "unavailable", I observed that they contain just {} for the next departures. That seems to coincidentally prevent the (re)creation of sensors when there are departures for them again.

Impact:

Steps/data to reproduce the behavior: I couldn't identify the circumstances under which this happens. I observed it maybe 2-4 times within the last 48h; most of the time it doesn't happen, though.

General setup:
Static GTFS: https://rnv-dds-prod-gtfs.azurewebsites.net/latest/gtfs.zip
With RT data: https://github.com/vingerha/gtfs2/issues/51#issuecomment-2057470068

Of course, to observe this behaviour there needs to be a stop_id that previously had departures but then has none (maybe because of day/night changes).

Temporary workaround: After updating the local stops (either by manually reloading or via a service call), everything gets aligned again. The zombie sensors become unavailable, and the other sensors get (re)created.
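For reference, the service-call variant of this workaround can be automated with the stock homeassistant.update_entity service. A minimal sketch; the entity_id is an example placeholder, not a confirmed gtfs2 name:

```yaml
# Periodically force-refresh the local-stop sensors so stale
# ("zombie") entities get re-evaluated without a manual reload.
# The entity_id is an example; substitute your own local-stop sensors.
automation:
  - alias: "Refresh gtfs2 local stop sensors"
    trigger:
      - platform: time_pattern
        minutes: "/15"
    action:
      - service: homeassistant.update_entity
        target:
          entity_id: sensor.example_local_stop_zone_home
```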

Expected Behaviour:

Release used
gtfs2: 0.4.4.7
HA: HAOS
Core: 2024.4.3

Additional: Couldn't find anything useful in the logs. See image for a comparison of when it occurred, before and after reloading: comparison before-after reloading with faulty sensors

vingerha commented 2 months ago

As you already write above... it is not stable behavior, and I need a way to reproduce this. At present I am not recreating sensors and they run dead by themselves... but maybe I should introduce active deleting/recreating.

vingerha commented 2 months ago

Another observation: when I deleted the sensor, the report remained showing data too... until I refreshed the screen.

vingerha commented 2 months ago

Expected behaviour: If a sensor doesn't have any next departure lines (or contains {}), it goes unavailable

'unavailable' is a system thing which I do not want to enforce, as it is also set during e.g. a restart, before gtfs2 has collected any data. What I can do is put a count of departures instead of the stop name.

I just played around by changing the static data in the database, and I cannot reproduce this easily. When I change the times, the departures become {}, the sensor keeps showing the stop name as its state value, and when I change the times back it refreshes (after the refresh rate defined in the config) with the correct data. Even if I change it to show the count of departures, this will likely not tackle the 'unavailable' issue, but at least your card will not show it if you filter on > 0.
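The count-of-departures idea could also be prototyped user-side with a template sensor. A sketch only: the attribute name next_departures_lines and the entity id are assumptions, not a verified gtfs2 schema:

```yaml
# Template sensor counting departure entries; yields 0 when the
# attribute is missing or {} (empty). Names below are examples.
template:
  - sensor:
      - name: "Example stop departure count"
        state: >
          {{ state_attr('sensor.example_local_stop_zone_home',
                        'next_departures_lines')
             | default({}, true) | count }}
```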

dafunkydan commented 2 months ago

What I can do is to put a count of departures instead of the stop name.

Well, if everything works, I think having the sensors in the format [stop_id]_local_stop_zone.[entry_name], as it is now, is pretty good behaviour. It makes filtering easy, and since I know which stop_id is (usually) for which train, I can tweak the filtering further. I have the feeling changing this would make other things harder, especially as a dynamic departure count would mean a lot of sensors getting created/abandoned. Mhmm.....

At present I am not recreating sensors and they run dead by themselves

Again I see I don't know enough about the mechanisms :-( How do they run dead by themselves? The state of each sensor is static (the name of the stop), and more is stored in the attributes...

unavailable is a system thing which I donot want to enforce as this also is set during e.g. a restart before gtfs2 has collected any data

Just an idea: would an additional sanitize routine solve that? It could just check whether "Next departure Lines:" returns {}. If so, it could e.g.
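One way such a sanitize check could be sketched user-side is a template binary sensor that flags the zombie state and could then trigger a refresh or notification. Attribute and entity names are illustrative, not confirmed:

```yaml
# Turns on when the stop sensor reports no departures at all,
# covering a missing attribute, an empty dict, or a literal "{}".
template:
  - binary_sensor:
      - name: "Example stop has no departures"
        state: >
          {{ state_attr('sensor.example_local_stop_zone_home',
                        'next_departures_lines') in (none, {}, '{}') }}
```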

vingerha commented 2 months ago

Still on it... the sensor becomes unavailable after some time and does not recover, so it seems; no clue why it does not do that, as the config_entry is still there.

vingerha commented 2 months ago

Tested it again... the sensor did not become unavailable for 40+ minutes and regained data when it found new departures. Am confused about why/when it becomes unavailable.

vingerha commented 2 months ago

I have again tested this for a longer period, spending time myself and frequently looking at the sensors... sadly (or?) all seems fine. I am not at all stating that this issue is non-existent, but... no replication means I cannot dig deeper :(

vingerha commented 2 months ago

I added a HA-native check as suggested by one of the devs... will propagate this to the next release, which may (??) help. Sucks if the cause cannot be found.

dafunkydan commented 2 months ago

Meanwhile it occurred again on my side. But I haven't set up logging in configuration.yaml yet (have to figure out what's needed), and I have jobs to do. At least I now know better which sensors to monitor, and roughly when. Hope I can go hunting soon! Oh and yes, maybe those native checks will solve it magically anyway ☺️ Will update as soon as I've got something. Thanking you so much for everything!

vingerha commented 2 months ago

Still have no reproducible case, but try 0.4.4.9

dafunkydan commented 2 months ago

In the meantime it didn't happen again on my side either. Got an 800 MB log in return 😆 Other flaws, on the other hand, caused by my data provider...

Well, I'm gonna give 0.4.4.9 a shot! Great, as always, thanks! Gonna keep the logs running, and if it doesn't occur again, well... either it magically solved itself, or the update did! :-)

vingerha commented 2 months ago

I am closing this for now... to get it out of plain sight :)

dafunkydan commented 2 months ago

Did a big step forward! I was able to observe this shy 'bug' in the wild, and have a 1.6 GB log as well 😄 That might not even be necessary, because I found a solution within custom:flex-table-card 🎉

The card is configured using auto-entities, with a wildcard to catch all entities belonging to a station. A single entity containing only '{}' as next departure leads to the whole card rendering empty.

The solution seems to be the option strict: true. If set, this nasty entity just gets ignored, and the others get displayed correctly.
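For comparison, the auto-entities wrapping described above looks roughly like this. The wildcard pattern and the column attribute are examples pieced together from this thread, not a verified gtfs2 schema:

```yaml
# auto-entities collects matching sensors and injects them into the
# wrapped flex-table-card; strict: true drops entities without data.
type: custom:auto-entities
card:
  type: custom:flex-table-card
  strict: true
  columns:
    - name: Stop
      data: friendly_name   # example attribute
filter:
  include:
    - entity_id: "*local_stop_zone*"
```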

I made a short video to demonstrate the issue, comparing strict: set to true and false, then reloading the local stop (via the service call), and then showing that after the reload the previously zombie entity went unavailable, so it no longer matters whether the strict: option is forced, whereas in the beginning it did.

https://github.com/vingerha/gtfs2/assets/103875104/00e941d0-f31b-4fd4-8a84-c6f28d8f294f

However: I think it would still be smoother if the zombie entity became unavailable automatically. I haven't checked/found out yet whether it prevents other entities from becoming available. I have the feeling I had that before, but at least in this example it didn't happen. Have to keep an eye on that.

@vingerha It's a pain in the a** to search through 1.6 GB of text 🤪 If you think the logs might still be helpful, can you let me know what specifically to look/filter for?

vingerha commented 2 months ago

I am using flex-table-card as well and notice that it sticks in memory... i.e. I opened Edge, went to HA, to my gtfs tab, and noticed entries from yesterday... then I refreshed the screen... and gone they were. I also noticed that flex-table-card does not work great with auto-entities. Anyhow... I propose you test this with the 'entities' card or something. For me, all is still fine, and no... no clue how to reduce the logs. I agree I added a lot, but this is the only way I can communicate on issues without having to set up everything myself. Lastly, am I still right in understanding that this only happens when realtime is set up?

dafunkydan commented 2 months ago

Personally, I haven't noticed the sticking-in-memory with flex-table-card, or anything old disappearing on a refresh. A slight delay between attribute updates and the rendering, yes, but in the range of seconds. Using mainly Firefox btw; maybe this time I'm lucky.

I just started from the examples of how flex-table-card could be set up, which is why I used auto-entities. However, with that strict option, I think auto-entities can be omitted in this case completely:

type: custom:flex-table-card
clickable: true
entities:
  include: '*local_stop_zone.haltestelle_xyz'
strict: true
# columns: is required by flex-table-card; the attribute is an example
columns:
  - name: Stop
    data: friendly_name

Maybe you wanna check this out; there might be a 3% chance that simplifying like this fixes your refresh problem?

only happens when Realtime is setup

This time the check was with RT. Have to validate it without, yes. Need to set that up too, for another station where I know it happens.

no clue how to reduce logs, I agree to have added a lot

I would be fine with uploading the whole thing somewhere! I just wanted to make it easier, and didn't want you to choke 😄 So, maybe I continue with checking it against realtime?

Just a thought: as I hadn't seen the strict: true option in the examples, and it has a great impact on my side, I thought this was worth sharing 🙃