vingerha / gtfs2

Support GTFS in Home Assistant GUI-only
https://github.com/vingerha/gtfs2
MIT License

[FEATURE]: prevent flooding HA with all gtfs stops ever seen, and re-use entities. #75

Closed: FabienD74 closed this issue 1 week ago

FabienD74 commented 1 month ago

Describe the solution you'd like

gtfs2 creates entities named after the "stops", which leads to creating an entity for every stop you have ever crossed ;-) That can be... A LOT in "small cities" like Paris, New York or Tokyo... in just a couple of hours/days.

If we replace the "stop name" with a counter, we could re-use / refresh the existing entities...

Describe alternatives you've considered

Additional context


class GTFSLocalStopSensor(CoordinatorEntity, SensorEntity):
    """Implementation of a GTFS local stops departures sensor."""

    def __init__(self, counter, stop, coordinator, name) -> None:
        """Initialize the GTFSsensor."""
        super().__init__(coordinator)
        self._stop = stop
        # Name the sensor after a zero-padded counter instead of the stop_id,
        # so a fixed pool of entities can be re-used.
        self._name = name + "_local_stop_" + str(counter).zfill(3)

and


async def async_setup_entry(
    hass: HomeAssistant,
    config_entry: ConfigEntry,
    async_add_entities: AddEntitiesCallback,
) -> None:
    """Initialize the setup."""
    if config_entry.data.get('device_tracker_id', None):
        sensors = []
        coordinator: GTFSLocalStopUpdateCoordinator = hass.data[DOMAIN][config_entry.entry_id][
            "coordinator"
        ]
        await coordinator.async_config_entry_first_refresh()
        # One sensor per local stop, keyed by a running counter rather than the stop name.
        stop_index = 1
        for stop in coordinator.data["local_stops_next_departures"]:
            sensors.append(
                GTFSLocalStopSensor(stop_index, stop, coordinator, coordinator.data.get("name", "No Name"))
            )
            stop_index += 1
FabienD74 commented 1 month ago

Nice and clean ;-)

[screenshot]

FabienD74 commented 1 month ago

we also have to change code here:

        # Add next departures with their lines
        self._attributes["next_departures_lines"] = {}
        if self._departure:
            for stop in self._departure:
                # previously: if self._name.startswith(stop["stop_id"]):
                if self._stop["stop_id"] == stop["stop_id"]:
                    self._attributes["next_departures_lines"] = stop["departure"]
                    self._attributes["latitude"] = stop["latitude"]
                    self._attributes["longitude"] = stop["longitude"]
FabienD74 commented 1 month ago

I think I will have to chase another issue: startup is taking more than 60 seconds.... And the question is: how to trace? Where do we spend most of the time...? Loop? Tables? SQL statements?... hmmm

[screenshot]

FabienD74 commented 1 month ago

I did a "soft restart" of Home Assistant and now it's fine. No more warnings... Strange. So from time to time it starts in less than 10 seconds, and sometimes it takes more than 60... that's huge.

vingerha commented 1 month ago

I forgot a bit about that one... I am running a daily (spook) service to remove orphaned entities, so these disappear for me. Can you provide a PR? Then I can apply it and verify myself for a few weeks. I am also not sure how these would be re-used; the code (which I did not analyse in detail) seems to go up to 999... so still a lot of sensors are feasible, or?

On the start-up, same here.... I am not sure which one keeps it longer

FabienD74 commented 1 month ago

It's also my first time using GitHub..... Right now I'm directly changing the running code... ;-) I hardcoded the "fill with zeroes" => str(counter).zfill(3). Remember it is per "device/person" and per "datasource"... well, it is per "vicinity" when doing the setup of gtfs. I'm not expecting more than 999 stops of the same datasource around the current GPS location... ;-) ;-) ;-) ... should we?? ;-)

vingerha commented 1 month ago

Well, I move between 3 providers here, e.g. Basel, and in Basel there are dozens of stops. I easily end up with 30+ after a 2-day visit... so again... how would these be re-used?

FabienD74 commented 1 month ago

Indeed.... now the name is generic.... but not yet re-used, I think... arghhh.... (I will be back ;-))

vingerha commented 1 month ago

And this is just one element; can you imagine the time I spent to find all the data based on source differences, calendar, calendar_dates, api-keys, pygtfs with errors and incompleteness, adding real-time, protobuf and json sources, etc. etc. Although I copied a bit from gtfs (core), that one proved faulty in a lot of places.

vingerha commented 1 month ago

btw, the way I work with github

  1. create a fork
  2. install github desktop on your laptop or desktop and add a repo locally by using 1, e.g. in c:\dev
  3. use the fork instead of mine to install the app locally in custom_components (remove mine first)
  4. update the custom_components/gtfs2 as you like, reboot etc... until you are done/happy
  5. put the code from cust_comp to the local git in c:\dev\gtfs2
  6. use github desktop to push it to your forked repo ...then open a PR
FabienD74 commented 1 month ago

Well, I restored HA. Your latest version searches for the zip file at boot, then rebuilds the DB. It also rebuilds the whole DB if I re-create the vicinity settings.... I was not able to find out how to have entities updated. Looks like my DB is deleted and re-created all the time.

vingerha commented 1 month ago

your latest version

which one?

search for zip file at boot / re-created all the time

? no such problems with me or others, not sure what you are doing, do you have a service call that triggers?

FabienD74 commented 1 month ago

The version retrieved yesterday 6pm via GitHub, I guess the latest published.... with deletion of sqlite and also managing dates in the zip file and "crying" for a zip file at startup ;-) ;-)...

I'll have to try again. I think it was related to my gtfs settings.... no longer compatible.. distance over 1000m and/or more than 15 stops around the location...

PS: radius = ( 360 * self._data.get("radius", DEFAULT_LOCAL_STOP_RADIUS) ) / ( 40000 * 1000 ) => 1 meter = 360/40,000,000 = 9e-6 ( a bit higher than 1/130000 = 7.69e-06 )

:-)

vingerha commented 1 month ago

I did not make a release yet... the latest version is release 0453 and this is used by a few people (myself of course); the one with the sqlite check is 'main'.

vingerha commented 1 month ago

for the radius... where did you find that 'formula' ? EDIT: can add easily but need to know if this is the 'best' :)

FabienD74 commented 1 month ago

Pure logic. The circumference of the earth is 40,000 km (hopefully the earth is nearly a perfect sphere, and if we wanted to be more precise we should consider elevation... to retrieve a few stops, that's a bit overkill....) => 360 degrees to cover 40,000 km => 360/40,000,000 degrees = 1 m.

The real formula to compute the distance between 2 coordinates is way more complex (sin(), cos(), square root....). We cannot ask the DB to compute that on the fly within the SQL statement ;-) Currently we use a square area, but we could (should!! should!) "post-filter" in Python... :-)
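
Not part of the integration, just a minimal sketch of what such a post-filter could look like in Python, assuming each stop dict exposes the same latitude/longitude keys as in the attributes above, and using geopy (which comes up later in this thread) for the geodesic distance:

import geopy.distance

def filter_stops_by_radius(stops, center_lat, center_lon, radius_m):
    """Keep only stops whose real (geodesic) distance to the device is within radius_m.

    The SQL query pre-selects a square bounding box; this removes the corners.
    """
    kept = []
    for stop in stops:
        dist_m = geopy.distance.distance(
            (center_lat, center_lon),
            (stop["latitude"], stop["longitude"]),
        ).m
        if dist_m <= radius_m:
            kept.append(stop)
    return kept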

About the main topic (remember the title "prevent flooding HA..."), I created my own class to store the "stoplist" in a dedicated entity.... and I added "gtfs2_" on all entities created... so we can filter them in the HA recorder. Still in progress.

vingerha commented 1 month ago

ok...changed to /111111

FabienD74 commented 1 month ago

Same issue again.... deleted all settings in gtfs. Moved your source code into gtfs2, rebooted (to make sure) -> add new integration gtfs2 -> add new source -> 20 minutes to create sqlite, then nothing... no integration available in devices... ok.. maybe that's fine. -> add integration -> gtfs2 -> create setup for vicinity -> chose my datasource (just freshly uploaded) -> BOOM

=> DB destroyed => upload in progress...

vingerha commented 1 month ago

which version? Which source? Logs? HAOS or HA docker?

EDIT: for logs, if you use portainer you can also examine the 'print' statements from pygtfs doing the unpacking zip>sqlite (they did not implement logging)

FabienD74 commented 1 month ago

I'm running HAOS. Source: GitHub, I did a copy-paste from my PC directly into HA custom_components/gtfs2.

in the log:
2024-05-19 16:44:50.882 DEBUG (SyncWorker_14) [custom_components.gtfs2.gtfs_helper] Getting gtfs with data: {'extract_from': 'zip', 'file': 'TEC-GTFS', 'url': 'na'}
2024-05-19 16:44:50.883 DEBUG (SyncWorker_14) [custom_components.gtfs2.gtfs_helper] Checking if extracting: TEC-GTFS
2024-05-19 16:44:50.884 DEBUG (SyncWorker_14) [custom_components.gtfs2.gtfs_helper] Checking if file contains only future data: TEC-GTFS.zip
2024-05-19 16:44:50.894 DEBUG (SyncWorker_14) [custom_components.gtfs2.gtfs_helper] Youngest calender date from new files: ['20240412', '20240429'], is: 2024-04-12 00:00:00
2024-05-19 16:44:50.895 DEBUG (SyncWorker_14) [custom_components.gtfs2.gtfs_helper] New file is not containing only newer dates, removing current/copied sqlite
2024-05-19 16:45:05.044 INFO (SyncWorker_14) [custom_components.gtfs2.gtfs_helper] Exiting main after start subprocess for unpacking: TEC-GTFS.zip
2024-05-19 16:45:05.045 DEBUG (MainThread) [custom_components.gtfs2.config_flow] Checkdata pygtfs: extracting with data: {'extract_from': 'zip', 'file': 'TEC-GTFS', 'url': 'na'}
2024-05-19 17:02:09.104 DEBUG (MainThread) [custom_components.gtfs2.gtfs_helper] Getting datasources for path: gtfs2
2024-05-19 17:02:09.105 DEBUG (MainThread) [custom_components.gtfs2.gtfs_helper] Datasources in folder: ['TEC-GTFS']
2024-05-19 17:02:26.274 DEBUG (MainThread) [custom_components.gtfs2.config_flow] UserInputs Local Stops: {'file': 'TEC-GTFS', 'device_tracker_id': 'zone.gtfs_test_location', 'name': 'GTFS_TES
2024-05-19 17:02:26.275 DEBUG (SyncWorker_30) [custom_components.gtfs2.gtfs_helper] Getting gtfs with data: {'file': 'TEC-GTFS', 'device_tracker_id': 'zone.gtfs_test_location', 'name': 'GTFS_
2024-05-19 17:02:26.275 DEBUG (SyncWorker_30) [custom_components.gtfs2.gtfs_helper] Checking if extracting: TEC-GTFS
2024-05-19 17:02:26.275 DEBUG (SyncWorker_30) [custom_components.gtfs2.gtfs_helper] Checking if file contains only future data: TEC-GTFS.zip
2024-05-19 17:02:26.282 DEBUG (SyncWorker_30) [custom_components.gtfs2.gtfs_helper] Youngest calender date from new files: ['20240412', '20240429'], is: 2024-04-12 00:00:00
2024-05-19 17:02:26.282 DEBUG (SyncWorker_30) [custom_components.gtfs2.gtfs_helper] New file is not containing only newer dates, removing current/copied sqlite
2024-05-19 17:02:40.271 INFO (SyncWorker_30) [custom_components.gtfs2.gtfs_helper] Exiting main after start subprocess for unpacking: TEC-GTFS.zip
2024-05-19 17:02:40.271 DEBUG (MainThread) [custom_components.gtfs2.config_flow] Checkdata pygtfs: extracting with data: {'file': 'TEC-GTFS', 'device_tracker_id': 'zone.gtfs_test_location', '202
vingerha commented 1 month ago

Yep...see the issue... I only checked extracting files not setting up new routes...maybe tomorrow (apéro now)

vingerha commented 1 month ago

If there is ever something I would like someone to do, it is to write a few 'automated' tests; there is so much to check, and I quite often forget an end-to-end check.

FabienD74 commented 1 month ago

I think there are 2 big parts in this development: 1) upload / maintenance of the DB, 2) usage of the DB.

I'm afraid of anything automatic regarding part 1..... a flag "Automatic Upload" in the settings, defaulted to false?
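
A minimal sketch of what such an opt-in flag could look like in a config/options flow schema; the key name auto_update and its placement are assumptions, not existing gtfs2 options:

import voluptuous as vol

# Hypothetical "Automatic Upload" toggle, defaulting to False so the
# sqlite DB is never deleted/rebuilt unless the user opts in.
OPTIONS_SCHEMA = vol.Schema(
    {
        vol.Optional("auto_update", default=False): bool,
    }
)

# The update path would then guard the rebuild with something like:
# if not config_entry.options.get("auto_update", False):
#     return  # keep the existing sqlite untouched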

vingerha commented 1 month ago

small thing to change...sometimes small things have big consequences

gtfs_helper - Copy.py.txt

FabienD74 commented 1 month ago

GitHub Desktop doesn't find any change in the code.... what have you changed?

vingerha commented 1 month ago

So your PR only changes this? ... btw, it should also update the unique id

[screenshot]

vingerha commented 1 month ago

GitHub Desktop doesn't find any change in the code.... what have you changed?

I had already pushed it to main, maybe that is why you cannot see it?

[screenshot]

FabienD74 commented 1 month ago

Entity_id... I have been fighting many hours to get it working.... or.... when HA suddenly decides not to use it and uses the name..... when sometimes HA says "not valid"... or something else.... try again, restart, check logs, change again, wait for the sensor to refresh, try again, fix, reboot, ... the hard way! Of course, like always, not many recent examples to follow.... solution: "trial and error".

Maybe now it can be simplified... ;-)... I don't understand the usage of attributes starting with '_'.... are those the real ones used by HA?? Is there a mapping/copy somewhere?

Not sure about the unique_id, maybe (some people commented it was for "internal purposes"). Some tests also went wrong with duplication of sensors ending with "_2", "_3", "_4" ;-) (the nightmare of HA ;-)).

vingerha commented 1 month ago

I don't understand the usage of attributes starting with '_'

where?

FabienD74 commented 1 month ago

example here:

self._stop = stop
self._name = self._stop["stop_id"] + "_localstop" + self.coordinator.data['device_tracker_id']
self._attributes: dict[str, Any] = {}

self._attr_unique_id = "sensor.gtfs2_" + self._name
self._attr_unique_id = self._attr_unique_id.lower()
self._attr_unique_id = self._attr_unique_id.replace(" ", "_")
self.entity_id = self._attr_unique_id

How to know which ones are "inherited" from HA, I guess, vs. those created in our code.....?

BTW: I'm using "Studio Code Server" installed via HACS.. maybe there is a setting somewhere to add "real" autocompletion and help? Right now it's just a text editor with multiple tabs... ;-) ;-)

Fabien

FabienD74 commented 1 month ago

[screenshot]

I cannot easily see your changes... If I open it I see your changes.... but in my mind it should be compared to my GitHub repository.

vingerha commented 1 month ago

self._stop = stop

development choice..... If used consistently, this helps differentiate _stop from stop. It is not a 'must' but it helps when analysing code.

e.g. if you use self.stop = stop and way down in the code, after passing things on (as part of self), you see

if stop == "12344": ... then you may have made a typo where it should be self.stop and not stop

IF used consistently, that is... I catch myself not always doing this
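
Coming back to the earlier question about attributes starting with '_': Home Assistant reads an entity's public properties (name, unique_id, extra_state_attributes, ...), and the underscore-prefixed members are just the private backing fields the integration writes to. A minimal illustrative sketch, not the actual gtfs2 code:

from homeassistant.components.sensor import SensorEntity

class ExampleSensor(SensorEntity):
    """Illustration only: HA reads the public properties; the _-prefixed
    members are the internal storage the integration writes to."""

    def __init__(self, stop_id: str) -> None:
        self._name = f"gtfs2_local_stop_{stop_id}"   # private backing field
        self._attributes: dict = {}

    @property
    def name(self) -> str:
        # HA calls this property; it just returns the private field.
        return self._name

    @property
    def extra_state_attributes(self) -> dict:
        return self._attributes

Newer HA code often skips the explicit properties and sets the _attr_name / _attr_unique_id shortcut attributes directly; the base Entity class picks those up automatically.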

vingerha commented 1 month ago

On your screenshot, no clue...

FabienD74 commented 1 month ago

So far so good. My sqlite file is still there... ;-) How to test? Should I remove the "remaining" ZIP file? Replace it with the original one with "shapes" inside?
Create new sensors?

vingerha commented 1 month ago

Test... well... you need a file that adheres to the exception (future date) and then load that. What it should do is keep the current sqlite and keep the downloaded zip as well (the pair is a 'must' for pygtfs in certain cases). You can use the service call to update it from a config/www/xyz.zip file.

FabienD74 commented 1 month ago

I moved the class "stoplist" into a dedicated file. Here is the latest version with some new features:

I know, too many hardcoded values...

sensor_stoplist.py.txt

FabienD74 commented 1 month ago

I forgot, we need:


import geopy.distance
from homeassistant.helpers.event import (
    async_track_state_change_event,
    async_track_time_interval,
)
from homeassistant.const import (
    STATE_UNAVAILABLE,
    STATE_UNKNOWN,
    EVENT_HOMEASSISTANT_STARTED,    
)
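
For context, a minimal sketch of how these imports are typically wired up inside the sensor; the callback and attribute names (_async_finish_init, _async_tracker_changed, _async_periodic_refresh, _device_tracker_id) are placeholders, not the actual stoplist code:

from datetime import timedelta

# Sketch only: inside the stoplist sensor class, using the imports above.
async def async_added_to_hass(self) -> None:
    """Register listeners once the entity has been added to HA."""
    await super().async_added_to_hass()

    # Finish the init and do the first full refresh only after HA has started.
    self.hass.bus.async_listen_once(
        EVENT_HOMEASSISTANT_STARTED, self._async_finish_init
    )

    # New GPS coordinates from the device tracker trigger an update.
    self.async_on_remove(
        async_track_state_change_event(
            self.hass, [self._device_tracker_id], self._async_tracker_changed
        )
    )

    # Periodic fallback so the data never gets older than a few minutes.
    self.async_on_remove(
        async_track_time_interval(
            self.hass, self._async_periodic_refresh, timedelta(minutes=5)
        )
    )
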
vingerha commented 1 month ago

sensor created with basic info at startup (no DB select at HA startup); added listener on event EVENT_HOMEASSISTANT_STARTED to finish the init and perform the first full refresh/update.

=> I do not immediately understand the difference with before. When I restart HA I want (!) the sensor to be loaded 'immediately' with new/current data; isn't this more code (and maintenance and test) for the same target?

added a listener on device tracker to receive new coordinates in real time. added warning (and error) if refreshed/updated too frequently (warning below 30 seconds, ERROR AND SKIP if < 5 seconds). skip update if new GPS coordinates < 5 meters

=> This is nice for the local stops but must be configurable then; I have seen rt-providers that do not even push or allow updates below 1 minute... hence my fixed-route sensor refreshes per definition once per minute and only updates static data when the configured refresh frequency is hit.

consider data obsolete above 300 seconds, and update/refresh. otherwise silently skip the update ;-) ;-) ;-)

=> should also be configurable; there is no need to refresh e.g. during the night or when not using the datasource. e.g. I have Zou + Basel + Netherlands and only need 1 to be active.

Overall I like gimmicks, but keep in mind that the complexity quickly increases and one should use HA functionality to refresh (service calls / automation) so you give more control to the end-user.

vingerha commented 1 month ago

And... finally... please provide PRs with not too many changes; it is quite a challenge to review the impact, plus I cannot test it all and would like to avoid releases that 'crash' with various users needing a reset or a quick follow-up.

vingerha commented 1 month ago

Looked a bit at the code, would prefer

vingerha commented 1 week ago

Too many topics in this thread; it lost its context and the original solution proposal was rejected (by me).