Closed: Foxi352 closed this issue 1 year ago.
This seems to be a problem with the router itself: it is closing the connection.
FYI, I don't know if this changes anything, but it's a 24-port PoE switch, not a router. I also don't know whether the API connection stays open the whole time or is only established on each poll (every 30 seconds) or service call.
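The connection-lifetime question matters because the two strategies fail differently. Below is a minimal sketch, assuming a hypothetical Connection object with query() and close() (not the integration's real API). A persistent client reuses one socket, so a connection the device has closed only shows up as an error on the next poll. A per-call client opens a fresh connection for every poll or service call and so can never be left holding a dead one.

```python
from __future__ import annotations

from typing import Any, Callable, Optional, Protocol


class Connection(Protocol):
    """Hypothetical API connection; not the integration's real interface."""

    def query(self, path: str) -> Any: ...

    def close(self) -> None: ...


class PersistentClient:
    """Opens one connection lazily and reuses it for every poll."""

    def __init__(self, connect: Callable[[], Connection]) -> None:
        self._connect = connect
        self._conn: Optional[Connection] = None

    def query(self, path: str) -> Any:
        if self._conn is None:
            self._conn = self._connect()
        # If the device closed the socket since the last poll, this call fails
        return self._conn.query(path)


class PerCallClient:
    """Opens a fresh connection for every poll or service call."""

    def __init__(self, connect: Callable[[], Connection]) -> None:
        self._connect = connect

    def query(self, path: str) -> Any:
        conn = self._connect()
        try:
            return conn.query(path)
        finally:
            conn.close()  # never keep a socket that could go stale
```

The trade-off is cost. A per-call client logs in on every poll, so a long-lived connection with reconnect logic is a common alternative.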
Simply restarting the integration solves the problem every time. I never touch the switch; it sits in a rack in my server room.
Even while the integration is in the error state, I can run a Python script from a Linux VM that uses the netmiko library, and it still completes the task successfully. I ran that script as a cron job for years before I had the integration in place.
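For comparison, a cron-driven netmiko script of the kind described above might look like the sketch below. The original script was not posted. The host, credentials, and "ether5" interface name are placeholders, and the RouterOS command that toggles the port is an assumption. Each run opens and closes its own SSH session, so it never reuses a connection the switch may have dropped. That may explain why it kept working while the integration was in the error state.

```python
def port_command(interface: str, enabled: bool) -> str:
    """Build a RouterOS CLI command that enables or disables one port.

    Assumed command form; adjust to match your own script.
    """
    state = "no" if enabled else "yes"
    return f"/interface ethernet set [find name={interface}] disabled={state}"


def set_port(host: str, username: str, password: str,
             interface: str, enabled: bool) -> str:
    """Toggle a port over SSH using a fresh netmiko session per call."""
    # Imported here so port_command() can be used without netmiko installed
    from netmiko import ConnectHandler

    device = {
        "device_type": "mikrotik_routeros",
        "host": host,
        "username": username,
        "password": password,
    }
    # The session is closed when the with-block exits, so nothing goes stale
    with ConnectHandler(**device) as conn:
        return conn.send_command(port_command(interface, enabled))
```

From cron, a call such as set_port("192.0.2.10", "admin", "secret", "ether5", enabled=False) would disable the placeholder port.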
Don't hesitate to ask if you want me to run further tests or need more info.
The model does not matter, since they all run RouterOS. It's the same thing: they can act as routers, just not optimally because of their internal connections. Based on that, it seems there is a problem with reconnecting to the device after the connection crashes. I will have to check whether that is a global issue. It may have gone unnoticed because MikroTik devices are usually rock solid.
I have tested it and cannot reproduce this issue:
2023-09-18 09:19:13.304 ERROR (SyncWorker_2) [custom_components.mikrotik_router.mikrotikapi] Mikrotik 10.0.1.127 error while building list for path /system/resource : Connection unexpectedly closed.
2023-09-18 09:19:13.306 ERROR (MainThread) [custom_components.mikrotik_router.coordinator] Error fetching mk6 data: Mikrotik Disconnected
2023-09-18 09:22:48.312 WARNING (SyncWorker_5) [custom_components.mikrotik_router.mikrotikapi] Mikrotik Reconnected to 10.0.1.127
2023-09-18 09:22:48.373 INFO (MainThread) [custom_components.mikrotik_router.coordinator] Fetching mk6 data recovered
Can you give me more information?
I have two MikroTik switches. One is integrated, but I don't do anything with its device or entities yet. The second switch is the one this ticket refers to.
On that second switch, I have two automations running:
Sometimes this runs for days without problems, and sometimes the problem occurs once or twice a day. The problem is the one described in this ticket:
When I reload the integration for just the switch in error, everything starts working again, until the next time it errors out.
Yesterday I upgraded to HA 2023.9.2 and Mikrotik integration v2.1.4. It has not errored out since. I propose we wait a few days; I will let you know whether it was resolved as a side effect of another fix.
The problem reappeared this morning. The core-01 switch is still working and has never had the problem, but I don't do anything with it in HA. The second one, distri-01, has all its entities unavailable:
Reloading (reinitialising) distri-01 makes it work again for a few hours or days.
What can I provide to help with this one?
Check that nothing on your side is touching the API port or the connection itself. It could also be something like custom DDoS protection. Also check for firewall rules that use the tarpit action.
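While the integration is in the error state, one quick check is whether the switch still accepts TCP connections on the API port. The sketch below uses only the standard library. Port 8728 is the RouterOS default API port (8729 for API-SSL), and the address in the usage line is a placeholder. The check has limits. A tarpit rule answers the TCP handshake and then holds the connection, so a successful probe does not rule it out. A refused or timed-out probe, however, points away from the integration and toward the network path, the firewall, or the API service.

```python
import socket


def api_port_open(host: str, port: int = 8728, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, run api_port_open("192.0.2.10") from the Home Assistant host while the entities are unavailable.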
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue was closed because it has been stalled for 5 days with no activity.
Describe the issue
I use the Mikrotik custom integration to manage two switches. On one of them, I have automations that enable or disable network ports.
Every one to two days, the integration stops working: it appears to be disconnected from the switch API, and automations no longer work. The integration then also offers an update from the current version to the "unknown" version. A simple restart of the integration fixes it for the next one to two days. This happens randomly and is not predictable; it does not follow a pattern such as "every x hours after an integration restart".
How to reproduce the issue
Let it run for a few days while a scheduled automation enables or disables a port from time to time.
Expected behavior
If the connection drops for whatever reason, this should be handled gracefully and the integration should reconnect.
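The expected behavior can be sketched as a client that keeps one connection open. When a call fails with a socket error, the client discards the connection, reconnects, and retries with exponential backoff. This is a generic illustration that assumes a hypothetical connect() factory returning an object with query() and close(). It is not the integration's actual code. The maintainer's logs in this thread show the integration already has its own reconnect path.

```python
from __future__ import annotations

import logging
import time
from typing import Any, Callable, Optional, Protocol

_LOGGER = logging.getLogger(__name__)


class Connection(Protocol):
    """Hypothetical API connection; not the integration's real interface."""

    def query(self, path: str) -> Any: ...

    def close(self) -> None: ...


class ReconnectingClient:
    """Reuses one connection, rebuilding it whenever the device drops it."""

    def __init__(
        self,
        connect: Callable[[], Connection],
        retries: int = 3,
        backoff: float = 1.0,
    ) -> None:
        self._connect = connect
        self._retries = retries
        self._backoff = backoff
        self._conn: Optional[Connection] = None

    def _drop(self) -> None:
        """Forget the current connection, closing it if still possible."""
        if self._conn is not None:
            try:
                self._conn.close()
            except OSError:
                pass  # the socket is already dead
            self._conn = None

    def query(self, path: str) -> Any:
        last_error: Optional[OSError] = None
        for attempt in range(self._retries):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return self._conn.query(path)
            except OSError as err:  # ConnectionError is a subclass of OSError
                last_error = err
                _LOGGER.warning("Connection lost on %s (%s), reconnecting", path, err)
                self._drop()
                if attempt < self._retries - 1:
                    # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
                    time.sleep(self._backoff * 2**attempt)
        raise ConnectionError(
            f"Giving up on {path} after {self._retries} attempts"
        ) from last_error
```

Failing after a bounded number of attempts matters. It lets the caller mark entities unavailable, as in the "Mikrotik Disconnected" log line, and try again on the next poll instead of blocking indefinitely.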
Screenshots
Software versions
Traceback/Error logs
Here is a log from one of the scheduled automations that disabled a switch port.
Additional context