Closed NA7KR closed 3 months ago
/opt/mesh-info/bin/meshinfo collector --run-once
2023-03-06 08:24:21.810 | ERROR | meshinfo.poller:_handle_connection_error:409 - Timeout on reading data from socket
It hangs and does not return to the prompt.
@NA7KR thanks for reaching out. It looks like it is failing to get the OLSR data from the local node. A few questions to help me understand what might be going on:
Did you set MESH_INFO_LOCAL_NODE in the .env file? If not, can you ping localhost.local.mesh from the system running Mesh Info?
If you did set MESH_INFO_LOCAL_NODE, can you ping it from the system? I ran into an issue when I had two nodes connected via DTD: the DHCP server would switch between them, meaning I had a different IP address to connect to. If that's the issue then I probably need to hurry up and fix #90.
Thank you!
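For anyone following along, the reachability questions above can also be checked from Python. This is only a rough sketch, not part of Mesh Info: `can_reach` is a hypothetical helper, and the port you pass in is an assumption (9090 is commonly used by the OLSR jsoninfo plugin, but confirm what your node actually exposes).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for troubleshooting; it is not part of Mesh Info.
    """
    try:
        # create_connection resolves the name and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections, and timeouts.
        return False
```

For example, `can_reach("localhost.local.mesh", 9090)` returning False would suggest either a name resolution problem or that the node is not answering on that port.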
1 3.22.12.0
2 MESH_INFO_LOCAL_NODE="10.234.160.225"
MESH_INFO_MAP_LATITUDE="44.886"
MESH_INFO_MAP_LONGITUDE="-123.015"
MESH_INFO_MAP_ZOOM="5"
MESH_INFO_LOG_LEVEL=ERROR
Like I said, it works for some time, then stops. You can see my system.
Kevin
I ran /opt/mesh-info/bin/meshinfo collector --run-once
It never exits; as you can see, I had to call it…
This looks to be what is going on…
Kevin
@NA7KR are you still seeing issues? After the recent updates, I haven't had any issues with the collector service. I'm not sure exactly why, although I suspect a combination of the different logging library and re-architecting the poller might have addressed it.
Looks to be working well again.
I have a request: can you add another column to the table that shows the address range on each node, i.e. /27 or /28 or 10.234.160.224/27, whatever it reports? (10.234.160.224-245 would be better.) Then flag any dups.
We have, or had, some duplicates; I do not know if they have been fixed yet.
Thanks for all your help.
Kevin.
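As an illustration of the request above, Python's standard `ipaddress` module can expand a CIDR block into a first-last range and flag overlapping blocks. This is a sketch only: `describe`, `find_overlaps`, and the node-name-to-CIDR mapping are hypothetical, and how Mesh Info actually stores each node's address block is not stated in this thread. Note that a /27 starting at 10.234.160.224 runs through .255.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations


def describe(cidr: str) -> str:
    """Return 'first-last' for a CIDR block, suitable for a table column."""
    net = ipaddress.ip_network(cidr, strict=False)
    return f"{net.network_address}-{net.broadcast_address}"


def find_overlaps(blocks: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of (hypothetical) node names whose address blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr, strict=False) for name, cidr in blocks.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


print(describe("10.234.160.224/27"))  # 10.234.160.224-10.234.160.255
```

Checking for overlap rather than exact equality also catches a /28 that sits inside another node's /27.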
Hmm, I might have spoken too soon. It looks like last night my collector stopped working, but this time it wasn't consuming all of the CPU. Restarted, we'll see if it happens again.
As mentioned on #122, based on some recent reading, I think this might be related to an edge case with how timeouts work. I'm going to try a different Python library that uses a different approach for timeouts; I'm hoping that might fix it.
My first idea (trying HTTPX, which advertises "Strict timeouts everywhere") didn't work; the collector still froze last night around 1am.
But I've got a few more ideas to try, and I'm trying to replicate the issue on another box to get more data.
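The thread does not say which timeout edge case is meant. One commonly cited one is that socket-level read timeouts apply to each individual read, so a peer that keeps trickling data can hold a request open far longer than the configured value. A sketch of capping the total time with `asyncio.wait_for` follows; `fetch_with_deadline` and `trickle` are hypothetical names, and this is not necessarily the approach that ended up in Mesh Info.

```python
import asyncio


async def fetch_with_deadline(coro, total_timeout: float):
    """Await `coro`, but give up after `total_timeout` seconds overall."""
    try:
        # wait_for cancels the wrapped coroutine once the overall deadline passes.
        return await asyncio.wait_for(coro, timeout=total_timeout)
    except asyncio.TimeoutError:
        return None


async def trickle(chunks: int, delay: float) -> bytes:
    """Hypothetical stand-in for a peer that sends one byte at a time."""
    data = b""
    for _ in range(chunks):
        # Each "read" is quick enough to pass a per-read timeout on its own.
        await asyncio.sleep(delay)
        data += b"x"
    return data
```

With these definitions, `asyncio.run(fetch_with_deadline(trickle(100, 0.05), 0.2))` gives up after roughly 0.2 seconds, even though no single step takes longer than 0.05 seconds.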
I might have had some success fixing this. The collector has been running for almost 12 hours which is better than it was doing. If it stays up for the rest of the day then I'll get this code merged into the main branch and let you know so you can upgrade.
@NA7KR I think I've fixed the issues with the collector hanging. Please try updating to the latest version and let me know how it goes.
https://mesh-info-ki7onk.readthedocs.io/en/latest/installation.html#upgrading
Version 0.7.0 has been running stable for two weeks, so I'm going to close this issue; I think it is fixed.
meshinfo-collector.service shows as running but it has stopped collecting. It only runs for 5 min and then stops; after stopping and restarting the service, the same thing happens.
This had been running OK. The last line in syslog is meshinfo.poller:_handle_connection_error:409
Thanks NA7KR