smsearcy / mesh-info

Collect and view information about an AREDN mesh network.
GNU General Public License v3.0

meshinfo-collector.service running but not #99

Closed: NA7KR closed this issue 3 months ago.

NA7KR commented 1 year ago

meshinfo-collector.service shows as running, but it has stopped collecting. It only runs for about 5 minutes and then stops; stopping and restarting the service gives the same result.

This had been running OK. The last line in syslog is meshinfo.poller:_handle_connection_error:409.

Thanks NA7KR

NA7KR commented 1 year ago

/opt/mesh-info/bin/meshinfo collector --run-once

2023-03-06 08:24:21.810 | ERROR | meshinfo.poller:_handle_connection_error:409 - Timeout on reading data from socket

It hangs and never returns to the prompt.
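Since --run-once hangs instead of exiting, one way to tell a hang from a merely slow run is to launch it under a hard time limit. This is a minimal sketch using Python's subprocess module; the run_with_deadline helper and any particular limit you pick are illustrative assumptions, not part of mesh-info.

```python
from __future__ import annotations

import subprocess
import sys


def run_with_deadline(cmd: list[str], timeout_s: float) -> tuple[bool, int | None]:
    """Run cmd; return (finished, returncode), with finished=False if it ran past timeout_s."""
    try:
        # subprocess.run() kills the child and re-raises TimeoutExpired once timeout_s elapses.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"timed out after {timeout_s} s: {' '.join(cmd)}", file=sys.stderr)
        return False, None
    return True, result.returncode
```

For example, run_with_deadline(["/opt/mesh-info/bin/meshinfo", "collector", "--run-once"], 300) would return (False, None) if the collector were still running after five minutes.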

smsearcy commented 1 year ago

@NA7KR thanks for reaching out. It looks like it is failing to get the OLSR data from the local node. A few questions to help me understand what might be going on:

  1. What version is the firmware on your node?
  2. Have you overridden MESH_INFO_LOCAL_NODE in the .env file? If not, can you ping localhost.local.mesh from the system running Mesh Info?
  3. If you have a different MESH_INFO_LOCAL_NODE, can you ping it from the system? I ran into an issue when I had two nodes connected via DTD that the DHCP server would switch between them, meaning I had a different IP address to connect to. If that's the issue then I probably need to hurry up and fix #90.
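Ping only proves ICMP works; a TCP check against the port the collector actually talks to is closer to what the poller does. This is a minimal sketch with Python's socket module. The node_reachable helper is hypothetical, and the thread doesn't say which port mesh-info uses for OLSR data, so the port is left as a parameter (8080, the port the AREDN web interface commonly serves on, is only an example).

```python
from __future__ import annotations

import socket


def node_reachable(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the name and applies timeout_s to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers DNS failures, refused connections, and socket timeouts
        return False
```

For example, node_reachable("localhost.local.mesh", 8080) or node_reachable("10.234.160.225", 8080). Running it repeatedly while the collector is stuck can show whether the node itself stops answering.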

Thank you!

NA7KR commented 1 year ago
  1. 3.22.12.0
  2. My .env file:

     MESH_INFO_LOCAL_NODE="10.234.160.225"
     MESH_INFO_MAP_LATITUDE="44.886"
     MESH_INFO_MAP_LONGITUDE="-123.015"
     MESH_INFO_MAP_ZOOM="5"
     MESH_INFO_LOG_LEVEL=ERROR

  3. Yes
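When checking which node the collector is pointed at, it can help to parse the .env file the same way every time. This is a simplified sketch of a KEY=value parser, not mesh-info's actual settings loader; it skips blank lines and comments and strips matching quotes, so MESH_INFO_LOCAL_NODE="10.234.160.225" yields the bare address.

```python
from __future__ import annotations


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and comments and stripping matching quotes."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of surrounding single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings
```

The real loader may handle edge cases (escapes, export prefixes, inline comments) differently.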

Like I said, it works for some time and then stops; you can see my system.

Kevin

NA7KR commented 1 year ago

I ran /opt/mesh-info/bin/meshinfo collector --run-once

It never exits; as you can see, I had to call it…

This looks to be what is going on…

Kevin

smsearcy commented 1 year ago

@NA7KR are you still seeing issues? After the recent updates, I haven't had any issues with the collector service. I'm not sure exactly why, although I suspect a combination of the different logging library and re-architecting the poller might have addressed it.

NA7KR commented 1 year ago

It looks to be working well again.

I have a request: can you add another column to the table showing each node's address range, i.e. /27 or /28 or 10.234.160.224/27, whatever it reports? (10.234.160.224-255 would be even better.) Then flag any duplicates.

We have (or had) some duplicates; I don't know whether they've been fixed yet.
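The range column and duplicate flag requested above come down to CIDR arithmetic. This is a sketch using Python's ipaddress module, assuming each node's LAN subnet is already available as a CIDR string (how mesh-info would obtain it isn't covered in this thread); the helper names and node names are made up for illustration.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations


def cidr_to_range(cidr: str) -> str:
    """Render a CIDR block such as 10.234.160.224/27 as 'first-last' addresses."""
    net = ipaddress.ip_network(cidr, strict=False)
    return f"{net.network_address}-{net.broadcast_address}"


def find_overlaps(nodes: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of node names whose networks overlap (exact duplicates or nested blocks)."""
    nets = {name: ipaddress.ip_network(cidr, strict=False) for name, cidr in nodes.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]
```

Checking overlap rather than string equality also catches a /28 that sits inside another node's /27, which would be just as much of a conflict as an exact duplicate.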

Thanks for all your help.

Kevin.

From: Scott Searcy
Sent: Tuesday, April 25, 2023 1:08 PM
To: smsearcy/mesh-info
Cc: NA7KR; Mention
Subject: Re: [smsearcy/mesh-info] meshinfo-collector.service running but not (Issue #99)



smsearcy commented 1 year ago

Hmm, I might have spoken too soon. It looks like my collector stopped working last night, but this time it wasn't consuming all of the CPU. I restarted it; we'll see if it happens again.

smsearcy commented 4 months ago

As mentioned on #122, based on some recent reading, I think this might be related to an edge case in how timeouts work. I'm going to try a different Python library that uses a different approach to timeouts; I'm hoping that might fix it.
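One well-known timeout edge case, offered here as a plausible illustration rather than the confirmed root cause: a socket timeout bounds each recv() individually, not the whole response. A peer that keeps trickling bytes, each arriving inside the timeout, never trips it, so the read can run far longer than the timeout suggests. This sketch demonstrates that with a hypothetical slow peer (start_trickle_server) and shows one possible fix, an optional overall deadline in a hypothetical read_all helper; it is not necessarily what mesh-info does.

```python
from __future__ import annotations

import socket
import threading
import time


def read_all(sock: socket.socket, deadline_s: float | None = None) -> bytes:
    """Read until EOF. sock's own timeout bounds each recv(); deadline_s, if given, bounds the total."""
    per_recv = sock.gettimeout()  # the per-operation timeout the caller configured
    start = time.monotonic()
    chunks: list[bytes] = []
    while True:
        if deadline_s is not None:
            remaining = deadline_s - (time.monotonic() - start)
            if remaining <= 0:
                raise TimeoutError(f"no EOF within {deadline_s} s overall")
            # Never let a single recv() wait past the overall deadline.
            sock.settimeout(remaining if per_recv is None else min(per_recv, remaining))
        try:
            chunk = sock.recv(4096)
        except socket.timeout as exc:  # an alias of TimeoutError on Python 3.10+
            raise TimeoutError("recv() stalled") from exc
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def start_trickle_server(n_bytes: int, interval_s: float) -> int:
    """Hypothetical slow peer: sends one byte every interval_s, then closes. Returns its port."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def serve() -> None:
        conn, _ = srv.accept()
        with conn, srv:
            try:
                for _ in range(n_bytes):
                    conn.sendall(b"x")
                    time.sleep(interval_s)
            except OSError:
                pass  # client hung up early, e.g. after its deadline expired

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]
```

With a 1-second per-read timeout, ten bytes sent 0.15 s apart take about 1.5 s in total and read_all(sock) never raises; read_all(sock, deadline_s=0.5) gives up after roughly half a second.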

smsearcy commented 4 months ago

My first idea (trying HTTPX, which advertises "Strict timeouts everywhere") didn't work; the collector still froze last night around 1 am.

But I've got a few more ideas to try, and I'm trying to replicate the issue on another box to get more data.
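Another general way to keep one unresponsive node from freezing a whole collection cycle is to give each node's poll a single overall deadline and run the polls concurrently. This is a stdlib-only asyncio sketch; fetch stands in for whatever async request the poller makes (for example a call on an httpx.AsyncClient), and poll_all and the node names are hypothetical. It isn't claimed to be the change that shipped in 0.7.0.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def poll_all(
    nodes: list[str],
    fetch: Callable[[str], Awaitable[str]],
    deadline_s: float,
) -> dict[str, str | None]:
    """Poll every node concurrently; a node that misses deadline_s maps to None."""

    async def one(node: str) -> str | None:
        try:
            # wait_for cancels fetch(node) once deadline_s elapses, however the time was spent.
            return await asyncio.wait_for(fetch(node), timeout=deadline_s)
        except asyncio.TimeoutError:
            return None

    results = await asyncio.gather(*(one(n) for n in nodes))
    return dict(zip(nodes, results))
```

A caveat on this design: asyncio.wait_for can only cancel work at await points, so a blocking call inside fetch (a synchronous socket read, for example) would not be interrupted.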

smsearcy commented 4 months ago

I might have had some success fixing this. The collector has been running for almost 12 hours which is better than it was doing. If it stays up for the rest of the day then I'll get this code merged into the main branch and let you know so you can upgrade.

smsearcy commented 3 months ago

@NA7KR I think I've fixed the issues with the collector hanging. Please try updating to the latest version and let me know how it goes.

https://mesh-info-ki7onk.readthedocs.io/en/latest/installation.html#upgrading

smsearcy commented 3 months ago

Version 0.7.0 has been running stably for two weeks, so I'm going to close this issue; I think it is fixed.