jjpavlik / homemetrics


Counters keep restarting, collector being restarted/crashing #18

Closed jjpavlik closed 4 years ago

jjpavlik commented 4 years ago

Related to https://github.com/jjpavlik/homemetrics/issues/12; the problem keeps happening.

jjpavlik commented 4 years ago

So, the collector crashed with these two errors:

NameError: name 'Exceprion' is not defined

NameError: name 'receive_message' is not defined

jjpavlik commented 4 years ago

So these two things are addressed in https://github.com/jjpavlik/homemetrics/tree/Issue-%2318 . There's something interesting here, though: the second crash (the receive_message one) suggests the Arduino sent an error message back! :O I need to know what it was, because I didn't get to see whether the ERRORS counter was increased on the LCD. Leaving this branch running for a few days to see if it catches anything.
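
Both errors fit how Python resolves names: a name is only looked up when the line using it actually runs. A misspelled exception class in an `except` clause is harmless until an exception reaches that clause, and an undefined helper inside an error-handling branch only fails once that branch executes, which is why the `receive_message` NameError hints that an error reply came back. A minimal sketch, where `handle_reply`, `parse_temperature` and the `"ERR"` prefix are hypothetical and not taken from collector.py:

```python
def handle_reply(reply):
    """Hypothetical reply handler with an undefined helper in the error branch."""
    if reply.startswith("ERR"):
        # receive_message is not defined here; Python only notices when this line runs
        return receive_message(reply)  # noqa: F821
    return reply


def parse_temperature(raw):
    """Hypothetical parser with a misspelled exception class."""
    try:
        return float(raw)
    # The except expression is only evaluated after float() raises
    except Exceprion:  # noqa: F821
        return None


print(handle_reply("OK 21.5"))    # → OK 21.5
print(parse_temperature("21.5"))  # → 21.5

# Only the rarely taken paths trigger the NameError
for call in (lambda: handle_reply("ERR 1"), lambda: parse_temperature("n/a")):
    try:
        call()
    except NameError as exc:
        print("NameError:", exc)
```

A static checker such as pyflakes or flake8 (rule F821, undefined name) flags both names without having to hit the failing path at runtime.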

jjpavlik commented 4 years ago

Looks like on the test I ran last night (unplugging the network cable and plugging it back in):

However... for some reason collector.py is now steadily producing 20 messages per period, and since pusher only consumes up to 10 per period, the backlog has been growing steadily by 10 messages per period since then:

[Screenshot 2020-03-15 at 18 11 15]

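
The growth is just the difference between the two rates: with 20 messages enqueued and at most 10 drained per period, the backlog grows by 20 - 10 = 10 per period. A minimal sketch of that arithmetic, where `backlog_after` is a hypothetical helper modelling the queue, not code from collector.py or pusher:

```python
def backlog_after(periods, produced_per_period=20, consumed_per_period=10, start=0):
    """Queue size at the end of each period when the consumer drains
    at most consumed_per_period messages (hypothetical model)."""
    backlog = start
    history = []
    for _ in range(periods):
        backlog += produced_per_period
        # The consumer can never take more than what is queued
        backlog -= min(backlog, consumed_per_period)
        history.append(backlog)
    return history


print(backlog_after(5))  # → [10, 20, 30, 40, 50]
```

With `produced_per_period=10` the backlog stays flat at 0, so as long as the collector produces no more than pusher consumes, the queue cannot grow.
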
jjpavlik commented 4 years ago

Collector logs show the following since last night:

2020-03-14 22:05:38,268 - root - INFO - A few measurements queuing locally :O 6 trying to push them now
2020-03-14 22:05:38,792 - root - INFO - A few measurements queuing locally :O 6 trying to push them now
2020-03-14 22:05:39,306 - root - INFO - A few measurements queuing locally :O 6 trying to push them now
2020-03-14 22:05:39,912 - root - INFO - A few measurements queuing locally :O 6 trying to push them now

This suggests the queue never goes back to 0, with 6 measurements stuck in it. I believe the problem is the retry mechanism inside https://github.com/jjpavlik/homemetrics/blob/Issue-%2318/collector.py#L59 . The big issue seems to be that the for loop doesn't remove from the list the measurements that were actually pushed, and it re-appends (duplicating) the ones that failed while the interface was down.
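
The failure mode can be reproduced with a toy model. This is a minimal sketch, not the actual code in collector.py: `make_push` builds a hypothetical stand-in for the network call that returns False while the link is down, and the queue is a plain list of measurement ids. The buggy version iterates over a snapshot without removing what was pushed and appends failures back onto the same list; the fixed version keeps only the measurements whose push failed, each exactly once.

```python
def retry_buggy(pending, push):
    """Mirrors the described bug: pushed items stay queued, failed ones are duplicated."""
    for measurement in list(pending):  # iterate over a snapshot of the queue
        if not push(measurement):
            pending.append(measurement)  # the original entry is still in the list
    return pending


def retry_fixed(pending, push):
    """Return a new queue holding only the measurements whose push failed."""
    return [m for m in pending if not push(m)]


def make_push(link_up):
    """Hypothetical push(): records and succeeds only while link_up() is True."""
    sent = []

    def push(measurement):
        if link_up():
            sent.append(measurement)
            return True
        return False

    return push, sent


state = {"up": False}
push, sent = make_push(lambda: state["up"])
queue = retry_buggy(["m1", "m2", "m3"], push)  # link down: everything duplicated
print(queue)  # → ['m1', 'm2', 'm3', 'm1', 'm2', 'm3']
state["up"] = True
queue = retry_buggy(queue, push)  # link up: all pushed, nothing removed
print(len(queue), len(sent))  # → 6 6

state["up"] = False
push, sent = make_push(lambda: state["up"])
queue = retry_fixed(["m1", "m2", "m3"], push)  # link down: each kept once
state["up"] = True
queue = retry_fixed(queue, push)  # link up: queue drains
print(queue, sent)  # → [] ['m1', 'm2', 'm3']
```

Building a new list instead of mutating the one being iterated over avoids both problems at once: successes are dropped and failures are never re-added.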

Just pushed commit https://github.com/jjpavlik/homemetrics/commit/8386fe82c44db56f7604795831b73730755d3094#diff-f306c1eaf970996da5b4dddb8261d048

Now the queue size is slowly going down; however, I still need to check that collector.py can gracefully survive a link-down/link-up scenario.
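
One way to exercise the link-down/link-up scenario without pulling the cable is to simulate it. A minimal sketch under stated assumptions: each period the hypothetical collector loop enqueues `per_period` measurements with integer ids, the link is down during `down_periods`, and the retry step keeps only the measurements whose push failed. `push` and `simulate_outage` are illustrative names, not functions from collector.py.

```python
def push(measurement, link_up, delivered):
    """Hypothetical stand-in for the network call: succeeds only while the link is up."""
    if link_up:
        delivered.append(measurement)
    return link_up


def simulate_outage(down_periods, periods=10, per_period=2):
    """Run a toy collector loop where the link is down during `down_periods`.

    Returns (still_queued, delivered, produced_count).
    """
    queued, delivered, produced = [], [], 0
    for period in range(periods):
        link_up = period not in down_periods
        # New measurements for this period get consecutive integer ids
        queued.extend(range(produced, produced + per_period))
        produced += per_period
        # Retry everything queued; keep only the measurements whose push failed
        queued = [m for m in queued if not push(m, link_up, delivered)]
    return queued, delivered, produced


queued, delivered, produced = simulate_outage(down_periods=range(3, 6))
print(queued, len(delivered), produced)  # → [] 20 20
```

The two properties worth checking once the link comes back are that the local queue is empty and that every measurement was delivered exactly once, i.e. `sorted(delivered) == list(range(produced))`.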

jjpavlik commented 4 years ago

After 2 days, and having tested forcing a link down/up, it looks like things are on track. I'll close this one and see if the problem shows up again in the future.