roflcoopter / viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
MIT License
1.51k stars 155 forks source link

Object Detection (EdgeTPU) stops working after running for some time. ("Failed to retrieve result for object_detector", "Frame is 2.082185983657837 seconds old. Discarding") #599

Closed hessijames79 closed 1 year ago

hessijames79 commented 1 year ago

Hi!

I am using Viseron with several cameras, mog2 motion detection, EdgeTPU object detection and DeepStack face recognition. The setup works flawlessly, however, after running for several hours, object detection stops working and reports errors:

[2023-05-10 15:07:18] [DEBUG   ] [viseron.states] - Setting state of sensor.cam_01_stellplatz_object_detector_fps to state: 26.3, attributes {'name': 'Stellplatz Object Detector FPS', 'domain': 'sensor', 'preprocessor_fps': 192.7, 'inference_fps': 26.8, 'theoretical_max_fps': 26.3}
[2023-05-10 15:07:18] [DEBUG   ] [viseron.states] - Setting state of sensor.cam_04_gartenhaus_object_detector_fps to state: 27.0, attributes {'name': 'Gartenhaus Object Detector FPS', 'domain': 'sensor', 'preprocessor_fps': 101.1, 'inference_fps': 27.6, 'theoretical_max_fps': 27.0}
[2023-05-10 15:07:18] [DEBUG   ] [viseron.states] - Setting state of sensor.cam_03_seite_object_detector_fps to state: 31.3, attributes {'name': 'Seite Object Detector FPS', 'domain': 'sensor', 'preprocessor_fps': 146.4, 'inference_fps': 31.9, 'theoretical_max_fps': 31.3}
[2023-05-10 15:07:30] [DEBUG   ] [viseron.components.webserver.websocket_api] - Received {"type":"ping","command_id":249}
[2023-05-10 15:07:34] [DEBUG   ] [viseron.states] - Setting state of binary_sensor.cam_01_stellplatz_motion_detected to state: on, attributes {'name': 'Stellplatz Motion Detected', 'domain': 'binary_sensor'}
[2023-05-10 15:07:34] [DEBUG   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Starting object detector
[2023-05-10 15:07:34] [DEBUG   ] [viseron.states] - Setting state of sensor.cam_01_stellplatz_operation_state to state: scanning_for_objects, attributes {'name': 'Stellplatz Operation State', 'domain': 'sensor'}
[2023-05-10 15:07:38] [ERROR   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Failed to retrieve result for object_detector
[2023-05-10 15:07:38] [DEBUG   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Frame is 2.035888433456421 seconds old. Discarding
[2023-05-10 15:07:38] [DEBUG   ] [viseron.states] - Setting state of sensor.cam_01_stellplatz_operation_state to state: error_scanning_frame, attributes {'name': 'Stellplatz Operation State', 'domain': 'sensor'}
[2023-05-10 15:07:38] [DEBUG   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Frame is 1.0388662815093994 seconds old. Discarding
[2023-05-10 15:07:41] [ERROR   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Failed to retrieve result for object_detector
[2023-05-10 15:07:41] [DEBUG   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Frame is 2.082185983657837 seconds old. Discarding
[2023-05-10 15:07:41] [DEBUG   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Frame is 1.0659961700439453 seconds old. Discarding
[2023-05-10 15:07:41] [DEBUG   ] [viseron.states] - Setting state of binary_sensor.cam_01_stellplatz_motion_detected to state: off, attributes {'name': 'Stellplatz Motion Detected', 'domain': 'binary_sensor'}
[2023-05-10 15:07:42] [DEBUG   ] [viseron.components.webserver.websocket_api] - Received {"type":"ping","command_id":402}
[2023-05-10 15:07:42] [DEBUG   ] [viseron.components.webserver.websocket_api] - Received {"type":"ping","command_id":403}
[2023-05-10 15:07:44] [ERROR   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Failed to retrieve result for object_detector
[2023-05-10 15:07:44] [DEBUG   ] [viseron.components.nvr.nvr.cam_01_stellplatz] - Not recording, pausing object detector

Configuration digest:

ffmpeg:
  camera:
    cam_01_stellplatz:
      name: Stellplatz
      host: XXX
      port: 1935
      path: /bcs/channel0_main.bcs?channel=0&stream=0&user=nvr&password=XXX
      protocol: rtmp
      stream_format: rtmp
      width: 2304
      height: 1296
      fps: 4
      username: nvr
      password: XXX
      recorder:
        idle_timeout: 20                    # default: 10
        retain: 30                          # default: 7

mog2:
  motion_detector:
    cameras:
      cam_01_stellplatz:
        fps: 1
        area: 0.01                #default: 0.08
        #trigger_recorder: true

edgetpu:
  object_detector:
    device: usb
    cameras:
      cam_01_stellplatz:
        fps: 1
        labels:
          - label: person
            confidence: 0.7         # default: 0.8
            trigger_recorder: true
          - label: cat
            confidence: 0.7         # default: 0.8
            trigger_recorder: true
          - label: dog
            confidence: 0.7         # default: 0.8
            trigger_recorder: true
          - label: backpack
            confidence: 0.7         # default: 0.8
            trigger_recorder: true

deepstack:
  host: XXX
  port: 5000
  face_recognition:
    save_unknown_faces: true
    cameras:
      cam_01_stellplatz:
    labels:
      - person

nvr:
  cam_01_stellplatz:

Edit: AFAICS the object_detector error and the "Frame is 2.082185983657837 seconds old. Discarding" messages seem to correlate.

I would appreciate any help. Patrick

roflcoopter commented 1 year ago

Hmm, could you redirect the logs from the container to a file? Hopefully we can catch the part when it breaks.

docker logs -f viseron &> /place/to/store/logs/viseron.log &

Viseron doesnt log directly to a file yet, will make sure to fix that soon. Sorry about that!

hessijames79 commented 1 year ago

Hi!

Done. As there is personal information in the logs I had to get creative and sent a download link via reddit.

Btw. I don't see any problems in logging through Docker. Viseron does a good job to customize log contents which is probably all it should do.

Patrick

roflcoopter commented 1 year ago

Thanks! The culprit is probably this error:

F driver/usb/usb_driver.cc:1148] HandleQueuedBulkIn transfer in failed. Not found: USB transfer error 5 [LibUsbDataInCallback]

Googling it gives a few different solutions. What machine are you running on? Some suggestions are a powered USB hub if yo uare running on something like an RPi4

Could also be a bad USB cable apparently. This thread has more details: https://github.com/google-coral/edgetpu/issues/99

hessijames79 commented 1 year ago

Hi!

Thank you for spotting the issue. Will try to rule out the possibilities, probably the cable or the cheap USB3 controller.

Nothing Viseron can solve :)

Patrick

/Edit/Update: In case someone encounters a similar problem.

What didn't work:

/Edit/Update 2: Problem occured again.

hessijames79 commented 1 year ago

Hi! After some experimenting with usb hubs and cables I got the "Failed to retrieve" message again, but this time without any additional USB messages. Any chance you could implement some kind of watchdog to restart the edgetpu part when the problem occurs?

Patrick

roflcoopter commented 1 year ago

Will see if i can find a good way to restart!

hessijames79 commented 10 months ago

Still looking for a way to workaround for the problem. I can detect the problem e. g. through mqtt but can't react in any other way than restarting the whole container. I tried resetting (ioctl USBDEVFS_RESET) and replugging (unbind-sleep-bind) the USB device but the problem is still there.

Do you have any other idea of an action I could trigger from my watchdog to restart the edgetpu part?

Patrick

roflcoopter commented 10 months ago

Sorry its taking me so long, am really busy with the 24/7 recordings (#619) atm I dont think it will take me much longer to finish that

My goal is to finish that one off and then work on issues like this and and polish some existing features