google / aiyprojects-raspbian

API libraries, samples, and system images for AIY Projects (Voice Kit and Vision Kit)
https://aiyprojects.withgoogle.com/
Apache License 2.0
1.61k stars 692 forks source link

Myriad resets unexpectedly leading to SpicommInternalError #377

Open HeatfanJohn opened 6 years ago

HeatfanJohn commented 6 years ago

The Myriad resets due to Slave not ready after 1000 ms error. This leads to an SpicommInternalError exception and results in the Finally clause to execute in JoyDetector.run causing the application to stop functioning while the service is still running.

An open except and logging was added to trap and display the exception as it occurred.

May 17 10:41:18 aiyvision kernel: [46924.151775] aiy-vision spi0.0: Slave not ready after 1000 ms
May 17 10:41:18 aiyvision kernel: [46924.151816] aiy-vision spi0.0: Fatal error, resetting
May 17 10:41:18 aiyvision kernel: [46924.151829] aiy-vision spi0.0: Resetting myriad
May 17 10:41:21 aiyvision kernel: [46926.741620] aiy-vision spi0.0: Writing myriad firmware
May 17 10:41:27 aiyvision kernel: [46932.656598] aiy-vision spi0.0: Myriad booting
May 17 10:41:27 aiyvision kernel: [46932.837788] aiy-vision spi0.0: Myriad ready
May 17 10:41:27 aiyvision python3[239]: ERROR:__main__:Unexpected exception encountered: <class 'picamera.exc.PiCameraRuntimeError'>
May 17 10:41:27 aiyvision python3[239]: ERROR:__main__:Traceback (most recent call last):
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/_drivers/_spicomm.py", line 137, in transact
May 17 10:41:27 aiyvision python3[239]:     fcntl.ioctl(self._dev, SPICOMM_IOCTL_TRANSACT, buf)
May 17 10:41:27 aiyvision python3[239]: OSError: [Errno 14] Bad address
May 17 10:41:27 aiyvision python3[239]: During handling of the above exception, another exception occurred:
May 17 10:41:27 aiyvision python3[239]: Traceback (most recent call last):
May 17 10:41:27 aiyvision python3[239]:   File "/home/pi/AIY-projects-python/src/examples/vision/joy/joy_detection_demo.py", line 278, in run
May 17 10:41:27 aiyvision python3[239]:     for i, result in enumerate(inference.run()):
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 76, in run
May 17 10:41:27 aiyvision python3[239]:     yield self._engine.camera_inference()
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 257, in camera_inference
May 17 10:41:27 aiyvision python3[239]:     return self._communicate(request).inference_result
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 187, in _communicate
May 17 10:41:27 aiyvision python3[239]:     response.ParseFromString(self._transport.send(request.SerializeToString()))
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/_drivers/_transport.py", line 33, in send
May 17 10:41:27 aiyvision python3[239]:     return self._spicomm.transact(request)
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/_drivers/_spicomm.py", line 152, in transact
May 17 10:41:27 aiyvision python3[239]:     raise SpicommInternalError
May 17 10:41:27 aiyvision python3[239]: aiy._drivers._spicomm.SpicommInternalError
May 17 10:41:27 aiyvision python3[239]: During handling of the above exception, another exception occurred:
May 17 10:41:27 aiyvision python3[239]: Traceback (most recent call last):
May 17 10:41:27 aiyvision python3[239]:   File "/home/pi/AIY-projects-python/src/examples/vision/joy/joy_detection_demo.py", line 293, in run
May 17 10:41:27 aiyvision python3[239]:     break
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 87, in __exit__
May 17 10:41:27 aiyvision python3[239]:     self.close()
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 79, in close
May 17 10:41:27 aiyvision python3[239]:     self._engine.stop_camera_inference()
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 263, in stop_camera_inference
May 17 10:41:27 aiyvision python3[239]:     self._communicate(request)
May 17 10:41:27 aiyvision python3[239]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 189, in _communicate
May 17 10:41:27 aiyvision python3[239]:     raise InferenceException(response.status.message)
May 17 10:41:27 aiyvision python3[239]: aiy.vision.inference.InferenceException: StopCameraInference: Camera inference is not running.
May 17 10:41:27 aiyvision python3[239]: During handling of the above exception, another exception occurred:
May 17 10:41:27 aiyvision python3[239]: Traceback (most recent call last):
May 17 10:41:27 aiyvision python3[239]:   File "/home/pi/AIY-projects-python/src/examples/vision/joy/joy_detection_demo.py", line 293, in run
May 17 10:41:27 aiyvision python3[239]:     break
May 17 10:41:27 aiyvision python3[239]:   File "/usr/lib/python3/dist-packages/picamera/camera.py", line 758, in __exit__
May 17 10:41:27 aiyvision python3[239]:     self.close()
May 17 10:41:27 aiyvision python3[239]:   File "/usr/lib/python3/dist-packages/picamera/camera.py", line 752, in close
May 17 10:41:27 aiyvision python3[239]:     raise exc
May 17 10:41:27 aiyvision python3[239]:   File "/usr/lib/python3/dist-packages/picamera/camera.py", line 1995, in _control_callback
May 17 10:41:27 aiyvision python3[239]:     "No data recevied from sensor. Check all connections, "
May 17 10:41:27 aiyvision python3[239]: picamera.exc.PiCameraRuntimeError: No data recevied from sensor. Check all connections, including the SUNNY chip on the camera board
May 17 10:41:27 aiyvision python3[239]: INFO:__main__:JoyDetector: finnally called
May 17 11:17:01 aiyvision CRON[1645]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 17 12:17:01 aiyvision CRON[1675]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
dmitriykovalev commented 6 years ago

There is already a similar issue #346. How often do you get this error?

HeatfanJohn commented 6 years ago

It's starting to happen a couple of times a day. Lately I have seen this error to include a Failed to write header: -85 error log entry. See below.

Does the kernel driver need to wait longer than 1 second? Or is there a way to reset without throwing an exception?

May 21 07:36:52 aiyvision kernel: [381455.009181] aiy-vision spi0.0: Slave not ready after 1000 ms
May 21 07:36:52 aiyvision kernel: [381455.009211] aiy-vision spi0.0: Failed to write header: -85
May 21 07:36:52 aiyvision kernel: [381455.009225] aiy-vision spi0.0: Fatal error, resetting
May 21 07:36:52 aiyvision kernel: [381455.009264] aiy-vision spi0.0: Resetting myriad
May 21 07:36:54 aiyvision kernel: [381457.198581] aiy-vision spi0.0: Writing myriad firmware
May 21 07:37:00 aiyvision kernel: [381463.639216] aiy-vision spi0.0: Myriad booting
May 21 07:37:01 aiyvision kernel: [381463.820407] aiy-vision spi0.0: Myriad ready
dmitriykovalev commented 6 years ago

It's possible to implement some recovery logic on the python side but we should fix the root cause, i.e. driver shouldn't fail. Could you try to enable debug in the vision driver by running:

modprobe -r aiy-vision && modprobe aiy-vision debug=1

and then get dmesg output when it fails again. It's going to be a lot messages in the log. To disable debugging mode simply run:

modprobe -r aiy-vision && modprobe aiy-vision

By the way, do you have Wi-Fi or HDMI working while you are running JoyDetector?

HeatfanJohn commented 6 years ago

I'm using Wi-Fi ... I don't have an HDMI monitor attached but I could do that.

I will do the debug as request. The last failure before starting to debug included some interesting Kernel message about a hung worker. I wasn't able to systemctl stop the service so I reboot. I'm going to guess that the hung worker means that the kernel driver is hung.

[442296.473614] aiy-vision spi0.0: Incoming crc mismatch: slave 8353487a vs computed f45478ec
[445299.665640] aiy-vision spi0.0: Incoming crc mismatch: slave 694a5014 vs computed 1e4d6082
[447935.853262] aiy-vision spi0.0: Slave not ready after 1000 ms
[447935.853292] aiy-vision spi0.0: Failed to write header: -85
[447935.853304] aiy-vision spi0.0: Fatal error, resetting
[447935.853338] aiy-vision spi0.0: Resetting myriad
[447938.039908] aiy-vision spi0.0: Writing myriad firmware
[447944.065530] aiy-vision spi0.0: Myriad booting
[447944.246708] aiy-vision spi0.0: Myriad ready
[448137.446608] INFO: task kworker/0:2:7698 blocked for more than 120 seconds.
[448137.446632]       Tainted: G           O    4.14.34+ #1110
[448137.446640] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448137.446649] kworker/0:2     D    0  7698      2 0x00000000
[448137.446706] Workqueue: events dbs_work_handler
[448137.446763] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448137.446787] [<c0645a78>] (schedule) from [<c0648df0>] (schedule_timeout+0x1a0/0x358)
[448137.446808] [<c0648df0>] (schedule_timeout) from [<c064657c>] (wait_for_common+0xa0/0x154)
[448137.446831] [<c064657c>] (wait_for_common) from [<c0646650>] (wait_for_completion+0x20/0x24)
[448137.446856] [<c0646650>] (wait_for_completion) from [<c04ec224>] (rpi_firmware_transaction+0x60/0xa8)
[448137.446876] [<c04ec224>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448137.446893] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448137.446919] [<c04ec4e0>] (rpi_firmware_property) from [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[448137.446945] [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1) from [<c04c6c30>] (bcm2835_cpufreq_driver_target_index+0x40/0xb4)
[448137.446967] [<c04c6c30>] (bcm2835_cpufreq_driver_target_index) from [<c04c2b98>] (__cpufreq_driver_target+0x22c/0x548)
[448137.446988] [<c04c2b98>] (__cpufreq_driver_target) from [<c04c5770>] (od_dbs_update+0xa8/0x170)
[448137.447010] [<c04c5770>] (od_dbs_update) from [<c04c6244>] (dbs_work_handler+0x38/0x60)
[448137.447037] [<c04c6244>] (dbs_work_handler) from [<c0039208>] (process_one_work+0x11c/0x398)
[448137.447056] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448137.447082] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448137.447113] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448137.447128] INFO: task kworker/0:1:7709 blocked for more than 120 seconds.
[448137.447136]       Tainted: G           O    4.14.34+ #1110
[448137.447141] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448137.447146] kworker/0:1     D    0  7709      2 0x00000000
[448137.447167] Workqueue: events get_throttled_poll
[448137.447196] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448137.447219] [<c0645a78>] (schedule) from [<c0645d94>] (schedule_preempt_disabled+0x18/0x1c)
[448137.447244] [<c0645d94>] (schedule_preempt_disabled) from [<c0647304>] (__mutex_lock.constprop.3+0x118/0x358)
[448137.447269] [<c0647304>] (__mutex_lock.constprop.3) from [<c064765c>] (__mutex_lock_slowpath+0x1c/0x20)
[448137.447291] [<c064765c>] (__mutex_lock_slowpath) from [<c06476b8>] (mutex_lock+0x58/0x5c)
[448137.447311] [<c06476b8>] (mutex_lock) from [<c04ec204>] (rpi_firmware_transaction+0x40/0xa8)
[448137.447329] [<c04ec204>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448137.447346] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448137.447363] [<c04ec4e0>] (rpi_firmware_property) from [<c04ec634>] (rpi_firmware_get_throttled+0xf4/0x17c)
[448137.447380] [<c04ec634>] (rpi_firmware_get_throttled) from [<c04ec750>] (get_throttled_poll+0x28/0x4c)
[448137.447398] [<c04ec750>] (get_throttled_poll) from [<c0039208>] (process_one_work+0x11c/0x398)
[448137.447416] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448137.447435] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448137.447458] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448260.323096] INFO: task kworker/0:2:7698 blocked for more than 120 seconds.
[448260.323121]       Tainted: G           O    4.14.34+ #1110
[448260.323127] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448260.323137] kworker/0:2     D    0  7698      2 0x00000000
[448260.323194] Workqueue: events dbs_work_handler
[448260.323251] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448260.323274] [<c0645a78>] (schedule) from [<c0648df0>] (schedule_timeout+0x1a0/0x358)
[448260.323296] [<c0648df0>] (schedule_timeout) from [<c064657c>] (wait_for_common+0xa0/0x154)
[448260.323320] [<c064657c>] (wait_for_common) from [<c0646650>] (wait_for_completion+0x20/0x24)
[448260.323344] [<c0646650>] (wait_for_completion) from [<c04ec224>] (rpi_firmware_transaction+0x60/0xa8)
[448260.323365] [<c04ec224>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448260.323382] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448260.323407] [<c04ec4e0>] (rpi_firmware_property) from [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[448260.323432] [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1) from [<c04c6c30>] (bcm2835_cpufreq_driver_target_index+0x40/0xb4)
[448260.323455] [<c04c6c30>] (bcm2835_cpufreq_driver_target_index) from [<c04c2b98>] (__cpufreq_driver_target+0x22c/0x548)
[448260.323476] [<c04c2b98>] (__cpufreq_driver_target) from [<c04c5770>] (od_dbs_update+0xa8/0x170)
[448260.323497] [<c04c5770>] (od_dbs_update) from [<c04c6244>] (dbs_work_handler+0x38/0x60)
[448260.323523] [<c04c6244>] (dbs_work_handler) from [<c0039208>] (process_one_work+0x11c/0x398)
[448260.323542] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448260.323568] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448260.323599] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448260.323614] INFO: task kworker/0:1:7709 blocked for more than 120 seconds.
[448260.323623]       Tainted: G           O    4.14.34+ #1110
[448260.323628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448260.323633] kworker/0:1     D    0  7709      2 0x00000000
[448260.323655] Workqueue: events get_throttled_poll
[448260.323685] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448260.323706] [<c0645a78>] (schedule) from [<c0645d94>] (schedule_preempt_disabled+0x18/0x1c)
[448260.323731] [<c0645d94>] (schedule_preempt_disabled) from [<c0647304>] (__mutex_lock.constprop.3+0x118/0x358)
[448260.323755] [<c0647304>] (__mutex_lock.constprop.3) from [<c064765c>] (__mutex_lock_slowpath+0x1c/0x20)
[448260.323776] [<c064765c>] (__mutex_lock_slowpath) from [<c06476b8>] (mutex_lock+0x58/0x5c)
[448260.323796] [<c06476b8>] (mutex_lock) from [<c04ec204>] (rpi_firmware_transaction+0x40/0xa8)
[448260.323814] [<c04ec204>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448260.323831] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448260.323848] [<c04ec4e0>] (rpi_firmware_property) from [<c04ec634>] (rpi_firmware_get_throttled+0xf4/0x17c)
[448260.323865] [<c04ec634>] (rpi_firmware_get_throttled) from [<c04ec750>] (get_throttled_poll+0x28/0x4c)
[448260.323884] [<c04ec750>] (get_throttled_poll) from [<c0039208>] (process_one_work+0x11c/0x398)
[448260.323901] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448260.323920] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448260.323942] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448383.199956] INFO: task kworker/0:2:7698 blocked for more than 120 seconds.
[448383.199982]       Tainted: G           O    4.14.34+ #1110
[448383.199989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448383.199999] kworker/0:2     D    0  7698      2 0x00000000
[448383.200054] Workqueue: events dbs_work_handler
[448383.200113] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448383.200134] [<c0645a78>] (schedule) from [<c0648df0>] (schedule_timeout+0x1a0/0x358)
[448383.200157] [<c0648df0>] (schedule_timeout) from [<c064657c>] (wait_for_common+0xa0/0x154)
[448383.200179] [<c064657c>] (wait_for_common) from [<c0646650>] (wait_for_completion+0x20/0x24)
[448383.200205] [<c0646650>] (wait_for_completion) from [<c04ec224>] (rpi_firmware_transaction+0x60/0xa8)
[448383.200225] [<c04ec224>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448383.200243] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448383.200268] [<c04ec4e0>] (rpi_firmware_property) from [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[448383.200294] [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1) from [<c04c6c30>] (bcm2835_cpufreq_driver_target_index+0x40/0xb4)
[448383.200317] [<c04c6c30>] (bcm2835_cpufreq_driver_target_index) from [<c04c2b98>] (__cpufreq_driver_target+0x22c/0x548)
[448383.200338] [<c04c2b98>] (__cpufreq_driver_target) from [<c04c5770>] (od_dbs_update+0xa8/0x170)
[448383.200358] [<c04c5770>] (od_dbs_update) from [<c04c6244>] (dbs_work_handler+0x38/0x60)
[448383.200384] [<c04c6244>] (dbs_work_handler) from [<c0039208>] (process_one_work+0x11c/0x398)
[448383.200403] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448383.200430] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448383.200464] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448383.200480] INFO: task kworker/0:1:7709 blocked for more than 120 seconds.
[448383.200488]       Tainted: G           O    4.14.34+ #1110
[448383.200493] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448383.200499] kworker/0:1     D    0  7709      2 0x00000000
[448383.200521] Workqueue: events get_throttled_poll
[448383.200550] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448383.200571] [<c0645a78>] (schedule) from [<c0645d94>] (schedule_preempt_disabled+0x18/0x1c)
[448383.200596] [<c0645d94>] (schedule_preempt_disabled) from [<c0647304>] (__mutex_lock.constprop.3+0x118/0x358)
[448383.200620] [<c0647304>] (__mutex_lock.constprop.3) from [<c064765c>] (__mutex_lock_slowpath+0x1c/0x20)
[448383.200642] [<c064765c>] (__mutex_lock_slowpath) from [<c06476b8>] (mutex_lock+0x58/0x5c)
[448383.200662] [<c06476b8>] (mutex_lock) from [<c04ec204>] (rpi_firmware_transaction+0x40/0xa8)
[448383.200680] [<c04ec204>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448383.200697] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448383.200715] [<c04ec4e0>] (rpi_firmware_property) from [<c04ec634>] (rpi_firmware_get_throttled+0xf4/0x17c)
[448383.200732] [<c04ec634>] (rpi_firmware_get_throttled) from [<c04ec750>] (get_throttled_poll+0x28/0x4c)
[448383.200750] [<c04ec750>] (get_throttled_poll) from [<c0039208>] (process_one_work+0x11c/0x398)
[448383.200768] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448383.200788] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448383.200810] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448506.077079] INFO: task kworker/0:2:7698 blocked for more than 120 seconds.
[448506.077103]       Tainted: G           O    4.14.34+ #1110
[448506.077109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448506.077119] kworker/0:2     D    0  7698      2 0x00000000
[448506.077176] Workqueue: events dbs_work_handler
[448506.077232] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448506.077256] [<c0645a78>] (schedule) from [<c0648df0>] (schedule_timeout+0x1a0/0x358)
[448506.077278] [<c0648df0>] (schedule_timeout) from [<c064657c>] (wait_for_common+0xa0/0x154)
[448506.077302] [<c064657c>] (wait_for_common) from [<c0646650>] (wait_for_completion+0x20/0x24)
[448506.077327] [<c0646650>] (wait_for_completion) from [<c04ec224>] (rpi_firmware_transaction+0x60/0xa8)
[448506.077347] [<c04ec224>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448506.077364] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448506.077389] [<c04ec4e0>] (rpi_firmware_property) from [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[448506.077415] [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1) from [<c04c6c30>] (bcm2835_cpufreq_driver_target_index+0x40/0xb4)
[448506.077438] [<c04c6c30>] (bcm2835_cpufreq_driver_target_index) from [<c04c2b98>] (__cpufreq_driver_target+0x22c/0x548)
[448506.077459] [<c04c2b98>] (__cpufreq_driver_target) from [<c04c5770>] (od_dbs_update+0xa8/0x170)
[448506.077480] [<c04c5770>] (od_dbs_update) from [<c04c6244>] (dbs_work_handler+0x38/0x60)
[448506.077506] [<c04c6244>] (dbs_work_handler) from [<c0039208>] (process_one_work+0x11c/0x398)
[448506.077525] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448506.077551] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448506.077582] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448506.077597] INFO: task kworker/0:1:7709 blocked for more than 120 seconds.
[448506.077605]       Tainted: G           O    4.14.34+ #1110
[448506.077610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448506.077615] kworker/0:1     D    0  7709      2 0x00000000
[448506.077637] Workqueue: events get_throttled_poll
[448506.077667] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448506.077688] [<c0645a78>] (schedule) from [<c0645d94>] (schedule_preempt_disabled+0x18/0x1c)
[448506.077713] [<c0645d94>] (schedule_preempt_disabled) from [<c0647304>] (__mutex_lock.constprop.3+0x118/0x358)
[448506.077737] [<c0647304>] (__mutex_lock.constprop.3) from [<c064765c>] (__mutex_lock_slowpath+0x1c/0x20)
[448506.077759] [<c064765c>] (__mutex_lock_slowpath) from [<c06476b8>] (mutex_lock+0x58/0x5c)
[448506.077780] [<c06476b8>] (mutex_lock) from [<c04ec204>] (rpi_firmware_transaction+0x40/0xa8)
[448506.077799] [<c04ec204>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448506.077816] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448506.077833] [<c04ec4e0>] (rpi_firmware_property) from [<c04ec634>] (rpi_firmware_get_throttled+0xf4/0x17c)
[448506.077850] [<c04ec634>] (rpi_firmware_get_throttled) from [<c04ec750>] (get_throttled_poll+0x28/0x4c)
[448506.077868] [<c04ec750>] (get_throttled_poll) from [<c0039208>] (process_one_work+0x11c/0x398)
[448506.077885] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448506.077906] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448506.077928] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448628.954396] INFO: task kworker/0:2:7698 blocked for more than 120 seconds.
[448628.954421]       Tainted: G           O    4.14.34+ #1110
[448628.954427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448628.954437] kworker/0:2     D    0  7698      2 0x00000000
[448628.954492] Workqueue: events dbs_work_handler
[448628.954548] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448628.954571] [<c0645a78>] (schedule) from [<c0648df0>] (schedule_timeout+0x1a0/0x358)
[448628.954592] [<c0648df0>] (schedule_timeout) from [<c064657c>] (wait_for_common+0xa0/0x154)
[448628.954615] [<c064657c>] (wait_for_common) from [<c0646650>] (wait_for_completion+0x20/0x24)
[448628.954640] [<c0646650>] (wait_for_completion) from [<c04ec224>] (rpi_firmware_transaction+0x60/0xa8)
[448628.954659] [<c04ec224>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448628.954676] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448628.954703] [<c04ec4e0>] (rpi_firmware_property) from [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[448628.954728] [<c04c6bdc>] (bcm2835_cpufreq_clock_property.constprop.1) from [<c04c6c30>] (bcm2835_cpufreq_driver_target_index+0x40/0xb4)
[448628.954750] [<c04c6c30>] (bcm2835_cpufreq_driver_target_index) from [<c04c2b98>] (__cpufreq_driver_target+0x22c/0x548)
[448628.954772] [<c04c2b98>] (__cpufreq_driver_target) from [<c04c5770>] (od_dbs_update+0xa8/0x170)
[448628.954793] [<c04c5770>] (od_dbs_update) from [<c04c6244>] (dbs_work_handler+0x38/0x60)
[448628.954819] [<c04c6244>] (dbs_work_handler) from [<c0039208>] (process_one_work+0x11c/0x398)
[448628.954838] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448628.954865] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448628.954898] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
[448628.954912] INFO: task kworker/0:1:7709 blocked for more than 120 seconds.
[448628.954921]       Tainted: G           O    4.14.34+ #1110
[448628.954926] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[448628.954931] kworker/0:1     D    0  7709      2 0x00000000
[448628.954952] Workqueue: events get_throttled_poll
[448628.954982] [<c06455a8>] (__schedule) from [<c0645a78>] (schedule+0x50/0xa8)
[448628.955004] [<c0645a78>] (schedule) from [<c0645d94>] (schedule_preempt_disabled+0x18/0x1c)
[448628.955030] [<c0645d94>] (schedule_preempt_disabled) from [<c0647304>] (__mutex_lock.constprop.3+0x118/0x358)
[448628.955053] [<c0647304>] (__mutex_lock.constprop.3) from [<c064765c>] (__mutex_lock_slowpath+0x1c/0x20)
[448628.955075] [<c064765c>] (__mutex_lock_slowpath) from [<c06476b8>] (mutex_lock+0x58/0x5c)
[448628.955095] [<c06476b8>] (mutex_lock) from [<c04ec204>] (rpi_firmware_transaction+0x40/0xa8)
[448628.955112] [<c04ec204>] (rpi_firmware_transaction) from [<c04ec354>] (rpi_firmware_property_list+0xe8/0x210)
[448628.955129] [<c04ec354>] (rpi_firmware_property_list) from [<c04ec4e0>] (rpi_firmware_property+0x64/0x84)
[448628.955147] [<c04ec4e0>] (rpi_firmware_property) from [<c04ec634>] (rpi_firmware_get_throttled+0xf4/0x17c)
[448628.955163] [<c04ec634>] (rpi_firmware_get_throttled) from [<c04ec750>] (get_throttled_poll+0x28/0x4c)
[448628.955182] [<c04ec750>] (get_throttled_poll) from [<c0039208>] (process_one_work+0x11c/0x398)
[448628.955199] [<c0039208>] (process_one_work) from [<c00394c0>] (worker_thread+0x3c/0x544)
[448628.955218] [<c00394c0>] (worker_thread) from [<c003f3bc>] (kthread+0x120/0x15c)
[448628.955240] [<c003f3bc>] (kthread) from [<c000fe6c>] (ret_from_fork+0x14/0x28)
pi@aiyvision:~ $ 
HeatfanJohn commented 6 years ago

Finally happened while debug was set. I had to disable all Kernel logging in rsyslog.conf to prevent total depletion of available storage. Kernel messages were still available using dmesg.

Here are the debug messages before the timeout:

[444653.534091] aiy-vision spi0.0: Polling for completed transaction
[444653.534108] aiy-vision spi0.0: Sending initial header
[444653.534117] aiy-vision spi0.0: Waiting before write.
[444653.534127] aiy-vision spi0.0: Done waiting, writing.
[444653.534189] aiy-vision spi0.0: Spi write complete.
[444653.534215] aiy-vision spi0.0: Waiting before read.
[444653.534222] aiy-vision spi0.0: Done waiting, reading.
[444653.534258] aiy-vision spi0.0: Recieved header: 07 size 0 crc c651 tid 0
[444653.534267] aiy-vision spi0.0: header_exchange succesful
[444653.534276] aiy-vision spi0.0: Scheduling poll
[444653.534291] aiy-vision spi0.0: Pending tid(s) =
[444653.534298] aiy-vision spi0.0: 1
[444653.534304] aiy-vision spi0.0:
[444653.564071] aiy-vision spi0.0: Polling for completed transaction
[444653.564089] aiy-vision spi0.0: Sending initial header
[444653.564098] aiy-vision spi0.0: Waiting before write.
[444653.564108] aiy-vision spi0.0: Done waiting, writing.
[444653.564171] aiy-vision spi0.0: Spi write complete.
[444653.564179] aiy-vision spi0.0: Waiting before read.
[444653.564292] aiy-vision spi0.0: Done waiting, reading.
[444653.564345] aiy-vision spi0.0: Recieved header: 2f size 134 crc aeeb tid 1
[444653.564355] aiy-vision spi0.0: header_exchange succesful
[444653.564364] aiy-vision spi0.0: tid 1 complete
[444653.564372] aiy-vision spi0.0: Slave has a response for tid 1, reading.
[444653.564382] aiy-vision spi0.0: receive_data_buffer of size 134, buffer_len 12582928
[444653.564389] aiy-vision spi0.0: Waiting before read.
[444653.564435] aiy-vision spi0.0: Done waiting, reading.
[444653.564589] aiy-vision spi0.0: Waiting before read.
[444653.565033] aiy-vision spi0.0: Done waiting, reading.
[444653.565117] aiy-vision spi0.0: receive_data_buffer succesful
[444653.565208] aiy-vision spi0.0: Freeing tid 1
[444653.586753] aiy-vision spi0.0: visionbonnet_ioctl cmd=0xc0108903, arg=2921414664
[444653.586786] aiy-vision spi0.0: Assigning tid 1
[444653.595102] aiy-vision spi0.0: processing tid 1
[444653.595120] aiy-vision spi0.0: Sending initial header
[444653.595127] aiy-vision spi0.0: Waiting before write.
[444653.595137] aiy-vision spi0.0: Done waiting, writing.
[444653.595200] aiy-vision spi0.0: Spi write complete.
[444653.595208] aiy-vision spi0.0: Waiting before read.
[444653.595241] aiy-vision spi0.0: Done waiting, reading.
[444653.595276] aiy-vision spi0.0: Recieved header: 07 size 2 crc 8168 tid 1
[444653.595285] aiy-vision spi0.0: header_exchange succesful
[444653.595295] aiy-vision spi0.0: send_data_buffer of size 2
[444653.595309] aiy-vision spi0.0: Waiting before write.
[444653.595316] aiy-vision spi0.0: Done waiting, writing.
[444653.595344] aiy-vision spi0.0: Spi write complete.
[444653.595352] aiy-vision spi0.0: Data sent, sending crc[444653.595359] aiy-vision spi0.0: Waiting before write.
[444653.595365] aiy-vision spi0.0: Done waiting, writing.
[444653.595385] aiy-vision spi0.0: Spi write complete.
[444653.595393] aiy-vision spi0.0: Reading crc packet
[444653.595399] aiy-vision spi0.0: Waiting before read.
[444653.595451] aiy-vision spi0.0: Done waiting, reading.
[444653.595490] aiy-vision spi0.0: send_data_buffer succesful
[444653.595505] aiy-vision spi0.0: Data sent. tid 1 complete 0 is_supported 1 has_data 0 size 0
[444653.595523] aiy-vision spi0.0: Slave has no response, deferring tid 1 to ongoing queue
[444653.595531] aiy-vision spi0.0: Pending tid(s) =
[444653.595537] aiy-vision spi0.0: 1
[444653.595543] aiy-vision spi0.0:
[444653.595568] aiy-vision spi0.0: Polling for completed transaction
[444653.595576] aiy-vision spi0.0: Sending initial header
[444653.595582] aiy-vision spi0.0: Waiting before write.
[444653.595588] aiy-vision spi0.0: Done waiting, writing.
[444653.595616] aiy-vision spi0.0: Spi write complete.
[444653.595633] aiy-vision spi0.0: Waiting before read.
[444653.595640] aiy-vision spi0.0: Done waiting, reading.
[444653.595668] aiy-vision spi0.0: Recieved header: 07 size 0 crc c651 tid 0
[444653.595676] aiy-vision spi0.0: header_exchange succesful
[444653.595683] aiy-vision spi0.0: Scheduling poll
[444653.595697] aiy-vision spi0.0: Pending tid(s) =
[444653.595703] aiy-vision spi0.0: 1
[444653.595709] aiy-vision spi0.0:
[444653.624077] aiy-vision spi0.0: Polling for completed transaction
[444653.624098] aiy-vision spi0.0: Sending initial header
[444653.624107] aiy-vision spi0.0: Waiting before write.
[444653.624118] aiy-vision spi0.0: Done waiting, writing.
[444653.624187] aiy-vision spi0.0: Spi write complete.
[444653.624197] aiy-vision spi0.0: Waiting before read.
[444653.624254] aiy-vision spi0.0: Done waiting, reading.
[444653.624294] aiy-vision spi0.0: Recieved header: 07 size 0 crc c651 tid 0
[444653.624304] aiy-vision spi0.0: header_exchange succesful
[444653.624312] aiy-vision spi0.0: Scheduling poll
[444653.624329] aiy-vision spi0.0: Pending tid(s) =
[444653.624337] aiy-vision spi0.0: 1
[444653.624343] aiy-vision spi0.0:
[444653.654076] aiy-vision spi0.0: Polling for completed transaction
[444653.654095] aiy-vision spi0.0: Sending initial header
[444653.654106] aiy-vision spi0.0: Waiting before write.
[444653.654116] aiy-vision spi0.0: Done waiting, writing.
[444653.654186] aiy-vision spi0.0: Spi write complete.
[444653.654195] aiy-vision spi0.0: Waiting before read.
[444653.654221] aiy-vision spi0.0: Done waiting, reading.
[444653.654260] aiy-vision spi0.0: Recieved header: 2f size 134 crc aeeb tid 1
[444653.654270] aiy-vision spi0.0: header_exchange succesful
[444653.654278] aiy-vision spi0.0: tid 1 complete
[444653.654287] aiy-vision spi0.0: Slave has a response for tid 1, reading.
[444653.654298] aiy-vision spi0.0: receive_data_buffer of size 134, buffer_len 12582928
[444653.654305] aiy-vision spi0.0: Waiting before read.
[444654.724203] aiy-vision spi0.0: Slave not ready after 1000 ms
[444654.724236] aiy-vision spi0.0: Fatal error, resetting
[444654.724245] aiy-vision spi0.0: Resetting myriad
[444654.725872] aiy-vision spi0.0: visionbonnet_ioctl cmd=0xc0108903, arg=2921414664
[444656.883997] aiy-vision spi0.0: Requesting firmware myriad_fw.mvcmd
[444657.203798] aiy-vision spi0.0: Writing myriad firmware
[444657.639558] systemd-journald[20995]: Received SIGTERM from PID 1 (systemd).
[444657.651508] systemd-journal: 4485 output lines suppressed due to ratelimiting
[444657.659838] systemd[1]: Failed to start Journal Service.
[444657.665159] systemd[1]: systemd-journald.service: Unit entered failed state.
[444657.665428] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[444657.715728] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[444657.722637] systemd[1]: Stopped Journal Service.
[444657.766262] systemd[1]: Starting Journal Service...
[444657.992074] systemd[1]: Started Journal Service.
[444662.831812] aiy-vision spi0.0: Myriad booting
[444663.012999] aiy-vision spi0.0: Myriad ready
[444663.014536] aiy-vision spi0.0: Freeing tid 1
[444663.021802] aiy-vision spi0.0: Assigning tid 1
[444663.026886] aiy-vision spi0.0: processing tid 1
[444663.026905] aiy-vision spi0.0: Sending initial header
[444663.026913] aiy-vision spi0.0: Waiting before write.
[444663.026924] aiy-vision spi0.0: Done waiting, writing.
[444663.026991] aiy-vision spi0.0: Spi write complete.
[444663.026999] aiy-vision spi0.0: Waiting before read.
[444663.027071] aiy-vision spi0.0: Done waiting, reading.
[444663.027116] aiy-vision spi0.0: Recieved header: 07 size 2 crc 8168 tid 1
[444663.027126] aiy-vision spi0.0: header_exchange succesful
[444663.027135] aiy-vision spi0.0: send_data_buffer of size 2
[444663.027142] aiy-vision spi0.0: Waiting before write.
[444663.027159] aiy-vision spi0.0: Done waiting, writing.
[444663.027181] aiy-vision spi0.0: Spi write complete.
[444663.027195] aiy-vision spi0.0: Data sent, sending crc
[444663.027203] aiy-vision spi0.0: Waiting before write.
[444663.027209] aiy-vision spi0.0: Done waiting, writing.
[444663.027230] aiy-vision spi0.0: Spi write complete.
[444663.027237] aiy-vision spi0.0: Reading crc packet
[444663.027243] aiy-vision spi0.0: Waiting before read.
[444663.027283] aiy-vision spi0.0: Done waiting, reading.
[444663.027333] aiy-vision spi0.0: send_data_buffer succesful
[444663.027348] aiy-vision spi0.0: Data sent. tid 1 complete 0 is_supported 1 has_data 0 size 0
[444663.027365] aiy-vision spi0.0: Slave has no response, deferring tid 1 to ongoing queue
[444663.027373] aiy-vision spi0.0: Pending tid(s) =
[444663.027379] aiy-vision spi0.0: 1
[444663.027384] aiy-vision spi0.0:
[444663.027428] aiy-vision spi0.0: Polling for completed transaction
[444663.027436] aiy-vision spi0.0: Sending initial header
[444663.027443] aiy-vision spi0.0: Waiting before write.
[444663.027450] aiy-vision spi0.0: Done waiting, writing.
[444663.027508] aiy-vision spi0.0: Spi write complete.
[444663.027517] aiy-vision spi0.0: Waiting before read.
[444663.027524] aiy-vision spi0.0: Done waiting, reading.
[444663.027576] aiy-vision spi0.0: Recieved header: 2f size 59 crc 66bc tid 1[444663.027586] aiy-vision spi0.0: header_exchange succesful
[444663.027594] aiy-vision spi0.0: tid 1 complete
[444663.027602] aiy-vision spi0.0: Slave has a response for tid 1, reading.
[444663.027612] aiy-vision spi0.0: receive_data_buffer of size 59, buffer_len 12582928
[444663.027619] aiy-vision spi0.0: Waiting before read.
[444663.027635] aiy-vision spi0.0: Done waiting, reading.
[444663.027751] aiy-vision spi0.0: Waiting before read.
[444663.028003] aiy-vision spi0.0: Done waiting, reading.
[444663.028077] aiy-vision spi0.0: receive_data_buffer succesful
[444663.028117] aiy-vision spi0.0: Freeing tid 1
HeatfanJohn commented 6 years ago

I don't know if this is a related or new issue. Kernel null pointer dereference oops error.

joy_ detection_demo service is hung and no additional debug output is being generated.

[774424.013989] aiy-vision spi0.0: Polling for completed transaction
[774424.013997] aiy-vision spi0.0: Sending initial header
[774424.014003] aiy-vision spi0.0: Waiting before write.
[774424.014011] aiy-vision spi0.0: Done waiting, writing.
[774424.014039] aiy-vision spi0.0: Spi write complete.
[774424.014056] aiy-vision spi0.0: Waiting before read.
[774424.014063] aiy-vision spi0.0: Done waiting, reading.
[774424.014092] aiy-vision spi0.0: Recieved header: 07 size 0 crc c651 tid 0
[774424.014101] aiy-vision spi0.0: header_exchange succesful
[774424.014109] aiy-vision spi0.0: Scheduling poll
[774424.014126] aiy-vision spi0.0: Pending tid(s) =
[774424.014134] aiy-vision spi0.0: 1
[774424.014140] aiy-vision spi0.0:
[774424.037412] aiy-vision spi0.0: Polling for completed transaction
[774424.037433] aiy-vision spi0.0: Sending initial header
[774424.037442] aiy-vision spi0.0: Waiting before write.
[774424.037453] aiy-vision spi0.0: Done waiting, writing.
[774424.037522] aiy-vision spi0.0: Spi write complete.
[774424.037532] aiy-vision spi0.0: Waiting before read.
[774424.037600] aiy-vision spi0.0: Done waiting, reading.
[774424.037638] aiy-vision spi0.0: Recieved header: 07 size 0 crc c651 tid 0
[774424.037648] aiy-vision spi0.0: header_exchange succesful
[774424.037657] aiy-vision spi0.0: Scheduling poll
[774424.037671] aiy-vision spi0.0: Pending tid(s) =
[774424.037678] aiy-vision spi0.0: 1
[774424.037684] aiy-vision spi0.0:
[774424.067428] aiy-vision spi0.0: Polling for completed transaction
[774424.067447] aiy-vision spi0.0: Sending initial header
[774424.067458] aiy-vision spi0.0: Waiting before write.
[774424.067469] aiy-vision spi0.0: Done waiting, writing.
[774424.067541] aiy-vision spi0.0: Spi write complete.
[774424.067550] aiy-vision spi0.0: Waiting before read.
[774424.067577] aiy-vision spi0.0: Done waiting, reading.
[774424.067619] aiy-vision spi0.0: Recieved header: 2f size 130 crc 641a tid 1
[774424.067631] aiy-vision spi0.0: header_exchange succesful
[774424.067639] aiy-vision spi0.0: tid 1 complete
[774424.067649] aiy-vision spi0.0: Slave has a response for tid 1, reading.
[774424.067659] aiy-vision spi0.0: receive_data_buffer of size 130, buffer_len 12582928
[774424.067666] aiy-vision spi0.0: Waiting before read.
[774424.067686] aiy-vision spi0.0: Done waiting, reading.
[774424.067840] aiy-vision spi0.0: Waiting before read.
[774424.068817] aiy-vision spi0.0: Done waiting, reading.
[774424.068943] aiy-vision spi0.0: receive_data_buffer succesful
[774424.069073] aiy-vision spi0.0: Freeing tid 1
[774424.071869] Unable to handle kernel NULL pointer dereference at virtual address 00000070
[774424.071908] pgd = c58cc000
[774424.071919] [00000070] *pgd=119ed831, *pte=00000000, *ppte=00000000
[774424.071952] Internal error: Oops: 17 [#1] ARM
[774424.071964] Modules linked in: aiy_vision(O) fuse joydev evdev aiy_adc(O) gpio_aiy_io(O) industrialio pwm_aiy_io(O) uinput leds_ktd202x(O) aiy_io_i2c(O) spidev cmac rfcomm bnep hci_uart btbcm serdev bluetooth ecdh_generic usb_f_rndis u_ether usb_f_acm u_serial brcmfmac brcmutil cfg80211 snd_soc_bcm2835_i2s regmap_mmio rfkill snd_soc_core i2c_bcm2835 snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd spi_bcm2835 uio_pdrv_genirq uio fixed pwm_soft(O) i2c_dev libcomposite dwc2 udc_core ip_tables x_tables ipv6 [last unloaded: aiy_vision]
[774424.072156] CPU: 0 PID: 21259 Comm: python3 Tainted: G           O    4.14.34+ #1110
[774424.072173] Hardware name: BCM2835
[774424.072184] task: d4918de0 task.stack: c0c0a000
[774424.072225] PC is at free_pcppages_bulk+0x274/0x498
[774424.072239] LR is at 0xd7f031dc
[774424.072248] pc : [<c00fb068>]    lr : [<d7f031dc>]    psr: 60000093
[774424.072257] sp : c0c0bdb8  ip : d7f031c8  fp : c0c0be04
[774424.072265] r10: 00000000  r9 : d7f031c8  r8 : 00000001
[774424.072274] r7 : ffffffff  r6 : 00011962  r5 : c09be368  r4 : c09be368
[774424.072283] r3 : 00000034  r2 : c09be368  r1 : 00000000  r0 : d7f08ed0
[774424.072295] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[774424.072306] Control: 00c5387d  Table: 058cc008  DAC: 00000055
[774424.072317] Process python3 (pid: 21259, stack limit = 0xc0c0a188)
[774424.072328] Stack: (0xc0c0bdb8 to 0xc0c0c000)
[774424.072340] bda0:                                                       00000000 00000001
[774424.072356] bdc0: d7f031c8 c09be368 d7c72af8 d7c72aec 00000012 00000012 d8830000 60000013
[774424.072371] bde0: d7c72aec 0000001f c09be368 00000000 00000000 d7f10818 c0c0be44 c0c0be08
[774424.072385] be00: c00fd0ac c00fae00 c0c0be2c c0c0be18 c0136388 c09be368 00000002 00000000
[774424.072400] be20: d10c8fc0 db501000 00000001 00000018 d48a9780 d48a979c c0c0be5c c0c0be48
[774424.072415] be40: c00fd13c c00fce94 d8830000 00000a40 c0c0be7c c0c0be60 c0137bdc c00fd0f8
[774424.072430] be60: d48a9780 00000000 000001ef ae214018 c0c0be8c c0c0be80 c0137cfc c0137b84
[774424.072445] be80: c0c0bea4 c0c0be90 bf06cba8 c0137cb4 d48e1400 00000000 c0c0bf0c c0c0bea8
[774424.072460] bea0: bf06e4cc bf06cb38 00000001 d48e1470 d48e1480 ae214008 c00816fc 00000018
[774424.072475] bec0: 00001388 00c00010 00000082 00000000 d4918de0 c00531d4 c0c0bed8 c0c0bed8
[774424.072490] bee0: c0046aec ae214008 d6af2d60 d6b0fa00 00000005 00000005 c0c0a000 00000000
[774424.072505] bf00: c0c0bf7c c0c0bf10 c016c858 bf06dfdc c006fc2c c04ec93c 8cfba25d 00035388
[774424.072520] bf20: c0c0bf44 c0c0bf88 bef6105c 00000009 00000189 c01774c4 d1b6b300 00c00010
[774424.072535] bf40: ae214008 bef60bd4 c0108903 00000005 c0c0bf6c d6b0fa01 ae214008 d6b0fa00
[774424.072551] bf60: c0108903 00000005 c0c0a000 00000000 c0c0bfa4 c0c0bf80 c016cf54 c016c7c8
[774424.072566] bf80: bef60bd4 00c00010 00378000 bef60bd4 00000036 c000ff64 00000000 c0c0bfa8
[774424.072582] bfa0: c000fdc0 c016cf1c 00c00010 00378000 00000005 c0108903 ae214008 bef60bd4
[774424.072597] bfc0: 00c00010 00378000 bef60bd4 00000036 00000005 c0108903 ae214008 bef60bd8
[774424.072613] bfe0: 003782fc bef60bc4 000362d4 b6da980c 80000010 00000005 2d4badad 788a2a3e
[774424.072654] [<c00fb068>] (free_pcppages_bulk) from [<c00fd0ac>] (free_hot_cold_page+0x224/0x264)
[774424.072674] [<c00fd0ac>] (free_hot_cold_page) from [<c00fd13c>] (__free_pages+0x50/0x54)
[774424.072701] [<c00fd13c>] (__free_pages) from [<c0137bdc>] (__vunmap+0x64/0xe0)
[774424.072720] [<c0137bdc>] (__vunmap) from [<c0137cfc>] (vfree+0x54/0x94)
[774424.072768] [<c0137cfc>] (vfree) from [<bf06cba8>] (transaction_unref+0x7c/0xa4 [aiy_vision])
[774424.072833] [<bf06cba8>] (transaction_unref [aiy_vision]) from [<bf06e4cc>] (visionbonnet_ioctl+0x4fc/0x6a8 [aiy_vision])
[774424.072864] [<bf06e4cc>] (visionbonnet_ioctl [aiy_vision]) from [<c016c858>] (do_vfs_ioctl+0x9c/0x754)
[774424.072883] [<c016c858>] (do_vfs_ioctl) from [<c016cf54>] (SyS_ioctl+0x44/0x6c)
[774424.072913] [<c016cf54>] (SyS_ioctl) from [<c000fdc0>] (ret_fast_syscall+0x0/0x28)
[774424.072933] Code: e58c0014 e7842003 e3a03034 e51b2040 (e0212193)[774424.072952] ---[ end trace ee6907230b405e54 ]---
[774427.075364] systemd[1]: Started Journal Service.
HeatfanJohn commented 5 years ago

Hi @dmitriykovalev, does debug information posted on 6/21 help to identify the root cause of the timeout?

This issue still happens about once every couple of days on my kit. Could this be just an issue with my kit?

Is there a utility I could run that stresses whatever is timing out? In other words a test that just continuously performs the operation that is timing out.

Does aiy-vision spi0.0: receive_data_buffer of size 134, buffer_len 12582928 mean that the kernel driver has received some indication that the Myriad chip wants to sent the driver some data?

And then what's the purpose of Waiting before read.?

The driver then never says Done waiting, reading - the next entry is Slave not ready after 1000ms.

What could cause that to happen?

Thanks again, John

shadetree01010100 commented 5 years ago

It's not just your kit @HeatfanJohn . I've been consistently having the same issue running the joy detector model. dmesg output:

...
[   31.989712] aiy-vision spi0.0: Myriad booting
[   32.170930] aiy-vision spi0.0: Myriad ready
[   54.982455] fuse init (API version 7.26)
[ 1051.414166] aiy-vision spi0.0: Incoming crc mismatch: slave e2236599 vs computed 7b2a3423
[ 1672.773033] aiy-vision spi0.0: Incoming crc mismatch: slave 655537b9 vs computed 153fc336
[ 1672.773989] aiy-vision spi0.0: Incoming crc mismatch: slave 655537b9 vs computed 153fc336
[ 1976.250550] aiy-vision spi0.0: Incoming crc mismatch: slave 25059164 vs computed a80d9c91
[ 2512.102541] aiy-vision spi0.0: Incoming crc mismatch: slave 03d2b2e3 vs computed 0d093ad1
[ 3104.280435] aiy-vision spi0.0: Incoming crc mismatch: slave c6d68090 vs computed b1d1b006
[ 3231.062263] aiy-vision spi0.0: Incoming crc mismatch: slave 859c96d8 vs computed f29ba64e
[ 3616.357552] aiy-vision spi0.0: Incoming crc mismatch: slave 031846e4 vs computed 9d7cd347
[ 3795.223199] aiy-vision spi0.0: Incoming crc mismatch: slave d6e0fed7 vs computed a851826a
[ 4156.109609] aiy-vision spi0.0: Incoming crc mismatch: slave 8df46b3e vs computed 6a4c4639
[ 4815.301357] aiy-vision spi0.0: Incoming crc mismatch: slave ee9f78c1 vs computed 842f5833
[ 4940.422108] aiy-vision spi0.0: Incoming crc mismatch: slave 2c2d0453 vs computed 469d24a1
[ 5870.617523] aiy-vision spi0.0: Incoming crc mismatch: slave d1bbcdef vs computed bc662904
[ 6411.796483] aiy-vision spi0.0: Incoming crc mismatch: slave 0dc7683e vs computed 8e14edf9
[ 6831.938741] aiy-vision spi0.0: Incoming crc mismatch: slave 53ef8764 vs computed 40831f32
[ 7330.721832] aiy-vision spi0.0: Incoming crc mismatch: slave 246844a6 vs computed d90abddc
[ 7649.464274] aiy-vision spi0.0: Incoming crc mismatch: slave b329af4f vs computed ef9ac54b
[ 7669.323520] aiy-vision spi0.0: Slave not ready after 1000 ms
[ 7669.323546] aiy-vision spi0.0: Failed to write header: -85
[ 7669.323558] aiy-vision spi0.0: Fatal error, resetting
[ 7669.323588] aiy-vision spi0.0: Resetting myriad
[ 7671.821515] aiy-vision spi0.0: Writing myriad firmware
[ 7676.756919] aiy-vision spi0.0: Myriad booting
[ 7676.938108] aiy-vision spi0.0: Myriad ready

The same OSError: [Errno 14] Bad address is raised in my application. Based on some extra logging I've been doing, it SEEMS like this starts to happen once the Pi's CPU core temp is above 65­C. I realize of course that that is a different chip from the Myriad, BUT it's in the same cardboard box as the Myriad, and it's amazingly hot inside there so I'm wondering if that's an indicator of a thermal issue in the kit.

Is it possible that the Vision Bonnet is overheating and resetting?

HeatfanJohn commented 5 years ago

@tyoungNIO Please don't laugh at my Elephant fan, but using this fan lowered the temperature of my AIYVision from ~60 degrees C to 39 to 40 degrees C and the problem still occurred.

20190121_145032-resize

shadetree01010100 commented 5 years ago

That's what I think I found, as well, it's been a while since I looked at it. I added some vent holes but never got around to a fan. I commend the resourcefulness!

HeatfanJohn commented 5 years ago

@dmitriykovalev Is there any updated software/firmware that I can try to see if that resolves this issue which still persists.

dmitriykovalev commented 5 years ago

@HeatfanJohn Have you tried to run echo "over_voltage=4" | sudo tee -a /boot/config.txt according to https://aiyprojects.withgoogle.com/help/#using-the-vision-kit ?

rnehrboss commented 5 years ago

We have the same CRC error once a day or so on some specific devices. Seems like some devices don't have the issue, although not enough time has passed to say that with any confidence.

Has anyone figured out a graceful recovery for this... or the cause?

Thanks

shadetree01010100 commented 5 years ago

@rnehrboss You can try catching the OSError and restarting your loop when raised. In my experience the hardware is usable again after it restarts itself, so you can re-initialize CameraInference and begin looping over run() again.

How you go about catching that exception is going to depend on your application and how you're iterating over run(), for example in another thread.

HeatfanJohn commented 4 years ago

I don't know if this is too early to declare victory on this problem, but I recently ran apt-get update/upgrade on my AIYvision and that upgraded the Linux kernel to 4.19.57+

That caused me to have problem #627 which is corrected by performing the steps in this comment.

Since applying the #627 fix my vision and joy_detection_demo has been running flawlessly for over 47 hours. I don't recall ever running that long without an error and with this demo continuing to work.

I'm keeping my fingers crossed. Perhaps an update to the driver or Myriad bonnet firmware corrected the problem.

Here's what I'm running now:

pi@aiyvision:~ $ dpkg -l | grep aiy
ii  aiy-bt-prov-server                    1.0-1        all          AIY Bluetooth provisioning server
ii  aiy-core                              0.1          all          Core packages for AIY Projects Kits
ii  aiy-dkms                              1.1-2        all          Hardware device drivers for AIY Projects Kits.
ii  aiy-full                              1.0-0        all          Full package set for AIY Projects Kits
ii  aiy-gadget                            1.0-0        all          AIY USB Gadget
ii  aiy-io-mcu-firmware                   1.0-1        all          AIY MCU Firmware
ii  aiy-misc                              1.0-0        all          Miscellaneous packages for AIY Projects Kits
ii  aiy-models                            1.1-0        all          Binary model data for AIY devices
ii  aiy-projects-python                   1.1-0        all          AIY Python API
ii  aiy-python-wheels                     1.1-1        all          Prebuilt Python wheels for AIY
ii  aiy-vision-dkms                       1.1-1        all          AIY Vision DKMS driver.
ii  aiy-vision-firmware                   1.2-0        all          Binary firmware for AIY vision bonnet
ii  aiy-voice-services                    1.1-1        all          AIY Voice Services
ii  aiy-voicebonnet-routes                1.0-0        all          Routing for AIY voicebonnet
ii  aiy-voicebonnet-soundcard-dkms        1.1          armhf        aiy-voicebonnet-soundcard driver in DKMS format.

pi@aiyvision:~ $ uname -a
Linux aiyvision 4.19.57+ #1244 Thu Jul 4 18:42:50 BST 2019 armv6l GNU/Linux
pi@aiyvision:~ $
HeatfanJohn commented 4 years ago

Nope ... problem still exists ... joy_demo ran for over 48 hours however ...

Aug  2 09:29:35 aiyvision kernel: [320308.839713] aiy-vision spi0.0: Slave not ready after 1000 ms
Aug  2 09:29:35 aiyvision kernel: [320308.839751] aiy-vision spi0.0: Fatal error, resetting
Aug  2 09:29:36 aiyvision python3[296]: INFO:aiy.vision.inference:Stop camera inference.
Aug  2 09:29:36 aiyvision kernel: [320309.285271] aiy-vision spi0.0: Resetting myriad
Aug  2 09:29:38 aiyvision kernel: [320311.399671] aiy-vision spi0.0: Writing myriad firmware
Aug  2 09:29:43 aiyvision kernel: [320316.711733] aiy-vision spi0.0: Myriad booting
Aug  2 09:29:43 aiyvision kernel: [320316.893989] aiy-vision spi0.0: Myriad ready
Aug  2 09:29:43 aiyvision python3[296]: INFO:aiy.vision.inference:Unload model "FaceDetection".
Aug  2 09:29:43 aiyvision kernel: [320317.003211] aiy-vision spi0.0: Device released: 0x86a5bb6c
Aug  2 09:29:44 aiyvision python3[296]: ERROR:__main__:Exception while running joy demo.
Aug  2 09:29:44 aiyvision python3[296]: Traceback (most recent call last):
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 345, in __exit__
Aug  2 09:29:44 aiyvision python3[296]:     if cb(*exc_details):
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 291, in _exit_wrapper
Aug  2 09:29:44 aiyvision python3[296]:     callback(*args, **kwds)
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 109, in <lambda>
Aug  2 09:29:44 aiyvision python3[296]:     self._stack.callback(lambda: self._engine.stop_camera_inference())
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 363, in stop_camera_inference
Aug  2 09:29:44 aiyvision python3[296]:     self._communicate_bytes(_REQ_STOP_CAMERA_INFERENCE)
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 293, in _communicate_bytes
Aug  2 09:29:44 aiyvision python3[296]:     raise InferenceException(response.status.message)
Aug  2 09:29:44 aiyvision python3[296]: aiy.vision.inference.InferenceException: StopCameraInference: Camera inference is not running.
Aug  2 09:29:44 aiyvision python3[296]: During handling of the above exception, another exception occurred:
Aug  2 09:29:44 aiyvision python3[296]: Traceback (most recent call last):
Aug  2 09:29:44 aiyvision python3[296]:   File "/home/pi/AIY-projects-python/src/examples/vision/joy/joy_detection_demo.py", line 317, in joy_detector
Aug  2 09:29:44 aiyvision python3[296]:     for faces, frame_size in run_inference(num_frames, model_loaded):
Aug  2 09:29:44 aiyvision python3[296]:   File "/home/pi/AIY-projects-python/src/examples/vision/joy/joy_detection_demo.py", line 73, in run_inference
Aug  2 09:29:44 aiyvision python3[296]:     yield face_detection.get_faces(result), (result.width, result.height)
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 143, in __exit__
Aug  2 09:29:44 aiyvision python3[296]:     self.close()
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 137, in close
Aug  2 09:29:44 aiyvision python3[296]:     self._stack.close()
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 313, in close
Aug  2 09:29:44 aiyvision python3[296]:     self.__exit__(None, None, None)
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 360, in __exit__
Aug  2 09:29:44 aiyvision python3[296]:     raise exc_details[1]
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 345, in __exit__
Aug  2 09:29:44 aiyvision python3[296]:     if cb(*exc_details):
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 291, in _exit_wrapper
Aug  2 09:29:44 aiyvision python3[296]:     callback(*args, **kwds)
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 106, in <lambda>
Aug  2 09:29:44 aiyvision python3[296]:     self._stack.callback(lambda: self._engine.unload_model(model_name))
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 343, in unload_model
Aug  2 09:29:44 aiyvision python3[296]:     unload_model=pb2.Request.UnloadModel(model_name=model_name)))
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 287, in _communicate
Aug  2 09:29:44 aiyvision python3[296]:     return self._communicate_bytes(request.SerializeToString(), timeout=timeout)
Aug  2 09:29:44 aiyvision python3[296]:   File "/opt/aiy/projects-python/src/aiy/vision/inference.py", line 293, in _communicate_bytes
Aug  2 09:29:44 aiyvision python3[296]:     raise InferenceException(response.status.message)
Aug  2 09:29:44 aiyvision python3[296]: aiy.vision.inference.InferenceException: UnloadModel: Model 'FaceDetection' is not loaded.
Aug  2 09:29:44 aiyvision python3[296]: During handling of the above exception, another exception occurred:
Aug  2 09:29:44 aiyvision python3[296]: Traceback (most recent call last):
Aug  2 09:29:44 aiyvision python3[296]:   File "/home/pi/AIY-projects-python/src/examples/vision/joy/joy_detection_demo.py", line 367, in main
Aug  2 09:29:44 aiyvision python3[296]:     args.enable_streaming, args.streaming_bitrate, args.mdns_name)
Aug  2 09:29:44 aiyvision python3[296]:   File "/home/pi/AIY-projects-python/src/examples/vision/joy/joy_detection_demo.py", line 333, in joy_detector
Aug  2 09:29:44 aiyvision python3[296]:     break
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 360, in __exit__
Aug  2 09:29:44 aiyvision python3[296]:     raise exc_details[1]
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 345, in __exit__
Aug  2 09:29:44 aiyvision python3[296]:     if cb(*exc_details):
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3.5/contextlib.py", line 261, in _exit_wrapper
Aug  2 09:29:44 aiyvision python3[296]:     return cm_exit(cm, *exc_details)
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3/dist-packages/picamera/camera.py", line 758, in __exit__
Aug  2 09:29:44 aiyvision python3[296]:     self.close()
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3/dist-packages/picamera/camera.py", line 752, in close
Aug  2 09:29:44 aiyvision python3[296]:     raise exc
Aug  2 09:29:44 aiyvision python3[296]:   File "/usr/lib/python3/dist-packages/picamera/camera.py", line 1995, in _control_callback
Aug  2 09:29:44 aiyvision python3[296]:     "No data recevied from sensor. Check all connections, "
Aug  2 09:29:44 aiyvision python3[296]: picamera.exc.PiCameraRuntimeError: No data recevied from sensor. Check all connections, including the SUNNY chip on the camera board