raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.
5.15k stars 1.68k forks source link

Firmware transaction timeout - raspistill hang #1552

Closed do-dat-thang closed 3 years ago

do-dat-thang commented 3 years ago

Is this the right place for my bug report? I'm pretty sure.

Describe the bug raspistill hangs on Firmware transaction timeout. After that point Xorg also hangs up. On this latest recreate, raspinfo is also hung when running after the problem to collect data.

To reproduce Run raspistill repeatedly until the error occurs.

Expected behaviour The kernel & firmware should remain stable at all times. Running a camera capture should always create an image and return or gracefully fail leaving the system stable.

Actual behaviour Running raspistill repeatedly will eventually get the firmware transaction timeout.

System

raspinfo is hung after displaying the following:

$ raspinfo

System Information

Raspberry Pi 4 Model B Rev 1.2 PRETTY_NAME="Raspbian GNU/Linux 10 (buster)" NAME="Raspbian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)"

Raspberry Pi reference 2021-01-11 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 21090519d85bdaa1615d5d5057d37b09368ea5d2, stage5

Linux projector1 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l GNU/Linux Revision : c03112 Serial : 10000000e4a3851c Model : Raspberry Pi 4 Model B Rev 1.2

I will continue to try to get the raspinfo or, failing that, will try to get the equivalent parts.

Pi4B, Pi3B+

Raspberry Pi reference 2021-01-11 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 21090519d85bdaa1615d5d5057d37b09368ea5d2, stage5

Linux projector1 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l GNU/Linux

and all subsequent levels.

Logs

dmesg.txt

Additional context This recreate is the first time raspinfo hung. I displaying to 2 HDMI monitors. There are 2 USB to serial adapter cables connected but not in use during recreate. This problem occurs in all levels following and including the Dec 14, 2020 commit. I am running a C++ application which snaps 2 images every few seconds using these 2 commands for the 2 invocations (via system()) of raspistill:

raspistill -e jpg -w 2592 -h 1944 -t 1 -n -rot 180 -st -o /home/pi/tgt.jpg 2>&1 raspistill -e jpg -w 2592 -h 1944 -t 1 -n -rot 180 -ag 8.0 -dg 0.0 -st -o /home/pi/img.jpg 2>&1

When this problem occurs I have to either power cycle or run the following to reboot (sudo reboot seems to start rebooting, but never actually reboots):

echo s | sudo tee /proc/sysrq-trigger echo u | sudo tee /proc/sysrq-trigger echo b | sudo tee /proc/sysrq-trigger

No success with raspinfo - I'll recreate & see if I can collect & post the additional info.

do-dat-thang commented 3 years ago

IDK why raspinfo hangs now (perhaps I never tried raspinfo after hitting this on 5.4.83 ??). Attaching raspinfo from 5.1.17 from previous recreation at that level.

raspinfo.txt

do-dat-thang commented 3 years ago

On 5.4.83

vcdbg log msg

vcdbg_log_msg.txt

feacluster commented 3 years ago

Confirming I have the same issue using 5.4.83-v8+ and 5.10.17-v7l+ but no issue with 4.19.97-v7l+ .

6by9 commented 3 years ago

Please try upgrading just the firmware and not the kernel. sudo SKIP_KERNEL=1 rpi-update

Almost all of the camera stack is on the firmware side, so I'm expecting the issue to be there rather than in the kernel.

If you have a reliable way of reproducing this, then you can use sudo SKIP_KERNEL=1 rpi-update <hash> with commit hashes https://github.com/Hexxeh/rpi-firmware/commits/master to try and isolate when the issue first appeared.

do-dat-thang commented 3 years ago

Thanks, @6x9, for the tip. I will put both kernel and firmware from 5.4.81 (problem not seen) on my rpi4 & then will do the above to update just firmware to 5.4.83 stepwise using your link, above. Is there any helpful debug information I can collect?

do-dat-thang commented 3 years ago

Ok, did rpi-update c91312ae1df5011bd77826e49d2b914b6d0e9b08 to get back to 5.4.81. Then sudo SKIP_KERNEL=1 rpi-update fa9d0f5299d132240c94fe6e7065b9c53897d725 to get the 1st firmare update.

It appears to be running ok (no FW hang yet), but I ran raspinfo just now and was surprised to see some of info in the vcdbg log messages. I suspect the EDID error is because I didn't have my displays turned on when I started but some the stuff at the top I wouldn't expect(?). Attaching:

raspinfo.nohang-5_4_81-fa9d0f5299d132240c94fe6e7065b9c53897d725.txt

do-dat-thang commented 3 years ago

Ran through these in order, letting them run for 1/2-1 hr:

sudo SKIP_KERNEL=1 rpi-update fa9d0f5299d132240c94fe6e7065b9c53897d725 sudo SKIP_KERNEL=1 rpi-update 942756073c6970afc17afb6a34ff217d9831e09f sudo SKIP_KERNEL=1 rpi-update dea7234943c604462e476a8afc13c587418e8709 sudo SKIP_KERNEL=1 rpi-update 453e49bdd87325369b462b40e809d5f3187df21d

The last one (453e49bdd87325369b462b40e809d5f3187df21d) failed in 11 minutes.

Tried raspinfo but hangs after

$ raspinfo
System Information
------------------

Raspberry Pi 4 Model B Rev 1.2
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"

Raspberry Pi reference 2021-01-11
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 21090519d85bdaa1615d5d5057d37b09368ea5d2, stage5

Linux projector1 5.4.81-v7l+ #1378 SMP Mon Dec 7 18:43:09 GMT 2020 armv7l GNU/Linux
Revision        : c03112
Serial          : 10000000e4a3851c
Model           : Raspberry Pi 4 Model B Rev 1.2

dmesg_453e49bdd87325369b462b40e809d5f3187df21d.txt

6by9 commented 3 years ago

Linking to the forum thread https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=304989

Thank you for testing. I'm reading from that that you're using the HQ camera, even though your raspistill line is for a 5MPix capture.

453e49bdd87325369b462b40e809d5f3187df21d does have an update to the IMX477 HQ camera driver, but that should only be for the high framerate mode to avoid image quality issues. It shouldn't affect stills captures at all, but at least it's something to investigate.

do-dat-thang commented 3 years ago

Interesting - I thought I have the V2 camera. Isn't that the 5MPix device? The PCB has Raspberry Pi Camera (B) Rev 2.0 silk-screened on it. Could the whole problem we're seeing be an issue with HW detection/recognition incorrectly identifying it?

RpiCam

6by9 commented 3 years ago

That is not a camera module made by Raspberry Pi. It is a third party clone of the v1 module (Omnivision OV5647 5MPix).

The assert logs that had been provided on the forum thread imply that the camera module isn't shutting down cleanly, and is locking up the camera subsystem (and possibly other parts) on the GPU. Kernel logs are largely useless on this issue. sudo vcdbg log msg, and preferably to be running with start_debug=1```` in config.txt and thereforesudo vcdbg log assert``` will tell us where the firmware has hit issues.

do-dat-thang commented 3 years ago

so, put start_debug=1 into config.txt (assume at end of file in [all] section) and then reboot & run the vcdbg commands after recreate?

I will try that, but I believe the vcdbg commands hang...

do-dat-thang commented 3 years ago

Nice! This time the problem hit really fast. And the vcdbg seems to have worked:

vcdbg_log_msg.txt vcdbg_log_assert.txt

do-dat-thang commented 3 years ago

I don't see any of the ../../../../../middleware/ISP/tuner/*.c files referenced in the vcdbg log assert in the Raspberry Pi repo. Are these available elsewhere or are they not open?

Are the "dump_stack failed" expected? Or are these also caused by this problem?

Anything else I can capture from the failure that would help debug?

JamesH65 commented 3 years ago

vcdbg passes back debug information from the videocore firmware, so all the debug printed is from that. And its all closed source.

feacluster commented 3 years ago

I too have the same issue and am using the V2 camera from Amazon.

image

do-dat-thang commented 3 years ago

OK. Please let me know if there is anything else I can provide for debug or if you'd like me to try a potential fix/debug code.

elzorrofxt commented 3 years ago

I'm the original poster on the

Linking to the forum thread https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=304989

Thank you for testing. I'm reading from that that you're using the HQ camera, even though your raspistill line is for a 5MPix capture.

453e49bdd87325369b462b40e809d5f3187df21d does have an update to the IMX477 HQ camera driver, but that should only be for the high framerate mode to avoid image quality issues. It shouldn't affect stills captures at all, but at least it's something to investigate.

I'm the original poster (PiPiParty) of that thread, lemme know if there is any testing I can do, I've got the HQ camera.

do-dat-thang commented 3 years ago

So... I'm wondering: I see multiple forum discussions and git issues on very similar problems to this and all those just stop with no resolution. With the firmware being a closed component, there is no way to assist in debug and fix. Is there any way to know if there is anyone is even working on this and/or any progress? I believe this is the right place to ask this - if not, can anyone here point me in the right direction?

I'm asking because we have many rpi's running our application which is dependent on networking as well as the camera. The wireless networking appears to be much more stable on the latest releases so we want to be on the latest. We however don't feel comfortable with a mismatch of kernel/firmware/everything else.

naushir commented 3 years ago

I've been trying unsuccessfully to reproduce this for most of today. With a freshly run rpi-update and on both imx477 and imx219, I cannot get it to lockup if I run the following loop:

while true; do raspistill -t 1000 -o test.jpg; sleep 1; done

Edit: Perhaps I ought to remove the sleep from the loop. And without the sleep, it has failed!

naushir commented 3 years ago

Last line of vcdbg log msg:

4255553.603: mmal: fail_destroy: Timed out waiting to destroy ril.camera

I see no asserts or exceptions on the VC side.

dmesg error stack:


[ 4241.692259] ------------[ cut here ]------------
[ 4241.692297] WARNING: CPU: 0 PID: 12711 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
[ 4241.692309] Firmware transaction timeout
[ 4241.692319] Modules linked in: rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse 8021q garp stp llc rpivid_mem brcmfmac brcmutil sha256_generic vc4 v3d gpu_sched cfg80211 cec rfkill drm_kms_helper raspberrypi_hwmon bcm2835_codec(C) drm v4l2_mem2mem bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common drm_panel_orientation_quirks videodev snd_bcm2835(C) mc vc_sm_cma(C) snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq nvmem_rmem uio i2c_dev ip_tables x_tables ipv6
[ 4241.692957] CPU: 0 PID: 12711 Comm: kworker/0:0 Tainted: G         C        5.10.25-v7l+ #1408
[ 4241.692965] Hardware name: BCM2711
[ 4241.692981] Workqueue: events dbs_work_handler
[ 4241.692994] Backtrace: 
[ 4241.693019] [<c0b72ce8>] (dump_backtrace) from [<c0b7307c>] (show_stack+0x20/0x24)
[ 4241.693032]  r7:ffffffff r6:00000000 r5:60000013 r4:c12e6a3c
[ 4241.693047] [<c0b7305c>] (show_stack) from [<c0b7740c>] (dump_stack+0xcc/0xf8)
[ 4241.693063] [<c0b77340>] (dump_stack) from [<c0220bac>] (__warn+0xfc/0x114)
[ 4241.693075]  r10:dec05008 r9:00000009 r8:c09acd34 r7:00000040 r6:00000009 r5:c09acd34
[ 4241.693083]  r4:c0e9f12c r3:c1205094
[ 4241.693096] [<c0220ab0>] (__warn) from [<c0b73820>] (warn_slowpath_fmt+0xa4/0xd8)
[ 4241.693106]  r7:00000040 r6:c0e9f12c r5:c1205048 r4:c0e9f14c
[ 4241.693120] [<c0b73780>] (warn_slowpath_fmt) from [<c09acd34>] (rpi_firmware_transaction+0xec/0x128)
[ 4241.693131]  r9:c1a85340 r8:00000018 r7:00000000 r6:ffffff92 r5:c1a85340 r4:c1205048
[ 4241.693145] [<c09acc48>] (rpi_firmware_transaction) from [<c09ace2c>] (rpi_firmware_property_list+0xbc/0x170)
[ 4241.693155]  r7:c1205048 r6:dec05000 r5:00001000 r4:dec05024
[ 4241.693168] [<c09acd70>] (rpi_firmware_property_list) from [<c09acf50>] (rpi_firmware_property+0x70/0x118)
[ 4241.693179]  r10:c466934c r9:00030002 r8:00000018 r7:c1a85340 r6:c4f97d48 r5:0000000c
[ 4241.693186]  r4:c4669340
[ 4241.693204] [<c09acee0>] (rpi_firmware_property) from [<c07e6b78>] (raspberrypi_clock_property+0x54/0x7c)
[ 4241.693215]  r10:00000000 r9:00000000 r8:c22cfc40 r7:00000000 r6:2faf0800 r5:c4f97d70
[ 4241.693222]  r4:c1205048 r3:0000000c
[ 4241.693237] [<c07e6b24>] (raspberrypi_clock_property) from [<c07e6bec>] (raspberrypi_fw_get_rate+0x4c/0x70)
[ 4241.693245]  r5:00000000 r4:c1205048
[ 4241.693258] [<c07e6ba0>] (raspberrypi_fw_get_rate) from [<c07dd4c0>] (clk_recalc+0x44/0x84)
[ 4241.693266]  r4:c20ba000
[ 4241.693277] [<c07dd47c>] (clk_recalc) from [<c07dd7b0>] (__clk_recalc_rates+0x30/0x94)
[ 4241.693285]  r5:00000000 r4:c20ba000
[ 4241.693296] [<c07dd780>] (__clk_recalc_rates) from [<c07dd86c>] (clk_core_get_rate_recalc+0x58/0x5c)
[ 4241.693306]  r7:00000000 r6:c1205048 r5:23c34600 r4:c20ba000
[ 4241.693317] [<c07dd814>] (clk_core_get_rate_recalc) from [<c07dd898>] (clk_get_rate+0x28/0x38)
[ 4241.693325]  r5:23c34600 r4:c22cfc40
[ 4241.693341] [<c07dd870>] (clk_get_rate) from [<c0978188>] (dev_pm_opp_set_rate+0x70/0x588)
[ 4241.693349]  r5:23c34600 r4:c19a5400
[ 4241.693363] [<c0978118>] (dev_pm_opp_set_rate) from [<c0983078>] (set_target+0x38/0x3c)
[ 4241.693374]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:c1406e74 r5:c1205048
[ 4241.693381]  r4:c19a5600
[ 4241.693394] [<c0983040>] (set_target) from [<c097dd7c>] (__cpufreq_driver_target+0x1bc/0x540)
[ 4241.693406] [<c097dbc0>] (__cpufreq_driver_target) from [<c098110c>] (od_dbs_update+0xc4/0x170)
[ 4241.693417]  r10:00000000 r9:c338ba00 r8:c401bb40 r7:c338b200 r6:c338ba00 r5:c338b200
[ 4241.693424]  r4:c19a5600
[ 4241.693438] [<c0981048>] (od_dbs_update) from [<c0982370>] (dbs_work_handler+0x3c/0x64)
[ 4241.693448]  r9:00000000 r8:c131d3dc r7:c19a5600 r6:c338b204 r5:00000000 r4:c338b238
[ 4241.693463] [<c0982334>] (dbs_work_handler) from [<c023dc18>] (process_one_work+0x250/0x5a0)
[ 4241.693473]  r9:00000000 r8:00000000 r7:eff15a00 r6:eff12680 r5:c4649680 r4:c338b238
[ 4241.693486] [<c023d9c8>] (process_one_work) from [<c023dfc8>] (worker_thread+0x60/0x5c4)
[ 4241.693496]  r10:eff12680 r9:c1203d00 r8:eff12698 r7:00000008 r6:eff12680 r5:c4649694
[ 4241.693504]  r4:c4649680
[ 4241.693517] [<c023df68>] (worker_thread) from [<c0245a50>] (kthread+0x170/0x174)
[ 4241.693528]  r10:c4f1be74 r9:c4649680 r8:c023df68 r7:c4f96000 r6:00000000 r5:c4669900
[ 4241.693535]  r4:c4641e40
[ 4241.693547] [<c02458e0>] (kthread) from [<c02000ec>] (ret_from_fork+0x14/0x28)
[ 4241.693555] Exception stack(0xc4f97fb0 to 0xc4f97ff8)
[ 4241.693565] 7fa0:                                     00000000 00000000 00000000 00000000
[ 4241.693575] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 4241.693585] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 4241.693595]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c02458e0
[ 4241.693602]  r4:c4669900
[ 4241.693612] ---[ end trace 57eb65a4a490a110 ]---
``
pelwell commented 3 years ago

What state is the VPU in at that point? Do vcgencmds work? What do "sudo vcdbg malloc" and "sudo vcdbg reloc" report?

naushir commented 3 years ago

The VPU was in a locked state, but curiously only after I managed to do a vcdbg log msg.

6by9 commented 3 years ago

I've now reproduced too on a CM4 with 477. vcdbg and vcgencmd are working for me. Heaps look normal, although we still have some remnants of camplus/cdi still allocated. Naush hasn't got an XJTAG with him. I have, but no debugger set up due to changing laptop.

Something in camplus_close is likely taking longer for some odd reason, and we're hitting the 2s component destroy timeout.

pelwell commented 3 years ago

"log msg". "malloc" and "reloc" don't rely on cooperation from the VPU.

naushir commented 3 years ago

"log msg". "malloc" and "reloc" don't rely on cooperation from the VPU.

My whole system locked up at that point. Had to power cycle to get back.

6by9 commented 3 years ago

Reproduced with debugger. Looks like we have a potential deadlock with restoring the I2C pinmuxing on close in power_pad_control.

pelwell commented 3 years ago

Nice work - surely we are but moments away from a fix...

6by9 commented 3 years ago

Hmm, don't hold your breath there. Internal issue raised. (vc4#46)

6by9 commented 3 years ago

Patch merged to the internal repo - it should be in the next rpi-update release.

popcornmix commented 3 years ago

Fix is in latest rpi-update firmware.

do-dat-thang commented 3 years ago

Way awesome guys! Assuming I can just rpi-update to grab it, I'll do that later today & exercise it a bit. Thanks!!

elzorrofxt commented 3 years ago

Great news, thanks all!!

do-dat-thang commented 3 years ago

Last night, I did the rpi-update and it picked up c0cf93a133dab106439c208f03d32155bc19e432 plus 5.10.25-v7l+... still running fine after 12 hours. Before, the failure usually occurred after just a few minutes. This IS great news!

After a bit more run time, do I close this issue or is that done by someone else?

pelwell commented 3 years ago

Close it when you feel ready.

feacluster commented 3 years ago

When attempting to run "sudo rpi-update" on my Pi4+, I get this message:

Partition size 255M may not be sufficient for new Pi4 files This could result in a system that will not boot. 256M FAT partition is recommended. Ensure you have a backup if continuing.

Not sure if I can safely ignore this warning? The machine is located a half hour drive away from me. So I would rather not have to go there if it fails to boot :-)

pelwell commented 3 years ago

The warning is aimed at old Raspbian images with 56MB boot partitions. I'm sure with 255MB you'll be fine, but you can always check the free space with the command df /boot.

elzorrofxt commented 3 years ago

I'm at 36 hours, looking good so far.

do-dat-thang commented 3 years ago

Been running great for 3 days now. Thanks again for the great work! Closing issue.

feacluster commented 3 years ago

I did the firmware update and got 5.10.25-v8+ . However, the camera still failed after one day after reboot. The error I get is:

error: failed to set camera to idle state
Perhaps another program is using camera, otherwise you need to reboot this pi

This might be a different issue from what was discussed here.

The messages I get from " sudo vcdbg log assert" are:


2387546.232: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/                                                                              components/camera.c::load_and_open_cdi line 12008 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2387546.293: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../..                                                                              /../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev edf2e9                                                                              c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447709.600: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::open_camera_peripheral line 645 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447709.675: assert( success == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::cdi_camera_change_state line 1401 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447709.765: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/c                                                                              di/cdi_camera.c::cdi_camera_open line 601 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447710.940: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/                                                                              components/camera.c::load_and_open_cdi line 12008 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2447711.045: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../..                                                                              /../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev edf2e9                                                                              c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507860.963: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::open_camera_peripheral line 645 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507861.074: assert( success == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::cdi_camera_change_state line 1401 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507861.153: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/c                                                                              di/cdi_camera.c::cdi_camera_open line 601 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507862.399: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/                                                                              components/camera.c::load_and_open_cdi line 12008 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
2507862.884: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../..                                                                              /../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev edf2e9                                                                              c
vcdbg_ctx_get_dump_stack: dump_stack failed
6by9 commented 3 years ago
2507860.963: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cd                                                                              i/cdi_camera.c::open_camera_peripheral line 645 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed

is almost certainly that something else is using the CSI2 receiver. It's possible that a previous instance has got "stuck", but that shouldn't be the case with the fix created for this issue. Please raise a new issue with all the standard information.

do-dat-thang commented 3 years ago

@feacluster , If you had already hit this problem before you put the new firmware on, a regular reboot attempt (reboot/shutdown -r) seems to not really reboot the pi. You may need to power cycle or see the reboot sequence in the link @6x9 just posted (#4047). That worked for me. "uname -r" will let you know if the new kernel was loaded - IDK how to tell the currently loaded firmware level.

simonyangme commented 3 years ago

I'm still seeing this issue on a Pi 3B+ with the HQ camera. I'm running kernel 5.10.25-v7+ and used rpi-update with the hash listed upthread. vcgencmd version shows me the following:

Feb 25 2021 12:13:12 
Copyright (c) 2012 Broadcom
version 564e5f9b852b23a330b1764bcf0b2d022a20afd0 (clean) (release) (start_x)

Just wanted to check that this is correct and isn't due to an outdated firmware as the date it shows seems old. Running rpi-update c0cf93a133dab106439c208f03d32155bc19e432 gives me *** Your firmware is already up to date

feacluster commented 3 years ago

@simonyangme I am just confirming that I too still have this issue on one of my seven cameras despite upgrading to latest kernel . I also tried doing a full power cycle by unplugging as suggested earlier. I am using a Pi 4 , not 3B+.

I don't believe it is a hardware issue because the camera works fine for several hours. Often upto a full day. As a workaround I have a cronjob to reboot everyday at midnight. I suppose I can try replacing camera. But it is in a somewhat inaccessible location so would rather not :-)

popcornmix commented 3 years ago

@simonyangme that firmware is too old. The suggested hash https://github.com/Hexxeh/rpi-firmware/commit/c0cf93a133dab106439c208f03d32155bc19e432 should be around March 26. I assume you've rebooted after running rpi-update?

To force an rpi-update use:

sudo rm /boot/.firmware_revision
sudo rpi-update <hash>

although there's no need to specify a specific older firmware, all subsequent versions will contain this fix.

simonyangme commented 3 years ago

@popcornmix That seemed to do the trick!

Is there any reason this is required? Simply doing rpi-update c0cf93a133dab106439c208f03d32155bc19e432 from an older version (kernel+firmware) did not seem to pull in the right firmware. Is this expected behavior?

popcornmix commented 3 years ago

/boot/.firmware_revision just contains the hash from the last run on rpi-update and if you ask for the same thing again it says it's up to date. Only reason I can think of it failing is if you updated then ran it again without rebooting, or updated and then downgraded firmware (e.g. by running apt upgrade) in a way rpi-update is unaware of.

simonyangme commented 3 years ago

I'm now running:

Mar 26 2021 16:35:37 
Copyright (c) 2012 Broadcom
version edf2e9c318863999c97c50cdb74eee235ede3af5 (clean) (release) (start_x)

and still see the same issue when I switch to full frame mode and capture an image. rpi-issue.txt

simonyangme commented 3 years ago

With a bit more testing, the failure seems to happen randomly on the first capture after boot on my system. If it happens, then I am forced to power cycle to recover. If it does not happen, then capturing 5+ images does not seem to trigger it.

simonyangme commented 3 years ago

FWIW I've tried going back to dea7234943c604462e476a8afc13c587418e8709 kernel and firmware, and have not seen the issue since. I have not had a chance to further narrow down what might be contributing to the problem, but got the following from vcdbg log assert when it does fail:

096238.052: assert( IS_ALIAS_PERIPHERAL(c->dst_addr) || (c->xfer_info & DMA0_TI_DEST_IGNORE_SET) || IS_ALIAS_NOT_L1_ALLOCATING(c->dst_addr) ) failed; ../../../../../helpers/dmalib/dmalib.c::dma_chain_start line 482 rev edf2e9c
vcdbg_ctx_get_dump_stack: dump_stack failed

Please let me know if I should open a new issue for this and if there's any other info I can provide to help track this down.

feacluster commented 3 years ago

Unfortunately the issue seems to be happening again. I have seven of these v2.1 cameras but two of them have started showing the same errors reported in this thread. See code snippets below: As a workaround I have a cron job that reboots the pi every day at midnight. The camera will work ok for 1-2 days but then fail with:

pi@court5:~ $  raspistill -w 640 -h 480 -q 50 -o /var/www/html/share/court5.jpg
tvservice-client: Failed to connect to TV service: -1
mmalipc: mmal_vc_init_fd: could not open vchiq service
mmal: mmal_vc_component_create: failed to initialise mmal ipc for 'vc.camera_info' (7:EIO)
mmal: mmal_component_create_core: could not create component 'vc.camera_info' (7)
mmal: Failed to create camera_info component
mmalipc: mmal_vc_init_fd: could not open vchiq service
mmal: mmal_vc_component_create: failed to initialise mmal ipc for 'vc.ril.camera' (7:EIO)
mmal: mmal_component_create_core: could not create component 'vc.ril.camera' (7)
mmal: Failed to create camera component
mmal: main: Failed to create camera component
mmal: Camera is not enabled in this build. Try running "sudo raspi-config" and ensure that "camera" has been enabled
pi@court5:~ $ uname -r
5.10.17-v8+
pi@court5:~ $ sudo vcdbg log assert
debug_sym: vc_mem_copy: Unable to open '/dev/fb0': No such file or directory(2)

110926.893: assert( success == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_change_state line 1401 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
110926.967: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_open line 601 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
110928.112: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/components/camera.c::load_and_open_cdi line 12008 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
110928.188: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../../../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111006.231: assert( retcode == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::open_camera_peripheral line 645 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111006.304: assert( success == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_change_state line 1401 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111006.367: assert( *retcode == 0 ) failed; ../../../../../middleware/camplus/cdi/cdi_camera.c::cdi_camera_open line 601 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111007.479: assert( retcode == 0 ) failed; ../../../../../middleware/openmaxil/components/camera.c::load_and_open_cdi line 12008 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
111007.536: assert( id->cdi!=NULL && id->cdi_handle!=NULL ) failed; ../../../../../middleware/openmaxil/components/camera.c::open_camplus line 11343 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
115859.429: assert( temps->last_tone_mapping_params_valid ) failed; ../../../../../middleware/ISP/tuner/isp_tuner_brcm_agc.c::isp_tuner_brcm_agc_switch_mode line 2745 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
115946.712: assert( pass == ISP_TUNER_PASS_NONE ) failed; ../../../../../middleware/ISP/tuner/isp_tuner_brcm_focus.c::master_focus_post_isp_update line 1912 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
116092.199: assert( pass == ISP_TUNER_PASS_NONE ) failed; ../../../../../middleware/ISP/tuner/isp_tuner_brcm_focus.c::master_focus_post_isp_update line 1912 rev d7f29d9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------