raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

I2C issues with 5.4 + TC358743 #3602

Closed mdevaev closed 4 years ago

mdevaev commented 4 years ago

Describe the bug After updating the kernel from version 4.19.118 to 5.4.35, an attempt to capture an image from the tc358743 device twice in a row (reopening the device file) results in I2C timeouts. In some cases this leads to a hang (see log1); sometimes it causes errors when working with the MMC card and a reboot (see log2).

UPD: The first message was not entirely accurate. The problem occurs when the first reading process is interrupted. That is, I run yavta, press Ctrl+C and immediately get a dead kernel. I played around with yavta a bit and found that the problem occurs either when closing /dev/video0 or when executing the VIDIOC_STREAMOFF ioctl. I think the tc358743 driver is trying to send a command over I2C, and everything stops working.

/UPD

[   79.678547] ------------[ cut here ]------------
[   79.678554] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   79.678569] tc358743 10-000f: i2c_wr: writing register 0x4 from 0xf failed
[   79.683241] WARNING: CPU: 3 PID: 34 at drivers/firmware/raspberrypi.c:63 rpi_firmware_transaction+0xe8/0x124
[   79.705713] Firmware transaction timeout
[   79.705715] Modules linked in: usb_f_mass_storage usb_f_hid usb_f_acm u_serial btsdio bluetooth ecdh_generic ecc brcmfmac brcmutil cfg80211 raspberrypi_hwmon hwmon i2c_mux_pinctrl i2c_mux bcm2835_unicam i2c_bcm2835 iproc_rng200 rng_core bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common rpivid_mem uio_pdrv_genirq uio sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd tc358743 v4l2_dv_timings v4l2_fwnode videodev mc cec libcomposite dwc2 udc_core drm drm_panel_orientation_quirks ip_tables x_tables ipv6 nf_defrag_ipv6
[   79.766074] CPU: 3 PID: 34 Comm: kworker/3:1 Tainted: G         C        5.4.35-1-ARCH #1
[   79.774292] Hardware name: BCM2711
[   79.777717] Workqueue: events dbs_work_handler
[   79.782192] [<c0211424>] (unwind_backtrace) from [<c020c6fc>] (show_stack+0x10/0x14)
[   79.789977] [<c020c6fc>] (show_stack) from [<c0ccc638>] (dump_stack+0x94/0xb4)
[   79.806238] [<c0ccc638>] (dump_stack) from [<c022ceb8>] (__warn+0xd0/0xf8)
[   79.817766] [<c022ceb8>] (__warn) from [<c022d29c>] (warn_slowpath_fmt+0x98/0xc4)
[   79.834301] [<c022d29c>] (warn_slowpath_fmt) from [<c0b36468>] (rpi_firmware_transaction+0xe8/0x124)
[   79.852463] [<c0b36468>] (rpi_firmware_transaction) from [<c0b36550>] (rpi_firmware_property_list+0xac/0x168)
[   79.871582] [<c0b36550>] (rpi_firmware_property_list) from [<c0b3666c>] (rpi_firmware_property+0x60/0x108)
[   79.890551] [<c0b3666c>] (rpi_firmware_property) from [<c0928494>] (raspberrypi_clock_property+0x48/0x78)
[   79.909488] [<c0928494>] (raspberrypi_clock_property) from [<c09285f8>] (raspberrypi_fw_set_rate+0x44/0xb8)
[   79.928821] [<c09285f8>] (raspberrypi_fw_set_rate) from [<c0921c94>] (clk_change_rate+0xe0/0x558)
[   79.947361] [<c0921c94>] (clk_change_rate) from [<c0922284>] (clk_core_set_rate_nolock+0x178/0x1a0)
[   79.966220] [<c0922284>] (clk_core_set_rate_nolock) from [<c09222dc>] (clk_set_rate+0x30/0x88)
[   79.984877] [<c09222dc>] (clk_set_rate) from [<c0b03da8>] (dev_pm_opp_set_rate+0x364/0x460)
[   80.003253] [<c0b03da8>] (dev_pm_opp_set_rate) from [<c0b0d0bc>] (set_target+0x2c/0x54)
[   80.021370] [<c0b0d0bc>] (set_target) from [<c0b07e08>] (__cpufreq_driver_target+0x220/0x534)
[   80.040342] [<c0b07e08>] (__cpufreq_driver_target) from [<c0b0b32c>] (od_dbs_update+0xb4/0x160)
[   80.059658] [<c0b0b32c>] (od_dbs_update) from [<c0b0c4c4>] (dbs_work_handler+0x2c/0x58)
[   80.078284] [<c0b0c4c4>] (dbs_work_handler) from [<c02489ec>] (process_one_work+0x1f0/0x588)
[   80.097449] [<c02489ec>] (process_one_work) from [<c0248dd0>] (worker_thread+0x4c/0x528)
[   80.116348] [<c0248dd0>] (worker_thread) from [<c024ec28>] (kthread+0x128/0x154)
[   80.134600] [<c024ec28>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[   80.152846] Exception stack(0xef2a9fb0 to 0xef2a9ff8)
[   80.163429] 9fa0:                                     00000000 00000000 00000000 00000000
[   80.182818] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   80.202646] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   80.215376] ---[ end trace 91bfd0c131224965 ]---
[   80.226095] raspberrypi-clk firmware-clocks: Failed to change pllb frequency: -110
[   80.718577] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   80.730213] tc358743 10-000f: i2c_rd: reading register 0x2 from 0xf failed
[   81.758599] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   81.770193] tc358743 10-000f: i2c_wr: writing register 0x2 from 0xf failed
[   82.798624] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   82.810166] tc358743 10-000f: i2c_wr: writing register 0x2 from 0xf failed
[   83.838648] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   83.850108] tc358743 10-000f: i2c_wr: writing register 0x14c from 0xf failed
[   84.878670] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   84.890186] tc358743 10-000f: i2c_wr: writing register 0x150 from 0xf failed
[   85.918697] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   85.930211] tc358743 10-000f: i2c_wr: writing register 0x210 from 0xf failed
[   86.958710] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   86.970186] tc358743 10-000f: i2c_wr: writing register 0x214 from 0xf failed
[   87.998734] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   88.010264] tc358743 10-000f: i2c_wr: writing register 0x218 from 0xf failed
[   89.038749] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   89.050252] tc358743 10-000f: i2c_wr: writing register 0x21c from 0xf failed
[   90.078765] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   90.090261] tc358743 10-000f: i2c_wr: writing register 0x220 from 0xf failed
[   91.118779] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   91.130418] tc358743 10-000f: i2c_wr: writing register 0x224 from 0xf failed
[   92.158791] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   92.170372] tc358743 10-000f: i2c_wr: writing register 0x228 from 0xf failed
[   93.198809] i2c-bcm2835 fe205000.i2c: i2c transfer timed out
[   93.210184] tc358743 10-000f: i2c_wr: writing register 0x22c from 0xf failed

To reproduce Set up an Auvidea B101 with kernel 5.4.35, run any capture software that supports DV timings (https://github.com/pikvm/ustreamer, yavta, etc.) and stop it.

Expected behaviour No crashes when closing the device file

Actual behaviour Subj

System

Logs Attached kernel log from tty. Just I2C errors: log1.txt

MMC crash: log2.txt

pelwell commented 4 years ago

That's really helpful, as it means I can forget the mux locking and concentrate on the TC358743 again. But if you can attach a logic analyser then it is likely to shorten the time to a solution.

mdevaev commented 4 years ago

This is a bit difficult, because I personally don't have a logic analyzer, but a colleague 200 km away has one. We will try to organize this process remotely. I think we'll figure it out in an hour.

pelwell commented 4 years ago

If it doesn't test your friendship with the colleague too much then that sounds like a good option, but for now I'll assume that it doesn't work out.

mdevaev commented 4 years ago

Well, we can always try it. I have carefully recorded all my actions, so there should be no problems with reproducing the test conditions. Unless he has a Chinese replica of B101, although it differs slightly from the original. Schematically, I mean. From the driver's point of view, they look the same.

6by9 commented 4 years ago

Can you try again with v4l2-ctl instead of yavta please? yavta uses a load of VPU stuff. The fact that you're getting errors setting clocks does imply that you've crashed the VPU.

mdevaev commented 4 years ago

Each v4l2-ctl launch only prints "New timings found". The kernel writes a trace, but it doesn't hang. I can make a call several times, and each time there will be the same stacktraces. I recorded a single call and the I2C trace.

screenlog.txt trace_v4l2.txt

mdevaev commented 4 years ago

Okay, I couldn't do it fast enough, so we'll try the logic analyzer tomorrow.

pelwell commented 4 years ago

Up to now you've always spoken of exiting yavta with Ctrl-C. I notice that there is a command line option to specify the number of frames - --capture=<nframes>. Does specifying a frame count, and therefore hopefully getting an orderly shutdown rather than one triggered by a signal, change the behaviour?

If so, can I get an i2c trace of it?

mdevaev commented 4 years ago

All traces starting from this https://github.com/raspberrypi/linux/issues/3602#issuecomment-626727238 I performed with a limit on the number of frames, so that the result was the same. Each time there were exactly 5 of them.

6by9 commented 4 years ago

Can you drop the MMAL side out of yavta please? With ./yavta --capture=1000 -n 3 -f UYVY -T you won't get a preview, but it should capture frames and print timestamps to the screen.

The other thought is dropping out the more recent changes to bcm2835-unicam. I suspect that means building the kernel for yourself having reverted:

f79b211 media: bcm2835: unicam: Fix uninitialized warning
504574e media: bcm2835-unicam: Fix reference counting in unicam_open
563c4ad media: bcm2835-unicam: Do not stop streaming in unicam_release
51a5955 media: bcm2835-unicam: Add support for VIDIOC_[S|G]_SELECTION
0ac8108 media: bcm2835-unicam: Re-fetch mbus code from subdev on a g_fmt call
4a7398e media: bcm2835-unicam: Add support for the FRAME_SYNC event
7410f0f media: bcm2835-unicam: Disable event-related ioctls on metadata node
c9e2f8f media: bcm2835-unicam: Use dummy buffer if none have been queued
3ccd6c4 media: bcm2835-unicam: Add embedded data node.
e73ab21 media: bcm2835-unicam: Add support for mulitple device nodes.

Build instructions are at https://www.raspberrypi.org/documentation/linux/kernel/building.md

pelwell commented 4 years ago

I'm in the process of modifying the overlay to use i2c-gpio instead of the hardware interface (to rule out another factor), but it's not co-operating yet.

mdevaev commented 4 years ago

Results of ./yavta --capture=1000 -n 3 -f UYVY -T: yavta.txt screenlog.txt trace.txt

About the kernel: do I need to start doing something, or wait for the overlay? Also, no ready-made builds?

6by9 commented 4 years ago

So the reduced yavta died as well? It's odd as the I2C comms appear to pick up again at 270.348485 according to the i2c trace. A bundle of -ETIMEDOUT, a couple of EIO, and then back to normal. Did you try a second run?

Sorry, as we're in the process of upstreaming the driver we waited until all the cleanups could be done, and then merged it in one swoop onto 5.4. AFAIK There are no intermediate builds (there certainly aren't in rpi-update). Wait for Phil's updated overlay first.

I'll head into the office later to scoop up all the B101's I can find in the hope that one might work. I think I've had 3 over time, so one might play ball. I'll get a couple more on order too (probably the Chinese clones as they're half the price).

pelwell commented 4 years ago

Sorry about the delay - I hit the weirdest issue applying the overlay at runtime. A slight reshuffle has avoided it, but it remains a problem for another day.

Anyway, try this:

// SPDX-License-Identifier: GPL-2.0-only
// Definitions for Toshiba TC358743 HDMI to CSI2 bridge on VC I2C bus
/dts-v1/;
/plugin/;

/{
    compatible = "brcm,bcm2835";

    fragment@0 {
        target-path = "/";
        __overlay__ {
            i2c_gpio: i2c-gpio-tc358743@0 {
                compatible = "i2c-gpio";
                gpios = <&gpio 44 0 /* sda */
                     &gpio 45 0 /* scl */
                    >;
                i2c-gpio,delay-us = <2>;        /* ~100 kHz */
                #address-cells = <1>;
                #size-cells = <0>;

                status = "okay";

                tc358743@0f {
                    compatible = "toshiba,tc358743";
                    reg = <0x0f>;
                    status = "okay";

                    clocks = <&tc358743_clk>;
                    clock-names = "refclk";

                    port {
                        tc358743: endpoint {
                            remote-endpoint = <&csi1_ep>;
                            clock-lanes = <0>;
                            clock-noncontinuous;
                            link-frequencies =
                                /bits/ 64 <486000000>;
                        };
                    };
                };
            };
        };
    };

    fragment@1 {
        target-path = "/";
        __overlay__ {
            tc358743_clk: bridge-clk {
                compatible = "fixed-clock";
                #clock-cells = <0>;
                clock-frequency = <27000000>;
            };
        };
    };

    fragment@2 {
        target = <&csi1>;
        __overlay__ {
            status = "okay";

            port {
                csi1_ep: endpoint {
                    remote-endpoint = <&tc358743>;
                };
            };
        };
    };

    fragment@3 {
        target = <&tc358743>;
        __overlay__ {
            data-lanes = <1 2>;
        };
    };

    fragment@4 {
        target = <&tc358743>;
        __dormant__ {
            data-lanes = <1 2 3 4>;
        };
    };

    __overrides__ {
        4lane = <0>, "-3+4";
        link-frequency = <&tc358743>,"link-frequencies#0";
    };
};
mdevaev commented 4 years ago

@6by9 The kernel freezes (only prints messages to the serial console and does not respond to input), but after a few minutes it comes to, and I get to run yavta again. I attached two traces: taken after the first launch and after the second.

screenlog.txt yavta.txt trace1.txt trace2.txt

pelwell commented 4 years ago

[ updated the 4lane parameter with the modified fragment ]

mdevaev commented 4 years ago

@pelwell my logs were for the previous comment, now I will try your overlay.

6by9 commented 4 years ago

Prints messages to the serial console

Do you have an HDMI display attached? (Thought process is whether it is the display side deciding it has nothing to do and shutting clocks down)

The other thought that I have remembered is that earlier versions of this driver height-aligned their buffers to a multiple of 16. If you're capturing 1080p50 (as it appears you are), then 1080 is not a multiple of 16, and would previously have been rounded up to 1088. We also have 1920*2*1080 = 4147200, or 1012.5 pages. 1920*2*1088 would have been 4177920, or 1020 pages. Is it possible a page table is going adrift?
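The buffer-size arithmetic above can be checked with a short sketch. It assumes 2 bytes per pixel (UYVY) and 4 KiB pages; `buffer_pages` and its parameters are illustrative names, not anything taken from the driver.

```python
PAGE_SIZE = 4096       # assumption: 4 KiB pages
BYTES_PER_PIXEL = 2    # UYVY packs 2 bytes per pixel


def buffer_pages(width, height, height_align=1):
    """Return (size in bytes, size in pages) for one frame buffer.

    height_align models the older behaviour of rounding the height up
    to a multiple of 16; height_align=1 means no rounding.
    """
    aligned_height = -(-height // height_align) * height_align  # round up
    size = width * BYTES_PER_PIXEL * aligned_height
    return size, size / PAGE_SIZE


print(buffer_pages(1920, 1080))       # unaligned 1080-line buffer
print(buffer_pages(1920, 1080, 16))   # height rounded up to 1088
```

The unaligned buffer ends halfway through a page, while the 16-line-aligned one ends exactly on a page boundary, which is the difference the question about page tables is pointing at.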

There is a way to get V4L2 to allocate larger buffers than that specified in sizeimage. I'll see if I can dig them out and make it work.

mdevaev commented 4 years ago

Do you have an HDMI display attached?

No, sorry. I don't have an adapter.

Okay, here we go. Here is the dmesg output for the device:

[    5.364606] gpio-44 (i2c-gpio-tc358743@0): enforced open drain please flag it properly in DT/ACPI DSDT/board file
[    5.364705] gpio-45 (i2c-gpio-tc358743@0): enforced open drain please flag it properly in DT/ACPI DSDT/board file
[    5.370188] i2c-gpio i2c-gpio-tc358743@0: using lines 44 (SDA) and 45 (SCL)
[    6.373303] tc358743 11-000f: tc358743 found @ 0x1e (i2c-gpio-tc358743@0)

With v4l2-ctl --stream-mmap=3 --stream-to=/dev/null --stream-count=100 I only get "New timings found". I also still get stack traces on the console, but now the kernel doesn't freeze.

When I try to use ./yavta --capture=1000 -n 3 -f UYVY -T /dev/video0 after this, I get a hung process that doesn't respond to ctrl+c. But the kernel doesn't freeze.

pi@raspberrypi:~$ ps aux | grep yavta
pi         601  0.8  0.0      0     0 pts/0    D+   11:29   0:00 [yavta]

screenlog.txt terminal.txt trace.txt

I also tried my ustreamer, which uses the V4L2 API. I started and stopped it several times. This worked and even gave a picture, but DV-timings detection seems to be broken: I kept getting a resolution of 640x480 with an incorrect picture, although I expected 1920x1080. After a few launches, the kernel crashed.

# Options: UYVY input format, CPU encoding, at most 3 workers.
# --persistent: don't re-initialize the device on timeout (for example when the HDMI cable was disconnected).
# --dv-timings: use DV timings; --drop-same-frames=30: save traffic.
./ustreamer --host 0.0.0.0 \
    --format=uyvy \
    --encoder=cpu \
    --workers=3 \
    --persistent \
    --dv-timings \
    --drop-same-frames=30

image

screenlog.txt The trace could not be captured because the kernel died.

Ad break: I wrote this thing specifically for B101. It is able to stream MJPG very quickly, can use multithreaded compression and/or RPi's GPU using OMX. I managed to achieve 24 fps for FullHD. Well, if someone needs it.

pelwell commented 4 years ago

Could I get a trace of a 5 frame capture with yavta and the i2c-gpio-based overlay?

pelwell commented 4 years ago

I mean, without running v4l2-ctl first.

mdevaev commented 4 years ago

screenlog.txt trace.txt

pelwell commented 4 years ago

That appears to have shut down successfully, or have I missed something?

6by9 commented 4 years ago

@naushir How sure are we of Unicam stopping at the end address? You changed the mode it was in with your recent patches from stop at end of buffer to wrap around? (Added UNICAM_IBOB). Can you remind me why? If Unicam has been configured for 640x480 (because that's what has been reported), but the frame is actually 1920x1080, are we sure that it doesn't check the end address at the end of the current line, which is now going to be off the end of the buffer.

mdevaev commented 4 years ago

@pelwell yes. The problem has become irregular. After 10 or 20 launches, I may get a crash.

pelwell commented 4 years ago

Is that a change since switching to the i2c-gpio overlay?

mdevaev commented 4 years ago

Yes

pelwell commented 4 years ago

If you do manage to capture a trace during a crash then I'd like to see it.

naushir commented 4 years ago

@naushir How sure are we of Unicam stopping at the end address? You changed the mode it was in with your recent patches from stop at end of buffer to wrap around? (Added UNICAM_IBOB). Can you remind me why?

We had used wrap around in the firmware driver in all our production code, and I do not recall it causing us any problems. Wrapping was also used for the N8 RMI stuff, but the peripheral had changed a bit since then.

We are also currently using the wrapping behaviour by spinning in a dummy buffer (not sized to the image) if we have no buffers queued by userland - this should also go wrong if we have a problem in Unicam with IBOB.

If I recall, I switched to IBOB because (ironically) I was seeing problems with the original behaviour where I suspected it was writing past the end of the buffer.

If Unicam has been configured for 640x480 (because that's what has been reported), but the frame is actually 1920x1080, are we sure that it doesn't check the end address at the end of the current line, which is now going to be off the end of the buffer.

I would have expected it to do the address test on AXI burst granularity and not lines, but I cannot verify this.

mdevaev commented 4 years ago

@pelwell I tried to do this, but every time I try, the kernel freezes completely, so I can't get a dump. I can only give you a stack trace. screenlog.txt

pelwell commented 4 years ago

The stacktrace shows that it's stuck in unicam_stop_streaming, and it might even be doing the same I2C write. I'm surprised it isn't showing up as a timeout, but that could be something to do with it being a regular output rather than an open-drain output.

6by9 commented 4 years ago

I have been to the office and have one B101 that I suspect will be dead, but also one dev board that I have hope for. Hopefully I'll be able to reproduce this now.

6by9 commented 4 years ago

It looks like I have a result. One of my dead B101s had a kinked and broken FFC. Replaced the FFC (why does it have to be contacts on the same side, when all Pi cameras are front/rear?!) and I've just streamed from it at 720p60. Ctrl-C and it has locked up!

Now I can investigate it!

mdevaev commented 4 years ago

why does it have to be contacts on the same side, when all Pi cameras are front/rear

As far as I remember, Auvidea did this for compatibility with some of its other boards. Note that the Chinese copies use the usual ribbon cables from RPi cameras.

Now I can investigate it!

Great!

mdevaev commented 4 years ago

Let me know if you still need the logic analyzer output.

6by9 commented 4 years ago

After a day of two of us digging into I2C issues, it turned out not to be an I2C problem. The Unicam shutdown procedure in unicam_stop_streaming is wrong in that it stops servicing all interrupts, and then tells the source and hardware to stop streaming. Any interrupt that occurs between the two causes a lockup :-(

Why it doesn't happen on a Pi3 is a bit of a mystery. There is a difference in the I2C config there, but I don't see why that would avoid the bug in the Unicam driver. This is new code in 5.4 (for libcamera support), so that explains why it affects 5.4 but not 4.19.

The fix is to drop the check for streaming or not in unicam_isr (https://github.com/raspberrypi/linux/blob/rpi-5.4.y/drivers/media/platform/bcm2835/bcm2835-unicam.c#L810). I'll sort out a PR, and it should be in the next rpi-update build.
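The shutdown race described above can be illustrated with a toy model. This is only a sketch under a stated assumption: that the interrupt is level-triggered and stays asserted until the handler clears its status, so a handler that returns early is re-entered forever. The thread does not spell out the exact mechanism, and none of the names below come from the driver.

```python
def stop_streaming(isr_checks_streaming_flag, max_irq_deliveries=1000):
    """Toy model of an interrupt arriving in the middle of shutdown.

    Returns how many times the handler ran before the interrupt line
    went quiet, or None if it never did (the modelled lockup).
    """
    state = {"streaming": True, "irq_pending": False}

    def isr():
        # Hypothetical buggy variant: once streaming is flagged off the
        # handler returns early and never clears the interrupt status.
        if isr_checks_streaming_flag and not state["streaming"]:
            return
        state["irq_pending"] = False  # service and clear the interrupt

    # Shutdown step 1: the driver marks itself as no longer streaming.
    state["streaming"] = False
    # An interrupt fires before the source and hardware have been stopped.
    state["irq_pending"] = True

    deliveries = 0
    while state["irq_pending"]:
        if deliveries == max_irq_deliveries:
            return None  # still asserted: the CPU never leaves the handler
        isr()
        deliveries += 1
    return deliveries


print(stop_streaming(isr_checks_streaming_flag=True))   # None (stuck)
print(stop_streaming(isr_checks_streaming_flag=False))  # 1 (serviced once)
```

Dropping the streaming check corresponds to the second call: the late interrupt is serviced once and shutdown can continue.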

pelwell commented 4 years ago

But a bonus is that there was a potential I2C problem we'd not spotted, and probably wouldn't have without digging into this.

mdevaev commented 4 years ago

Then I just have to wait for the update. Please let me know when it is ready, and I will test it immediately.

By the way, can I ask one quick question not related to this issue, but related to the B101? I noticed that these resolutions don't work correctly: 720x480 or 720x576 (I don't remember which one is available) from EDID in yavta. The image quickly jumps up and down. A more minor problem is observed at the 800x600 resolution: a line of green pixels flickers at the very bottom of the screen.

Can this have something to do with the unicam buffers you wrote about above? The problem is observed on all kernels.

6by9 commented 4 years ago

By the way, can I ask one quick question not related to this issue, but related to the B101? I noticed that these resolutions don't work correctly: 720x480 or 720x576 (I don't remember which one is available) from EDID in yavta. The image quickly jumps up and down. A more minor problem is observed at the 800x600 resolution: a line of green pixels flickers at the very bottom of the screen.

No, I suspect this is the FIFO trigger level in the TC358743.

You have HDMI data coming in at whatever pixel rate is determined by the mode. Data out over CSI2 is at a fixed 972Mbit/s/lane, but with a dynamically switchable number of lanes between 1 and 4. The data is read into a FIFO, and one of the register settings is the fill level on that FIFO for when to start transmitting out over CSI2. Too high and the FIFO overflows on high input pixel rates. Too low and the FIFO underflows on low input pixel rates.

Ideally the value should be dynamically adjusted, but the calculations behind it are pretty horrid, and trying to make it cover all resolutions hasn't been addressed. Previous users only worried about 720p60 (over 2 lanes) and 1080p60 (over 4 lanes) and at a lower link frequency than we run at. Toshiba produce a big spreadsheet to compute all the values from (and tell you when they're invalid), but it's under NDA so we can't release it, nor really code which replicates it.

There is a patch (aecd1053000e2421c4367bf7c2fafc5311b7c961) that increases it from 16 to 374 to cover 1080p modes on 2 lanes. You could try tweaking it and rebuilding the kernel.

mdevaev commented 4 years ago

Previous users only worried about 720p60 (over 2 lanes) and 1080p60 (over 4 lanes)

Well, they probably didn't try to use the B101 to configure the BIOS and reinstall the OS on a server. Of course, the problem is not critical, since 800x600 works passably, and the 720x... modes are rarely seen. But the perfectionist in me wants to find a solution.

Ideally the value should be dynamically adjusted

Okay, I think I understand. FIFO is in the chip itself, right? And since I don't have a spreadsheet, I'll have to pick a parameter at random?

6by9 commented 4 years ago

Yes, the FIFO is in the TC358743.

Small FIFO trigger values are good for roughly matched data rates, or CSI speed lower than HDMI rate (otherwise the FIFO overflows). Large FIFO trigger values are needed for faster CSI data rate than HDMI data rate, otherwise the FIFO is emptied too soon.
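The trade-off can be sketched with a toy FIFO model. This is not the TC358743's real behaviour or register semantics: the line length, rates, trigger values and capacity below are made-up numbers chosen only to show the two failure modes, and `simulate_line` is a hypothetical helper.

```python
def simulate_line(line_bytes, in_rate, out_rate, trigger, capacity):
    """Push one video line through a FIFO, one time step at a time.

    in_rate / out_rate are bytes per step on the HDMI and CSI-2 sides.
    Output starts once the fill level reaches `trigger` and must then
    be fed continuously. Returns "ok", "underflow" or "overflow".
    """
    received = sent = 0
    transmitting = False
    while sent < line_bytes:
        received += min(in_rate, line_bytes - received)
        if not transmitting and (received - sent >= trigger
                                 or received == line_bytes):
            transmitting = True
        if transmitting:
            want = min(out_rate, line_bytes - sent)
            if received - sent < want:
                return "underflow"  # output side ran dry mid-line
            sent += want
        if received - sent > capacity:
            return "overflow"       # input side outran the FIFO
    return "ok"


# CSI-2 faster than HDMI: a small trigger drains the FIFO too early.
print(simulate_line(1000, in_rate=10, out_rate=20, trigger=16, capacity=600))
print(simulate_line(1000, in_rate=10, out_rate=20, trigger=510, capacity=600))
# CSI-2 slower than HDMI: a large trigger lets the FIFO fill up.
print(simulate_line(1000, in_rate=20, out_rate=10, trigger=510, capacity=600))
print(simulate_line(1000, in_rate=20, out_rate=10, trigger=16, capacity=600))
```

In this model no single trigger value suits both rate ratios, which is consistent with the remark that the value ideally needs adjusting per mode.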

You can also load the DT overlay with dtoverlay=tc358743,link-frequency=297000000 to drop the link speed from 972Mbit/s/lane (486MHz) to 594Mbit/s/lane (297MHz link speed). It was at the lower speed that a FIFO trigger of 16 was used for 720p60 and 1080p60.
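The per-lane bit rates follow from the link frequency, because a CSI-2 D-PHY lane transfers data on both clock edges (two bits per clock cycle). A minimal sketch of that conversion, with `lane_rate_mbit` as an illustrative helper name:

```python
def lane_rate_mbit(link_frequency_hz):
    """Per-lane CSI-2 bit rate in Mbit/s for a DDR link clock."""
    return 2 * link_frequency_hz / 1e6


for link_hz in (486_000_000, 297_000_000):
    rate = lane_rate_mbit(link_hz)
    # Aggregate rate for the 2-lane and 4-lane configurations.
    print(link_hz, rate, [rate * lanes for lanes in (2, 4)])
```

A 486 MHz link clock gives 972 Mbit/s per lane, and 2 × 297 MHz works out to 594 Mbit/s per lane.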

mdevaev commented 4 years ago

Sup. Do you need my help in testing? What release should the fix be included in?

6by9 commented 4 years ago

https://github.com/Hexxeh/rpi-firmware/commits/master https://github.com/Hexxeh/rpi-firmware/commit/7a4e85f6e682a59f984e5a7b605f9bb90e047585

kernel: media: bcm2835-unicam: Always service interrupts
See: raspberrypi/linux#3608

So using rpi-update at any point since 20th May would have got you the fix.

mdevaev commented 4 years ago

Everything seems to be working on Raspbian with latest kernel obtained using rpi-update. But I have another problem on the same kernel in Arch Linux ARM that I can't reproduce on Raspbian yet. When I try to stream using ustreamer I get a crash in dmesg:

[  148.617613] 8<--- cut here ---
[  148.620716] Unhandled fault: unknown 3 (0x2a03) at 0xffeee000
[  148.626514] pgd = 21038de6
[  148.629239] [ffeee000] *pgd=80000000007003, *pmd=2ffbe003, *pte=c00fff54e3e71f
[  148.636518] Internal error: : 2a03 [#1] PREEMPT SMP ARM
[  148.641779] Modules linked in: usb_f_mass_storage usb_f_hid usb_f_acm u_serial btsdio bluetooth ecdh_generic ecc brcmfmac brcmutil cfg80211 raspberrypi_hwmon hwmon i2c_mux_pinctrl i2c_mux bcm2835_unicam i2c_bcm2835 bcm2835_v4l2(C) bcm2835_codec(C) bcm2835_isp(C) v4l2_mem2mem iproc_rng200 rng_core bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common rpivid_mem uio_pdrv_genirq uio sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd tc358743 v4l2_dv_timings v4l2_fwnode videodev mc cec libcomposite dwc2 udc_core drm drm_panel_orientation_quirks ip_tables x_tables ipv6 nf_defrag_ipv6
[  148.698262] CPU: 1 PID: 155 Comm: vchiq-slot/0 Tainted: G         C        5.4.42-1-ARCH #1
[  148.706633] Hardware name: BCM2711
[  148.710048] PC is at b15_dma_inv_range+0x20/0x50
[  148.718599] LR is at dma_cache_maint_page+0xa8/0x150
[  148.727498] pc : [<c0219d6c>]    lr : [<c0215bb0>]    psr: 000b0013
[  148.737726] sp : eeb67db8  ip : 00400fff  fp : c1404f98
[  148.746883] r10: 00000030  r9 : 00000002  r8 : c1555440
[  148.756068] r7 : c1407c84  r6 : fff54e3e  r5 : 00000000  r4 : 00000020
[  148.766656] r3 : 0000003f  r2 : 00000040  r1 : ffeef000  r0 : ffeee000
[  148.777100] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  148.788171] Control: 30c5383d  Table: 204a5580  DAC: 55555555
[  148.797915] Process vchiq-slot/0 (pid: 155, stack limit = 0x41d87e62)
[  148.808346] Stack: (0xeeb67db8 to 0xeeb68000)
[  148.816732] 7da0:                                                       eeb67dfc c0219e14
[  148.832977] 7dc0: ffeee000 ffffc4dc ef085c00 ed6a3ba0 00000fff ef138810 00000fe0 00000020
[  148.849495] 7de0: 00000000 e6580194 e6545000 c0216114 c0219e14 00000000 54e3e020 00000fff
[  148.866438] 7e00: ef138810 00000000 00000002 c029d728 00000fe0 00000002 eeb67e64 c0cedc28
[  148.883883] 7e20: 54e3e020 00000fff ef138810 c029d8f8 00000fe0 00000002 ef0da000 ef0d2d00
[  148.901904] 7e40: 9a48a783 e6545020 00000001 00000003 ef138810 c029d940 00000fe0 00000002
[  148.920474] 7e60: 00000000 c0ce842c 00000002 e6545068 00002540 e6545014 c15b1d38 c14a78f8
[  148.939531] 7e80: 00000003 c0b66f60 00000000 c0b5b8dc 00000004 c1591848 400b0013 c1591848
[  148.959039] 7ea0: 200b0013 c159178c c15917f4 e18634cc 00000008 e1863400 e6585b28 c14a78f8
[  148.978971] 7ec0: e1863570 e6580194 c15917f4 c0b60ff4 eeb67f1c c0cedc28 ef668880 c0255080
[  148.999580] 7ee0: ffffffff c1400000 00000000 00000002 00000010 00000009 0000000e 00003b38
[  149.020752] 7f00: c10d7804 e6580020 c15b1b34 c1591b30 e6580194 c1591814 c1591884 c0e8bf80
[  149.042467] 7f20: c10d8b58 c10d8b2c c1591848 c10d85cc c10d8718 e18634e0 c15918ec c10650a8
[  149.064604] 7f40: c10d86e4 c10d86c0 00000000 ef399e00 c0273544 eeb67f54 eeb67f54 c1404f88
[  149.087150] 7f60: ef0b1d04 eeafb440 eeb61700 00000000 eeb66000 c15917f4 c0b60094 eeafb45c
[  149.110327] 7f80: ef0b1d04 c024ec5c 00000000 eeb61700 c024eb34 00000000 00000000 00000000
[  149.134033] 7fa0: 00000000 00000000 00000000 c02010d8 00000000 00000000 00000000 00000000
[  149.157814] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  149.181586] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  149.205350] [<c0219d6c>] (b15_dma_inv_range) from [<c0215bb0>] (dma_cache_maint_page+0xa8/0x150)
[  149.229387] [<c0215bb0>] (dma_cache_maint_page) from [<c0216114>] (__dma_page_dev_to_cpu+0x2c/0xc4)
[  149.253280] [<c0216114>] (__dma_page_dev_to_cpu) from [<c029d728>] (dma_direct_sync_single_for_cpu+0xbc/0xc0)
[  149.277603] [<c029d728>] (dma_direct_sync_single_for_cpu) from [<c029d8f8>] (dma_direct_unmap_page+0xb4/0xb8)
[  149.301254] [<c029d8f8>] (dma_direct_unmap_page) from [<c029d940>] (dma_direct_unmap_sg+0x44/0x60)
[  149.323356] [<c029d940>] (dma_direct_unmap_sg) from [<c0b66f60>] (vchiq_complete_bulk+0x1a4/0x260)
[  149.344992] [<c0b66f60>] (vchiq_complete_bulk) from [<c0b60ff4>] (slot_handler_func+0xf60/0x1560)
[  149.366167] [<c0b60ff4>] (slot_handler_func) from [<c024ec5c>] (kthread+0x128/0x154)
[  149.385806] [<c024ec5c>] (kthread) from [<c02010d8>] (ret_from_fork+0x14/0x3c)
[  149.404578] Exception stack(0xeeb67fb0 to 0xeeb67ff8)
[  149.415213] 7fa0:                                     00000000 00000000 00000000 00000000
[  149.434245] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  149.453142] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  149.465086] Code: e1a02312 e2423001 e1100003 e1c00003 (1e070f3e) 
[  149.476410] ---[ end trace ee601eaafd00e3bb ]---
[  149.486095] note: vchiq-slot/0[155] exited with preempt_count 1

Both OSes have kernel 5.4.42, the latest firmware, and version 369ed4e44cb5a080a2dfa7f854ae4ff46b7c9ef9 as reported by vcgencmd.

This problem doesn't seem to be directly related to tc358743. But before I figure out the exact conditions for reproducing the problem and create a separate issue, does this kernel trace tell you anything?

mdevaev commented 4 years ago

A bit of investigation: the Arch kernel is built from a commit: https://github.com/raspberrypi/linux/commit/5e5024f643caa53ff59a6e00f40a9b55f7fc4e17

rpi-update in Raspbian delivers this kernel (and everything works fine): https://github.com/raspberrypi/linux/commit/971a2bb14b459819db1bda8fcdf953e493242b42

In other words, the Arch kernel is slightly newer, and contains two commits that could have caused the problem. At least they are about DMA, and that's what my stacktrace is all about: https://github.com/raspberrypi/linux/commit/d8cbdaa729d5d3e9a1c18150bf4de69335a85a40 and https://github.com/raspberrypi/linux/commit/79495a5ecdfba69de51e88701a69c42d09806d84.

6by9 commented 4 years ago

Seeing as this is now totally unrelated to tc358743, you really should be opening a new issue.

mdevaev commented 4 years ago

Created #3647.

PS: Thanks for your work!

mdevaev commented 4 years ago

@6by9 I seem to have found another related problem with unicam. The same TC358743 stopped working on a Zero W with the 5.4 kernel. Just trying to start a stream produces this:

[  197.603129] kernel BUG at drivers/media/platform/bcm2835/bcm2835-unicam.c:687!
[  197.610522] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[  197.616488] Modules linked in: usb_f_mass_storage usb_f_hid usb_f_acm u_serial 8021q garp stp mrp llc brcmfmac brcmutil cfg80211 raspberrypi_hwmon i2c_mux_pinctrl i2c_mux bcm2835_unicam bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) i2c_bcm2835 v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common bcm2835_rng rng_core uio_pdrv_genirq uio fixed sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd tc358743 v4l2_dv_timings v4l2_fwnode videodev mc cec libcomposite dwc2 udc_core drm drm_panel_orientation_quirks ip_tables x_tables ipv6 nf_defrag_ipv6
[  197.671736] CPU: 0 PID: 588 Comm: stream Tainted: G         C        5.4.45-1-ARCH #1
[  197.679733] Hardware name: BCM2835
[  197.683257] PC is at unicam_wr_dma_addr+0x60/0x64 [bcm2835_unicam]
[  197.689591] LR is at unicam_start_streaming+0x4c0/0x8ec [bcm2835_unicam]
[  197.696431] pc : [<7f272ab8>]    lr : [<7f273294>]    psr: 80000113
[  197.702831] sp : 91ec9d98  ip : 00000001  fp : 7f275378
[  197.708166] r10: 00000100  r9 : 00000122  r8 : 91ec9da8
[  197.713501] r7 : 921802f0  r6 : 00000000  r5 : 00000000  r4 : 92180040
[  197.720163] r3 : 00000000  r2 : 53696000  r1 : 53600000  r0 : 92180088
[  197.726828] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  197.737532] Control: 00c5387d  Table: 11ecc008  DAC: 00000055
[  197.746941] Process stream (pid: 588, stack limit = 0xb22e5c17)
[  197.756593] Stack: (0x91ec9d98 to 0x91eca000)
[  197.764555] 9d80:                                                       20000013 00000005
[  197.779776] 9da0: 00000400 53600000 00000000 81204f88 921803c0 92180408 92180508 7f168bfc
[  197.795076] 9dc0: 00000001 923c2900 92180548 7f208664 92180408 40045612 00000000 7f168bfc
[  197.810739] 9de0: 00000001 7f209adc 92180560 7f16d2f8 92310240 7586a000 fff9e02c 923cf880
[  197.826669] 9e00: 00000000 91ec9e6c 00000001 7f1809e8 92310300 00013800 00000000 00096000
[  197.842977] 9e20: 757d4000 81204f88 91e91e00 40045612 00000000 00000004 00000000 00000000
[  197.859596] 9e40: 91ec9e6c 7f16d0d4 00000001 7f16d8f8 00000001 00000000 76abdd30 923c2900
[  197.876585] 9e60: 00000000 00000000 812b62c0 00000001 00000002 00000000 00000001 00000000
[  197.893892] 9e80: 8134f800 923102a0 923105a0 7557a000 00000000 81183313 81183304 80296704
[  197.911518] 9ea0: 93170194 00000000 00000000 93170194 0000004d 80284164 0000004d 8011a04c
[  197.929558] 9ec0: 00000000 802afb20 00000000 802b5590 91eac134 00000000 00000000 923102a0
[  197.948026] 9ee0: 00000055 00000cc0 0007544d 81204f88 91ecdd50 76abdd30 92fbcb18 923c2900
[  197.966921] 9f00: 76abdd30 923c2900 91ec8000 00000036 00000008 8030a4f8 0007544d 91ec9fb0
[  197.986147] 9f20: 7544d000 00000817 91e91e00 915fb000 923102a0 915fb040 00000055 80c145a8
[  198.005680] 9f40: 00000000 00000000 0012d000 00000003 923c2900 00000008 00000001 00004000
[  198.025631] 9f60: 920f1500 91ec8000 00000036 81204f88 923c2901 00000008 40045612 76abdd30
[  198.046025] 9f80: 923c2900 91ec8000 00000036 8030a9b0 00000000 00000004 00000001 00000036
[  198.066402] 9fa0: 80101204 80101000 00000000 00000004 00000008 40045612 76abdd30 015723f8
[  198.086780] 9fc0: 00000000 00000004 00000001 00000036 40045612 00470038 76abdd30 00000008
[  198.107225] 9fe0: 0046fe08 76abdd0c 00429bc4 76cb1f0c 20000010 00000008 00000000 00000000
[  198.127836] [<7f272ab8>] (unicam_wr_dma_addr [bcm2835_unicam]) from [<7f273294>] (unicam_start_streaming+0x4c0/0x8ec [bcm2835_unicam])
[  198.152277] [<7f273294>] (unicam_start_streaming [bcm2835_unicam]) from [<7f208664>] (vb2_start_streaming+0x5c/0x16c [videobuf2_common])
[  198.176525] [<7f208664>] (vb2_start_streaming [videobuf2_common]) from [<7f209adc>] (vb2_core_streamon+0x7c/0x14c [videobuf2_common])
[  198.200487] [<7f209adc>] (vb2_core_streamon [videobuf2_common]) from [<7f16d2f8>] (__video_do_ioctl+0x224/0x44c [videodev])
[  198.223269] [<7f16d2f8>] (__video_do_ioctl [videodev]) from [<7f16d8f8>] (video_usercopy+0x28c/0x6c8 [videodev])
[  198.244702] [<7f16d8f8>] (video_usercopy [videodev]) from [<8030a4f8>] (do_vfs_ioctl+0x388/0x80c)
[  198.264259] [<8030a4f8>] (do_vfs_ioctl) from [<8030a9b0>] (ksys_ioctl+0x34/0x60)
[  198.281989] [<8030a9b0>] (ksys_ioctl) from [<80101000>] (ret_fast_syscall+0x0/0x4c)
[  198.299739] Exception stack(0x91ec9fa8 to 0x91ec9ff0)
[  198.309773] 9fa0:                   00000000 00000004 00000008 40045612 76abdd30 015723f8
[  198.327672] 9fc0: 00000000 00000004 00000001 00000036 40045612 00470038 76abdd30 00000008
[  198.345444] 9fe0: 0046fe08 76abdd0c 00429bc4 76cb1f0c
[  198.355297] Code: e12fff1e e1a0cf22 e35c0003 0affffeb (e7f001f2)
[  198.366097] ---[ end trace 9cbe57c15d3f6770 ]---

Everything worked fine on 4.19. /boot/cmdline.txt:

root=/dev/mmcblk0p2 ro cma=64M rootwait console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop audit=0

/boot/config.txt:

initramfs initramfs-linux.img followkernel
hdmi_force_hotplug=1
gpu_mem=64
start_x=1
enable_uart=1
dtoverlay=tc358743,i2c_pins_28_29=1
dtoverlay=disable-bt
dtoverlay=dwc2

naushir commented 4 years ago

This would have been triggered by BUG_ON((dmaaddr >> 30) != 0x3 && (endaddr >> 30) != 0x3);

@6by9, is my assumption about the dma address returned incorrect?

6by9 commented 4 years ago

No, you're quite right that we need the 0xC alias. This would imply something wrong in device-tree again.

@pelwell I'm assuming the DMA aliases are the same on Pi 0/1 as on Pi 2/3/4 and use the uncached alias; otherwise, how do the other peripherals work?
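
For anyone following along, here is a rough sketch of what the quoted BUG_ON condition tests, applied to values from the oops above. It assumes that r1 (0x53600000) and r2 (0x53696000) in the register dump are the dmaaddr and endaddr passed through unicam_wr_dma_addr; that is a guess based on their difference being 0x96000, which also appears on the stack, and is not confirmed in this thread. The helper names are made up for illustration and are not driver functions.

```python
# Sketch of the alias check behind the unicam BUG_ON.
# Helper names are hypothetical; only the BUG_ON condition is from the thread.

ALIAS_SHIFT = 30          # top two bits of a 32-bit bus address select the alias
UNCACHED_ALIAS = 0x3      # the "0xC" alias: addresses 0xC0000000..0xFFFFFFFF


def bus_alias(addr: int) -> int:
    """Return the top two bits of a 32-bit bus address."""
    return (addr & 0xFFFFFFFF) >> ALIAS_SHIFT


def unicam_bug_on_fires(dmaaddr: int, endaddr: int) -> bool:
    """Mirror BUG_ON((dmaaddr >> 30) != 0x3 && (endaddr >> 30) != 0x3)."""
    return bus_alias(dmaaddr) != UNCACHED_ALIAS and bus_alias(endaddr) != UNCACHED_ALIAS


def to_alias(addr: int, alias: int) -> int:
    """Rewrite the top two bits of addr to the given alias, keeping the low 30 bits."""
    return (addr & 0x3FFFFFFF) | (alias << ALIAS_SHIFT)


# Assumption: r1 and r2 from the register dump are dmaaddr and endaddr.
dmaaddr = 0x53600000
endaddr = 0x53696000

print(hex(endaddr - dmaaddr))                   # buffer span between the two values
print(bus_alias(dmaaddr), bus_alias(endaddr))   # alias bits of each address
print(unicam_bug_on_fires(dmaaddr, endaddr))    # whether the condition is true

# The same buffer expressed through the 0xC alias would pass the check.
fixed = to_alias(dmaaddr, UNCACHED_ALIAS)
print(hex(fixed), unicam_bug_on_fires(fixed, to_alias(endaddr, UNCACHED_ALIAS)))
```

Under that assumption both addresses have top bits 0b01 rather than 0b11, so the condition is true and the BUG fires, which would be consistent with the DMA address being translated through a different alias than the one the driver expects.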