raspberrypi / userland

Source code for ARM side libraries for interfacing to Raspberry Pi GPU.
BSD 3-Clause "New" or "Revised" License

Streaming over the network can cause the Wifi to crash on PiZeroW #421

Closed by sej7278 7 years ago

sej7278 commented 7 years ago

See also forum post

Running the following:

raspivid -t 0 -w 1920 -h 1080 --hflip --vflip -fps 30 -b 2000000 -o - | \
gst-launch-1.0 -v fdsrc ! h264parse ! rtph264pay config-interval=1 pt=96 ! \
gdppay ! tcpserversink host=desktop.lan port=5000

...causes my Pi Zero W to lock up.

Once, it didn't totally kill the machine and I managed to get some dmesg output; the raspivid output had stopped, but it worked again when I restarted the process.

Once it logged a systemd error, "Internal error: Oops: 817 [1] ARM", but usually it just crashes the machine totally: a P6 reset doesn't work, and only removing and re-applying USB power will reboot it.

After some experimentation, it would seem that it doesn't crash if I add the -n flag to disable the HDMI preview (I don't have a monitor plugged in).

It also doesn't crash if you remove the --hflip and --vflip flags, so it would seem that rotating and outputting to both HDMI and Wifi is too much for a Zero W.

It's not a power issue, as I've used two supplies and it never draws more than 0.6A or dips below 5.0V.

This is a fully updated Stretch install with the 4.9.41 kernel and the old 5MP camera module.

6by9 commented 7 years ago

H and V flips are done on the sensor itself by reading out the sensor array in a modified order. No extra GPU load at all. Rotate (specifically transpose) is done on the GPU (and is expensive in terms of performance and SDRAM bandwidth), but I see no evidence that you're doing a transpose.

As DirkS suggested on the forum thread, break it down. Does raspivid -t 0 -w 1920 -h 1080 --hflip --vflip -fps 30 -b 2000000 -o /dev/null crash? Seeing as you've managed to crash the kernel, I'd suspect something further downstream in your GStreamer pipe (probably the Wifi or other network interface taking the output of GStreamer).

BTW 2Mbit/s is a pretty low bitrate for 1080P30. For reasonable quality I'd be expecting >8Mbit/s.
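To put the bitrate point in numbers, dividing the bitrate by the frame rate gives the average bit budget per frame. This is a minimal sketch in POSIX shell arithmetic; bits_per_frame is a hypothetical helper, and it assumes bits are spread evenly across frames, whereas a real H.264 stream spends far more on I-frames than on P-frames.

```shell
#!/bin/sh
# Average bit budget per frame for a given bitrate and frame rate.
# Integer shell arithmetic, so results are truncated.
bits_per_frame() {
    bitrate=$1   # bits per second, as passed to raspivid -b
    fps=$2       # frames per second, as passed to raspivid -fps
    echo $(( bitrate / fps ))
}

# Compare the reported 2Mbit/s against the suggested 8Mbit/s at 30 fps.
for b in 2000000 8000000; do
    bpf=$(bits_per_frame "$b" 30)
    printf 'At -b %s and 30 fps: %s bits per frame (about %s KiB)\n' \
        "$b" "$bpf" "$(( bpf / 8 / 1024 ))"
done
```

At 1920x1080 that is roughly 8 KiB per frame at 2Mbit/s versus roughly 32 KiB at 8Mbit/s, which is why the higher figure is suggested for reasonable quality.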

sej7278 commented 7 years ago

Ah no, I can't get it to crash when it's going to /dev/null instead of GStreamer. Going to try plugging a monitor in too.

6by9 commented 7 years ago

Next step then is to start adding in the GStreamer stages and terminating the pipe in a fakesink. That should happily swallow anything.

sej7278 commented 7 years ago

I'll give it a go. I'm not an expert on GStreamer, and I'm not sure whether it needs a client connection to crash. I wonder if VLC would be better than GStreamer?

I plugged in a monitor and keyboard, and when I changed the bitrate to 8Mbps it eventually dropped the stream and I couldn't SSH in, but the screen was still updating, so I guess it just overwhelmed the Wifi. When I did a P6 reset there was a stack trace on screen that mentioned iptables being linked in, so I guess it was the Wifi driver that crashed.

I managed to make it crash using my original command with GStreamer, and the screen even stopped updating, so it definitely crashes completely; it's not just the Wifi.

6by9 commented 7 years ago

Each ! denotes a new stage in the processing pipe. You should be able to put fakesink after any of them and have a working pipeline.

Start with

raspivid -t 0 -w 1920 -h 1080 --hflip --vflip -fps 30 -b 2000000 -o - | \
    gst-launch-1.0 -v fdsrc ! h264parse ! fakesink

If that works, then increase it to

raspivid -t 0 -w 1920 -h 1080 --hflip --vflip -fps 30 -b 2000000 -o - | \
    gst-launch-1.0 -v fdsrc ! h264parse ! rtph264pay config-interval=1 pt=96 ! fakesink

and continue up to your full pipeline.
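The incremental approach can be scripted so that no stage is skipped. This is a minimal POSIX shell sketch assuming the same raspivid settings as the original report; the helper name stage_commands is hypothetical, and it deliberately prints the commands rather than running them, since a lock-up would take down any script running on the Pi anyway.

```shell
#!/bin/sh
# Print one test command per GStreamer stage: the same raspivid source,
# with the pipeline grown one element at a time and ended in fakesink,
# followed by the original network sink as the final step.

SRC='raspivid -t 0 -w 1920 -h 1080 --hflip --vflip -fps 30 -b 2000000 -o -'
SINK='tcpserversink host=desktop.lan port=5000'

stage_commands() {
    # Elements of the original pipeline after fdsrc, in order.
    set -- 'h264parse' 'rtph264pay config-interval=1 pt=96' 'gdppay'
    pipeline='fdsrc'
    for element in "$@"; do
        pipeline="$pipeline ! $element"
        printf '%s | gst-launch-1.0 -v %s ! fakesink\n' "$SRC" "$pipeline"
    done
    # Last step: the real sink (a client must connect to load the network).
    printf '%s | gst-launch-1.0 -v %s ! %s\n' "$SRC" "$pipeline" "$SINK"
}

stage_commands
```

Run each printed line by hand, one at a time, and only move on once the previous one has stayed up for a while.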

sej7278 commented 7 years ago

OK, this doesn't crash:

raspivid -t 0 -w 1920 -h 1080 --hflip --vflip -fps 30 -b 2000000 -o - | \
    gst-launch-1.0 -v fdsrc ! h264parse ! fakesink

nor does:

raspivid -t 0 -w 1920 -h 1080 --hflip --vflip -fps 30 -b 2000000 -o - | \
    gst-launch-1.0 -v fdsrc ! h264parse ! rtph264pay config-interval=1 pt=96 ! fakesink

Nor does adding gdppay before fakesink.

But I wonder if it needs to be combined with Wifi to crash. Anyway, I'll try some more pipes.

sej7278 commented 7 years ago

The only way I can crash it is with tcpserversink and a connected client. It doesn't seem to crash when just running the server, so I assume it's the Wifi load or something...?

I noticed the client crashed, the SSH connection froze and the screen got laggy. A P6 reset doesn't reboot it, but it does exit raspivid back to the console and shows "Internal error: Oops: 817 [1] ARM" and the kernel NULL pointer dereference from the dmesg gist above.

Also, -n (disable preview) doesn't help. I managed to get a crash with "fixing recursive fault but reboot is needed" and a few SDIO/MMC errors. I wonder if the Wifi is using so much of the bus that it's interfering with the SD card too?
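When the Pi survives long enough, the relevant lines can be pulled out of the kernel log. This is a small sketch; the function name relevant_kernel_lines is hypothetical, and the pattern list is a guess at what matters here (the oops, NULL pointer and recursive fault messages, MMC/SDIO errors, and brcmfmac, the driver for the Zero W's onboard Wifi), not an exhaustive filter.

```shell
#!/bin/sh
# Keep only kernel log lines likely to matter for this lock-up.
# Reads stdin, so it works on live dmesg output or on a saved log file.
relevant_kernel_lines() {
    # -i: case-insensitive match; -E: extended regex for the alternation.
    grep -iE 'oops|null pointer|recursive fault|mmc|sdio|brcmfmac'
}
```

Use it as dmesg | relevant_kernel_lines while the Pi is still responsive. After a hard lock-up, journalctl -k -b -1 | relevant_kernel_lines shows the previous boot's kernel messages, but only if journald keeps a persistent journal (for example, a /var/log/journal directory exists), which may not be enabled by default.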

So, in summary, to reproduce: use the original command and have a client connect over Wifi using, e.g.:

gst-launch-1.0 -v tcpclientsrc host=pi port=5000 ! gdpdepay ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink sync=false

6by9 commented 7 years ago

So I'd suggest you test with an alternate USB Wifi or ethernet dongle to narrow down what is causing the failure. There were a couple of fixes to the Wifi firmware and driver recently, but I thought those were in before 4.9.41+ (around 8th August).

sej7278 commented 7 years ago

Not sure I'm up to debugging Wifi drivers!

Perhaps a similar issue (Zero W Wifi dies when streaming camera): https://github.com/raspberrypi/linux/issues/1342

6by9 commented 7 years ago

I'm not asking you to debug the driver, but pinpoint where the issue is.

As I had said, there were changes pushed recently for the onboard Wifi firmware and driver (it's common between PiZero and 3). Some are still reporting issues, but at the moment you haven't pinned down that that is the problem in your specific case.

sej7278 commented 7 years ago

OK, I managed to make it crash using a USB Ethernet dongle: gist

There seems to be another "Pi Zero locks up" thread on the forum with the same ARM oops when running VNC.

sej7278 commented 7 years ago

Oddly enough, I cannot get an old Zero 1.3 to crash, even when streaming at 8Mbps over a USB Wifi dongle.

I've swapped their SD cards, power supplies and power cables, and I can still get the Zero W to crash in every case.

I've either got a buggy Zero W, or the Wifi chip is causing heat issues or something. I guess it could be the camera modules (although the one on the crashing Zero W is an official UK model, while the one on the Zero is a Chinese NoIR clone).

sej7278 commented 7 years ago

I'm not sure the title is even right now, as I could crash it over Ethernet.

I'd like to know if someone else could reproduce.

6by9 commented 7 years ago

Feel free to change the title yourself then. It was totally incorrect as there is no issue in raspivid.

The Github repos are frequented by very few people, and your issue is looking to be kernel related rather than userspace - that would be https://github.com/raspberrypi/linux/ rather than here. I'd update your forum thread and ask there, or start a follow-on thread in an appropriate forum ("Networking and servers" probably).

If you can test on a second Pi0W then it'd be worth it. This issue sounds like a hardware problem, so I wonder if your Pi is struggling to achieve the normal 1GHz that the Pi0 runs at. You could try underclocking it too to see if that helps.
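Underclocking is done with the arm_freq setting in config.txt, and it takes effect after a reboot. This is a minimal sketch that sets the value idempotently; the function name set_arm_freq is hypothetical, it assumes the file lives at /boot/config.txt (as on Stretch) and must be run as root when editing that file, and 800 is simply the value tried later in this thread.

```shell
#!/bin/sh
# Set (or replace) the arm_freq line in a Raspberry Pi config.txt.
# The new clock only applies after a reboot.
# Usage: set_arm_freq 800              (edits /boot/config.txt, needs root)
#        set_arm_freq 800 ./config.txt (edits another file, e.g. a copy)
set_arm_freq() {
    freq=$1
    cfg=${2:-/boot/config.txt}
    if grep -q '^arm_freq=' "$cfg"; then
        # Replace the existing value in place; sed keeps a .bak copy.
        sed -i.bak "s/^arm_freq=.*/arm_freq=$freq/" "$cfg"
    else
        # No existing setting: append one.
        printf 'arm_freq=%s\n' "$freq" >> "$cfg"
    fi
}
```

After rebooting, vcgencmd measure_clock arm reports the current ARM clock in Hz and vcgencmd measure_temp the SoC temperature. The clock may read lower than arm_freq when the system is idle, because it is scaled down dynamically, so check it under load.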

sej7278 commented 7 years ago

OK, thanks for the advice. I'll try underclocking.

I'm not prepared to buy another Zero W, as I've spent too much on this failed project already (the Zero W has been pretty flaky; e.g. the gpio-shutdown-overlay rarely works). I was hoping a Foundation staffer could try to reproduce.

I might just use a monitor and forget streaming, in which case I could have just used a Zero 1.3 or something off-the-shelf.

Moved back to the forum.

6by9 commented 7 years ago

"I was hoping a Foundation staffer could try to reproduce."

The company would go bust in 5 minutes flat if it tried to replicate every single failure reported, particularly as they are often poorly described, or likely to be specific to the reporter's hardware. How much time do you think the profit margin on a Pi0W would cover salary and associated costs for? We're happy to help where applicable, and will react where there is a specific problem, but forum and Github support is normally being done whilst waiting on other things - we're all working on other stuff too.

sej7278 commented 7 years ago

Sorry, I didn't mean for you to test it necessarily; I just thought there might be a process for raising issues with the Foundation (how else would anything get fixed?).

Anyway, I've underclocked to 800MHz and it seems stable. It still runs up to 20°C hotter than my Zero 1.3, but no cooler than at 1GHz, so I guess it's not a temperature problem.

No idea what it could be really, just something in the SBC that doesn't like 1GHz, which is disappointing as that's the speed they're marketed at. I do recall they were initially shipped at 700MHz like the earlier Pis; is 1GHz actually an overclock?

JamesH65 commented 7 years ago

They are sold at 1GHz. If yours isn't working correctly at that speed (it's very rare, but possible), I suggest either getting a refund (is it worth it for $5? You might find a use for it elsewhere) or simply getting a new one.

sej7278 commented 7 years ago

Yes, I can't be faffing around with a refund (although it's actually $16 with postage and tax, not $5), but I really just wanted assurance that if I buy another it won't have the same problem. So they're not overclocked 700MHz chips or anything? Might just be a bad one?

6by9 commented 7 years ago

There's no "binning" of devices to be sold as running at differing clock speeds in the way Intel and similar will do. It's the same chip put into all Zero, ZeroW, B+, and A+ boards.

The original designed spec was 700MHz, but production and yield improvements have actually shown that almost all will run at 1GHz or more (some people overclocked early Pis up to 1.4GHz IIRC). There will be the odd one that comes off the production line that will run happily at the original 700MHz spec, but not stably at higher clock rates. They are the exception, and unfortunately there isn't an easy way to weed them out in testing. It seems you've got one of the weaker ones and we can only apologise for that.

sej7278 commented 7 years ago

OK, no problem. I'll get another and see how it goes. I'll close the issue, as it's filed under the wrong repo anyway.

thebeachtoday commented 6 years ago

UPDATE: Found the issue, it was Darkice. Killing darkice [i.e. sudo pkill -f darkice] completely dropped all the client CPU issues. Another thing to try in troubleshooting that helps: use