raspberrypi / userland

Source code for ARM side libraries for interfacing to Raspberry Pi GPU.
BSD 3-Clause "New" or "Revised" License

Feature request: raspivid ~100ms latency #243

Closed MauroMombelli closed 8 years ago

MauroMombelli commented 9 years ago

Hi, from some experiments (using HDMI and/or streaming) it seems that the video from raspivid comes with ~100ms of latency; it ramps up to 150ms but never goes lower than 80ms.

Is this a HW limitation? Is there a way (specific HW, an option, or a code fix) to get lower latency?

Thanks.

JamesH65 commented 9 years ago

There is quite a bit of processing going on. The image needs to be captured by the camera (speed depends on exposure) and transferred over the CSI-2 link to the ISP in the GPU. It then goes through about 25 stages of processing to convert the Bayer sensor data into something that looks like a picture. The ISP is good for about 180MPixels/second IIRC from start to finish. Then there is work required to manage the image data, for example passing it through the ISP stages and then off to the HDMI. So there are some fairly good reasons why the latency is there. Could it be faster? Perhaps, but the code has already been fairly well optimised, so finding where would be a long and tedious job.
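To put the ISP figure in perspective, the sketch below divides a frame's pixel count by the quoted throughput. It assumes the ~180 MPixel/s number (given from memory) applies uniformly to the whole ISP pipeline; `isp_frame_us` and `isp_max_fps` are hypothetical helpers and the results are rough estimates, not measurements.

```shell
#!/usr/bin/env bash
# Assumed end-to-end ISP throughput in MPixel/s (the "IIRC" figure above).
ISP_MPIX_PER_S=180

# Hypothetical helper: microseconds the ISP needs for one WxH frame.
# pixels / (MPixel/s) yields microseconds directly (integer, truncated).
isp_frame_us() {
    local w=$1 h=$2
    echo $(( w * h / ISP_MPIX_PER_S ))
}

# Hypothetical helper: highest frame rate the ISP alone could sustain.
isp_max_fps() {
    local w=$1 h=$2
    echo $(( ISP_MPIX_PER_S * 1000000 / (w * h) ))
}

echo "1080p: $(isp_frame_us 1920 1080) us/frame, max $(isp_max_fps 1920 1080) fps"
echo "720p:  $(isp_frame_us 1280 720) us/frame, max $(isp_max_fps 1280 720) fps"
```

On this simple model the ISP pass itself is on the order of 5-12 ms per frame, so most of the ~100 ms observed has to come from the other stages listed above.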

MauroMombelli commented 9 years ago

I understand that there is a lot going on, but in my tests Full HD and HD (30 or 60 fps) have about the same latency! So there seems to be something like a buffer that is slowing everything down. The problem is that I have no experience with video drivers/processing, and AFAIK part of the implementation is closed source, so I need some help understanding what I can study to start debugging this, starting with the parts outside the proprietary blob.

Any help is appreciated, thanks.

JamesH65 commented 9 years ago

I seriously doubt you will be able to do anything to improve the latency from userspace. I suspect there isn't actually much room for improvement anyway. 100ms is pretty good. I presume this is just HDMI (preview) output? No H264 encode or saving to SD card?

MauroMombelli commented 9 years ago

I think so. For plain HDMI I used:

/opt/vc/bin/raspivid  -t 0 -w 1280 -h 720 -fps 60

Now I'm going to try with:

/opt/vc/bin/raspivid  -t 0 -w 1280 -h 720 -fps 60 -o /dev/null

If you have any suggestions to improve this, let me know. I also tried using V4L2, but strangely the output of v4l2-ctl is not readable and I don't understand why. The commands I'm using are:

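#!/bin/bash
# Load the Pi camera V4L2 driver (requires root)
modprobe bcm2835-v4l2
# pixelformat=4 is an index into the driver's format list (see: v4l2-ctl --list-formats)
v4l2-ctl --set-fmt-video=width=1280,height=720,pixelformat=4
# Create the FIFO only if it does not exist yet, so the script can be re-run
[ -p test1.fifo ] || mkfifo test1.fifo
while true; do
    # Capture into the FIFO in the background, using 3 mmap'd buffers
    v4l2-ctl --stream-mmap=3 --stream-to=test1.fifo &
    # Unbuffered copy from the FIFO to a UDP netcat listener on port 2222;
    # the loop starts a new capture whenever this pipeline ends
    stdbuf -i0 -o0 -e0 cat test1.fifo | nc -ukl 2222 -v
done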

Here is the test with:

/opt/vc/bin/raspivid  -t 0 -w 1280 -h 720 -fps 60

[Photo: 720p, HDMI output only]

MauroMombelli commented 9 years ago

[Photo: HD test]

More tests:

/opt/vc/bin/raspivid  -t 0 -w 1280 -h 720 -fps 60 -o /dev/null

uses up to 10% CPU (with one raspivid thread constantly eating 3.8% CPU, on a Raspberry Pi B+), and I have to use a power supply with more than 1A output or it reboots after 10-15 seconds (!)

With

/opt/vc/bin/raspivid  -t 0 -w 1280 -h 720 -fps 60

CPU usage is 3%, raspivid stays near 1%, and it runs fine on the 1A USB charger!

Also, here is an interesting photo: the timer on the TV (HDMI) shows ~170ms of lag, while the image streamed to my PC has only ~90ms of delay! I was expecting some difference, but that is a big one.

Edit: fun fact: at Full HD (30fps) the screen and the PC monitor have exactly the same latency, while at HD@30fps the laptop monitor is still faster, even though it loses some ms (130ms vs 170ms).

I still have to test the composite video output... does raspivid output to composite? I will probably set up a pipe to it, but I have no idea how right now, and I'm missing the cables. (Please note that every time I use a pipe I make sure to set it to a 0-byte buffer.)

JamesH65 commented 9 years ago

It's probably worth continuing this conversation on the Pi forums - there will be more people who may be able to comment.

Odd, though, that dumping to NULL seems to take more power than doing something with the data.

You will need a decent power supply, the camera uses at least 200-250mA.

(Replied by email to Mauro Mombelli's comment of 29 May 2015, 14:01.)

popcornmix commented 9 years ago

If you don't specify an output filename it doesn't encode the video, so it uses less CPU and less power. https://github.com/raspberrypi/userland/blob/master/host_applications/linux/apps/raspicam/RaspiVid.c#L2109

MauroMombelli commented 9 years ago

But when I output to -o - (or to a mkfifo file), the raspivid process still keeps its CPU use low (on the other hand, the netcat used for streaming uses a good 30%). With -o - it does have a file handle: https://github.com/raspberrypi/userland/blob/master/host_applications/linux/apps/raspicam/RaspiVid.c#L1989

Please note that streaming to netcat, or running without -o, wasn't crashing the Pi; only output to /dev/null with the low-power supply did.

JamesH65 commented 9 years ago

When using -o, the software will be opening a file and sending data to it. Even if it is NULL, this will take more CPU and power than not doing anything with the data (except display). Piping to netcat will use even more power and CPU, because now the system is using the Ethernet (and hence lots of USB as well).

(Replied by email to Mauro Mombelli's comment of 29 May 2015, 14:54.)

MauroMombelli commented 9 years ago

Here are some tests (please note: even though each value is a single snapshot, it was pretty stable over 10-15 seconds):

| Output | HDMI preview | Per-process CPU | Total CPU |
|---|---|---|---|
| none (no -o) | yes | 0% | 2.4% |
| /dev/null | yes | 3.8%, 3.3%, 0.9% (plus two systemd-timesyncd at 2.8% and 2.4%.. uhmm) | 10.5% |
| regular file | yes | 2 processes at 3%, 1 at 0.5% | 6.2% (!!) |
| netcat | yes | netcat 10.3%, others 2.3%, 1.4% and 0.9% | 15.5% |
| /dev/null | no | 0% (!) | 4.6% |
| regular file | no | 2 processes at 3%, 1 at 0.9% | 7.4% (!!) |
| netcat | no | netcat 9.4%, others 1.9%, 1.4% and 0.5% | 12.9% |
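To take comparable snapshots from a script, one option is to sum the %CPU column reported by `ps`. This is a minimal sketch with a hypothetical `sum_cpu` helper, assuming procps `ps` and `awk`; note that `ps` reports CPU time averaged over each process's lifetime, not the instantaneous load `top` shows, so its numbers will not match a `top` reading exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the %CPU column of "pcpu comm" lines read from
# stdin, keeping only processes whose command name matches the regex $1
# (default: every process). Prints the total with one decimal place.
sum_cpu() {
    local pattern="${1:-.}"
    awk -v pat="$pattern" '$2 ~ pat { total += $1 } END { printf "%.1f\n", total }'
}

# Example on a live system (lifetime-averaged %CPU per process):
#   ps -eo pcpu=,comm= | sum_cpu '^(raspivid|nc)'
```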

6by9 commented 9 years ago

Assuming 1280x720 at 30fps:
- Frame exposed on sensor and line received by GPU: 33ms.
- Rolling shutter delay from the first line of the image to the last: ~20ms.
- ISP processing (assuming stabilisation off; partially done as the data is received): approx. 10ms.
- Video encode: approx. 20ms.
- Network transmission: ??!?
- Video decode: seriously variable.
- Display on HDMI: depends on the monitor.

So best case for the encode itself is around 33+20+10+20 = 83ms.

1280P60 is not supported off the sensor. Your selection of 1280x720 @ 60fps will actually read VGA off the sensor and upscale it. Times should drop to around 16+16+10+20 = 62ms
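The best-case arithmetic above can be reproduced with a small sketch. `encode_latency_ms` is a hypothetical helper: it takes the frame period as 1000/fps (truncated, which gives the 33ms and 16ms used above), adds the rolling-shutter readout, and uses the approximate 10ms ISP and 20ms encode figures as defaults. Network transmission, decode and display are deliberately left out, since they are unknown or variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: best-case capture-to-encoded-frame latency in ms.
# Args: fps, rolling-shutter readout (ms), optional ISP ms, optional encode ms.
encode_latency_ms() {
    local fps=$1 rolling_ms=$2 isp_ms=${3:-10} encode_ms=${4:-20}
    local frame_ms=$(( 1000 / fps ))   # one frame period, truncated
    echo $(( frame_ms + rolling_ms + isp_ms + encode_ms ))
}

echo "720p30:            $(encode_latency_ms 30 20) ms"
echo "720p60 (VGA mode): $(encode_latency_ms 60 16) ms"
```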

Your variation is probably down to the extra time required to transmit I-frames, which by necessity are larger than P-frames.
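A rough sketch of why larger I-frames add jitter on a network link: the time to push a frame through the link grows with its size. The frame sizes and the 20 Mbit/s link rate below are made-up illustrative values, not measurements from this thread, and `transmit_ms` is a hypothetical helper that ignores protocol overhead and queuing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ms needed to send a frame of the given size (bytes)
# over a link of the given rate (Mbit/s), ignoring protocol overhead.
transmit_ms() {
    local bytes=$1 mbit_per_s=$2
    # 1 Mbit/s = 1000 bits per ms, so bits / (Mbit/s * 1000) = ms (truncated)
    echo $(( bytes * 8 / (mbit_per_s * 1000) ))
}

# Illustrative, made-up frame sizes on an assumed 20 Mbit/s link:
echo "I-frame (100000 bytes): $(transmit_ms 100000 20) ms"
echo "P-frame (20000 bytes):  $(transmit_ms 20000 20) ms"
```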

Displaying the preview on the HDMI still has the first 3 delays I've quoted above. The display engine has to wait for the next vertical sync, which is typically running at 60fps, so best case 0ms, worst case 16ms. You then have the delay through the HDMI display.
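The preview path can be sketched the same way. The hypothetical `preview_latency_ms` helper adds the first three delays (frame period, rolling-shutter readout, ~10ms ISP) and then the wait for the next vertical sync, which ranges from 0ms to one refresh period; it assumes a 60Hz display by default and leaves out the monitor's own processing delay, which depends on the display.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "best worst" preview latency in ms, excluding
# the HDMI display's own delay.
# Args: camera fps, rolling-shutter readout (ms), optional display refresh (Hz).
preview_latency_ms() {
    local fps=$1 rolling_ms=$2 refresh_hz=${3:-60} isp_ms=10
    local base=$(( 1000 / fps + rolling_ms + isp_ms ))
    local vsync_worst=$(( 1000 / refresh_hz ))   # just missed a vertical sync
    echo "$base $(( base + vsync_worst ))"
}

read -r best worst <<< "$(preview_latency_ms 30 20)"
echo "720p30 preview: ${best}-${worst} ms + HDMI display delay"
```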

The processing on the GPU is pretty much at the lowest level that can be achieved. Network transmission (particularly on wifi) and HDMI decode time are out of our control but can be tweaked based on use case.

As to CPU loading, preview direct to HDMI is done solely on the GPU; it doesn't involve the ARM. The actual encode likewise is done solely on the GPU. So all you're actually measuring is the difference in CPU load based on the requested data destination, and potentially the effect of the extra SDRAM bandwidth that the encode needs, which is therefore not available to the ARM.

MauroMombelli commented 9 years ago

Thanks, the camera and FPS seem to introduce the biggest lag, which seems OK.

> 1280P60 is not supported off the sensor. Your selection of 1280x720 @ 60fps will actually read VGA off the sensor and upscale it.

Sorry, but AFAIK 1280x720 is NOT 1280P but 720P, and according to the specification the board supports it (from RS):

> Supports 1080p, 720p60 and VGA90

The CPU timings are here just to show that something really strange is going on when outputting to /dev/null: it uses more CPU than saving to a file or outputting to the network.

6by9 commented 9 years ago

Typo by me, I meant 720P.

The sensor modes that are used by the Pi are:

More discussion from when the extra modes were released on the forum, eg https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=62364&start=250#p520078, https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=72116 and https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=85714&p=605259

MauroMombelli commented 9 years ago

> they never provided us with register settings for it, or we ignored it as it had a hugely cropped field of view

Very interesting (I was expecting all modes to be supported). Is there a way to try it, in case it has been removed by you? This also explains the bad quality of the video. Also, the upscaling could take a lot of time, right? Time to set up the test rig again :+1:

JamesH65 commented 9 years ago

Upscaling won't take much time - it's all done in the HW.

Not really any way of trying out modes that are not hardcoded in to the driver.

(Replied by email to Mauro Mombelli's comment of 2 June 2015, 15:00.)

MauroMombelli commented 9 years ago

Yes, after a quick test the video seems smoother using VGA@90fps (the delay is still 80-100ms, but maybe there is less variance; hard to tell without an automated test, maybe with some OCR...)

6by9 commented 9 years ago

> Upscaling won't take much time - it's all done in the HW

Almost true. The ISP effectively runs at one pixel per clock, whether that be input or output. So it will be equivalent in time to 1280x720 in as well as out. That's also why capturing a still at VGA output resolution takes approximately the same amount of time as a 5MP capture - they are both reading in the full 5MP from the sensor, so that is the bottleneck. It just won't be outputting a pixel on every clock cycle.
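The "one pixel per clock, input or output" rule can be sketched as: a pass costs roughly max(input pixels, output pixels) cycles. The ISP clock is not stated in the thread, so this sketch reuses the ~180 MPixel/s figure quoted earlier purely for illustration; `isp_pass_us` is a hypothetical helper, and 2592x1944 is the full readout of the 5MP sensor.

```shell
#!/usr/bin/env bash
# Assumed ISP rate in MPixel/s (pixels per microsecond); illustrative only.
ISP_MPIX_PER_S=180

# Hypothetical helper: approximate microseconds for one ISP pass, assuming
# one pixel per clock on whichever side (input or output) is larger.
# Args: in_w in_h out_w out_h
isp_pass_us() {
    local in_px=$(( $1 * $2 )) out_px=$(( $3 * $4 ))
    local px=$(( in_px > out_px ? in_px : out_px ))
    echo $(( px / ISP_MPIX_PER_S ))
}

echo "VGA -> 720p upscale: $(isp_pass_us 640 480 1280 720) us"
echo "720p -> 720p:        $(isp_pass_us 1280 720 1280 720) us"
echo "5MP -> VGA still:    $(isp_pass_us 2592 1944 640 480) us"
echo "5MP -> 5MP still:    $(isp_pass_us 2592 1944 2592 1944) us"
```

Under this model the VGA-to-720p upscale costs the same as a native 720p pass, and a VGA still costs the same as a full 5MP one, matching the behaviour described above.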

I've just looked at the camera driver source - there is a disabled 720P mode listed, but only 30fps. We have better than that with the binned modes (1296x730 @ 49fps), so no point investigating.

popcornmix commented 8 years ago

Seems like the latency is understood, so closing.