Closed MauroMombelli closed 8 years ago
There is quite a bit of processing going on. The image needs to be captured by the camera (speed depends on exposure), transferred over the CSI-2 link to the ISP in the GPU. It then goes through about 25 stages of processing to convert from the bayer sensor data to something that looks like a picture. The ISP is good for about 180MPixels/second IIRC from start to finish. Then there is work required managing the image data, for example, passing it through the ISP stages, and then off to the HDMI. So there are some fairly good reasons why the latency is there, but whether it could be faster, perhaps, but the code has been fairly well optimised already so finding where would be a long and tedious job.
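As a rough sanity check (a sketch, using only the ~180 MPix/s figure quoted above), the common video modes fit comfortably inside that ISP budget:

```shell
#!/bin/sh
# Back-of-the-envelope: pixel rates of common modes vs the ~180 MPix/s ISP figure.
ISP_RATE=180000000                       # approx pixels/second through the ISP
P1080_30=$((1920 * 1080 * 30))           # 1080p30: 62,208,000 pixels/s
P720_60=$((1280 * 720 * 60))             # 720p60:  55,296,000 pixels/s
echo "1080p30 uses $((100 * P1080_30 / ISP_RATE))% of the ISP throughput"
echo "720p60 uses $((100 * P720_60 / ISP_RATE))% of the ISP throughput"
```

So raw ISP throughput isn't saturated at these modes; the latency comes from the pipeline stages themselves.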
I understand that there is a lot going on, but in my tests FullHD and HD (at 30 or 60 fps) have about the same latency! So there seems to be something like a buffer that is slowing everything down. The problem is I have no experience in video drivers/processing, and AFAIK part of that implementation is closed source; so I need help understanding what I can study to start debugging this, starting by excluding something from the proprietary blob.
Any help is appreciated, thanks.
I seriously doubt you will be able to do anything to improve the latency from userspace. I suspect there isn't actually much room for improvement anyway. 100ms is pretty good. I presume this is just HDMI (preview) output? No H264 encode or saving to SD card?
I think so; for plain HDMI I used
/opt/vc/bin/raspivid -t 0 -w 1280 -h 720 -fps 60
now I'm going to try with
/opt/vc/bin/raspivid -t 0 -w 1280 -h 720 -fps 60 -o /dev/null
Any suggestions for improvement are welcome. I also tried using V4L2, but it's strange: the output of v4l2-ctl is not readable and I don't get why. The commands I'm using are
#!/bin/bash
modprobe bcm2835-v4l2                    # load the V4L2 driver for the camera
# pixelformat=4 selects H264 on bcm2835-v4l2
v4l2-ctl --set-fmt-video=width=1280,height=720,pixelformat=4
mkfifo test1.fifo                        # fifo between the capture and netcat
while true; do
    v4l2-ctl --stream-mmap=3 --stream-to=test1.fifo &
    stdbuf -i0 -o0 -e0 cat test1.fifo | nc -ukl 2222 -v   # unbuffered, UDP listen on 2222
done
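For what it's worth, raspivid can also write the encoded H264 to stdout, which avoids the fifo entirely; a minimal sketch (TCP instead of UDP; the port number and `raspberrypi.local` hostname are arbitrary, and the `-p` flag assumes traditional netcat rather than the OpenBSD variant):

```shell
#!/bin/sh
# On the Pi: pipe raw H264 from raspivid straight into netcat (TCP, port 5000).
/opt/vc/bin/raspivid -t 0 -w 1280 -h 720 -fps 60 -o - | nc -l -p 5000

# On the PC (replace raspberrypi.local with the Pi's address):
#   nc raspberrypi.local 5000 | mplayer -fps 60 -cache 1024 -
```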
here is the test with
/opt/vc/bin/raspivid -t 0 -w 1280 -h 720 -fps 60
more test:
/opt/vc/bin/raspivid -t 0 -w 1280 -h 720 -fps 60 -o /dev/null
uses up to 10% cpu (with one raspivid thread constantly eating 3.8% cpu, on a raspi b+) and I have to use a power supply with > 1 Ampere output or it reboots after 10-15 sec (!)
with
/opt/vc/bin/raspivid -t 0 -w 1280 -h 720 -fps 60
cpu usage is 3%, no raspivid thread hogs even 1%, and it runs fine on the 1 Ampere usb charger!
also, here is an interesting photo: the timer on the TV (HDMI) has ~170ms lag, while the image streamed to my pc has only ~90ms delay! I was expecting some delay, but that is a big difference
edit: fun fact: at Full HD (30fps) the TV and the PC monitor have exactly the same latency, while at HD@30fps the laptop monitor is still faster even if it loses some ms (130ms vs 170ms)
I have to test the composite video output... does raspivid output to composite? Probably I will set up a pipe to it, but I have no idea how to right now, and I'm missing the cables. (please note: every time I use a pipe I take care to set it to a 0-byte buffer)
It's probably worth continuing this conversation on the Pi forums - there will be more people who may be able to comment.
It is odd, though, that dumping to /dev/null seems to take more power than doing something with the data.
You will need a decent power supply, the camera uses at least 200-250mA.
If you don't specify an output filename it doesn't encode the video, hence less CPU and less power consumption. https://github.com/raspberrypi/userland/blob/master/host_applications/linux/apps/raspicam/RaspiVid.c#L2109
but when I output to -o - (or to a mkfifo file) the raspivid process still keeps itself low (on the other hand, the netcat used for streaming uses a good 30%) (with -o - it has a file handle: https://github.com/raspberrypi/userland/blob/master/host_applications/linux/apps/raspicam/RaspiVid.c#L1989)
Please note that streaming to netcat or running without -o wasn't crashing the raspi; only using /dev/null with the low-power supply did.
When using -o, the software will be opening up a file and sending data to it. Even if it's /dev/null, this will take more CPU and power than not doing anything with the data (except display). Piping to netcat will use even more power and CPU because now the system is using the ethernet (and hence lots of USB as well).
here are some tests (please note: even though the data was sampled at a single instant, the values were pretty stable over 10-15 seconds):

With HDMI preview:
- no output option (HDMI only): 0% cpu (total: 2.4%)
- to /dev/null: 3.8%, 3.3%, 0.9% (plus two systemd-timesyncd at 2.8% and 2.4%.. uhmm) (total: 10.5%)
- to a normal file: two threads at 3%, one at 0.5% (total: 6.2% !!)
- to netcat: 10.3% netcat, plus 2.3%, 1.4% and 0.9% (total: 15.5%)

Without HDMI:
- to /dev/null: 0%! (total: 4.6%)
- to a normal file: two threads at 3%, one at 0.9% (total: 7.4% !!)
- to netcat: 9.4% netcat, plus 1.9%, 1.4% and 0.5% (total: 12.9%)
Assuming 1280x720 30fps:
- Frame exposed on sensor and lines received by GPU: 33ms
- Rolling shutter delay from first line of the image to the last: ~20ms
- ISP processing (assuming stabilisation off; partially done as the data is received): approx 10ms
- Video encode: approx 20ms
- Network transmission: ??!?
- Video decode: seriously variable
- Display on HDMI: depends on the monitor

So the best case up to the end of encode is around 33+20+10+20 = 83ms.
1280P60 is not supported off the sensor. Your selection of 1280x720 @ 60fps will actually read VGA off the sensor and upscale it. Times should drop to around 16+16+10+20 = 62ms
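The two best-case sums above, written out (a sketch; the per-stage numbers are the estimates quoted in this thread):

```shell
#!/bin/sh
# 720p30 (native read-out): exposure/receive + rolling shutter + ISP + encode
echo "720p30 best case: $((33 + 20 + 10 + 20)) ms"
# 720p60 request (VGA read-out, upscaled): shorter frame times, same ISP/encode
echo "720p60 best case: $((16 + 16 + 10 + 20)) ms"
```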
Your variation is probably down to the extra time required to transmit I-frames, which by necessity are larger than P-frames.
Displaying the preview on the HDMI still has the first 3 delays I've quoted above. The display engine has to wait for the next vertical sync, which is typically running at 60Hz, so best case 0ms, worst case 16ms. You then have the delay through the HDMI display itself.
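That vsync wait can be quantified (a sketch, assuming the display refreshes at exactly 60Hz): the frame completes at a random point in the refresh cycle, so the wait is uniform between 0 and one period.

```shell
#!/bin/sh
# Wait for the next vertical sync at 60 Hz, in microseconds.
PERIOD_US=$((1000000 / 60))              # ~16666 us per refresh
echo "worst case: $((PERIOD_US / 1000)) ms"
echo "average:    $((PERIOD_US / 2000)) ms"
```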
The processing on the GPU is pretty much at the lowest level that can be achieved. Network transmission (particularly on wifi) and HDMI decode time are out of our control but can be tweaked based on use case.
As to CPU loading, preview direct to HDMI is done solely on the GPU - it doesn't involve the ARM. The actual encode is likewise done solely on the GPU. So all you're actually measuring is the difference in CPU load based on the requested data destination, and potentially the effect of the extra SDRAM bandwidth needed to do the encode, which is therefore not available to the ARM.
thanks, the camera and FPS seem to introduce the biggest lag, which seems ok
1280P60 is not supported off the sensor. Your selection of 1280x720 @ 60fps will actually read VGA off the sensor and upscale it.
sorry, but AFAIK 1280x720 is NOT 1280P but 720P, and according to the specification the board supports it (from RS)
Supports 1080p, 720p60 and VGA90
the CPU timings are here just to show that something really strange is going on when outputting to /dev/null: it uses more CPU than saving to a file or outputting to the network
Typo by me, I meant 720P.
The sensor modes that are used by the Pi are:
More discussion from when the extra modes were released on the forum, eg https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=62364&start=250#p520078, https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=72116 and https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=85714&p=605259
they never provided us with register settings for it, or we ignored it as it had a hugely cropped field of view
very interesting (I was expecting all modes were supported); is there a way to try it in case it has been removed by you? Also this explains the bad quality of the video. Also the upscaling can take a lot of time, right? Time to set up the test rig again :+1:
Upscaling won't take much time - it's all done in the HW.
Not really any way of trying out modes that are not hardcoded in to the driver.
yes, after a quick test there seems to be smoother video using VGA@90fps (delay is still 80-100ms, but maybe there is less variance; hard to tell without an automated test, maybe with some OCR..)
Upscaling won't take much time - it's all done in the HW
Almost true. The ISP effectively runs at one pixel per clock, whether that be input or output. So it will be equivalent in time to 1280x720 in as well as out. That's also why capturing a still at VGA output resolution takes approximately the same amount of time as a 5MP capture - they are both reading in the full 5MP from the sensor, so that is the bottleneck. It just won't be outputting a pixel on every clock cycle.
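A rough sketch of that bottleneck, combining one-pixel-per-clock with the ~180 MPix/s figure quoted earlier in the thread (2592x1944 is the OV5647's full 5MP frame; treating the pixel clock as exactly that rate is an assumption here):

```shell
#!/bin/sh
# Time for one ISP pass over a full 5MP sensor frame at ~180 MPix/s.
SENSOR_PIX=$((2592 * 1944))              # OV5647 full-resolution frame
echo "ISP pass over 5MP: ~$((1000 * SENSOR_PIX / 180000000)) ms"
```

Which is why a VGA-sized still takes about as long as a 5MP one: the input side dominates.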
I've just looked at the camera driver source - there is a disabled 720P mode listed, but only 30fps. We have better than that with the binned modes (1296x730 @ 49fps), so no point investigating.
Seems like the latency is understood, so closing.
hi, doing some experiments (using HDMI and/or streaming) it seems like the video from raspivid comes with ~100ms of latency; it ramps up to 150ms but never drops below 80ms.
Is this some HW limitation? Is there a way (specific HW, an option, or a code fix) to get lower latency?
thanks.