icub-tech-iit / study-icub-head

Main collector for the design of the iCub head
BSD 3-Clause "New" or "Revised" License

4K Cameras YARP Driver – Kick off activities #19

Closed · pattacini closed this issue 2 years ago

pattacini commented 2 years ago

We're required to kick off the activities related to this Epic:

triccyx commented 2 years ago

Today, together with @Nicogene and @mfussi66, we had a preliminary discussion about the camera. The general idea is to write the YARP device on the Jetson Nano or Xavier NX. Speaking with @sgiraz, I found out that we need to understand the compression output from the camera (H.264?). The YARP device will not need to compress the video stream but only forward it over a TCP/UDP YARP carrier, so we don't need the CUDA API on the GPU. On the other side, a port monitor with H.264 will decode the stream. Problems:

sgiraz commented 2 years ago

Hi @triccyx,

mfussi66 commented 2 years ago

I found out that we need to understand the compression output from the camera (H.264?).

This is an important point: the feature set of our cameras is described here, but there is no mention of compression.

Therefore I'm skeptical about the claim that we don't need to do any H.264 encoding.

Nonetheless, Basler provides its code under the name of the Pylon SDK, with a GitHub page hosting interesting material (namely, a ROS package for the cameras).

The source code of the C++ SDK is located on the Jetson Nano in /opt/pylon. It might be a starting point to play around with. Here is the API doc: https://docs.baslerweb.com/pylonapi/
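For instance, a minimal grab loop along the lines of the samples shipped with the SDK (just a sketch to play around with, not the device code) could look like this:

```cpp
#include <iostream>
#include <pylon/PylonIncludes.h>

int main()
{
    Pylon::PylonInitialize();
    try {
        // Attach to the first camera found by the transport-layer factory.
        Pylon::CInstantCamera camera(Pylon::CTlFactory::GetInstance().CreateFirstDevice());
        camera.StartGrabbing(100); // grab 100 frames, then stop

        Pylon::CGrabResultPtr result;
        while (camera.IsGrabbing()) {
            // Block up to 5 s waiting for the next frame.
            camera.RetrieveResult(5000, result, Pylon::TimeoutHandling_ThrowException);
            if (result->GrabSucceeded()) {
                std::cout << result->GetWidth() << "x" << result->GetHeight() << std::endl;
            }
        }
    } catch (const GenICam::GenericException& e) {
        std::cerr << e.GetDescription() << std::endl;
    }
    Pylon::PylonTerminate();
    return 0;
}
```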

Nicogene commented 2 years ago

We can take inspiration from https://github.com/basler/pylon-ros-camera. Something I would like to understand is whether that kind of device streams an encoded stream or not, i.e. whether the only way to use the published images is to decode them from H.264.

For streaming 4K images we are forced to write encoded images on the ports; it would be the first device (AFAIK) that publishes not-ready-to-use images, since the responsibility for the format usually lies with the carrier (e.g. raw, mjpeg, etc.). By not-ready-to-use images I mean that whoever uses those images has to know to decode them first.
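For reference, with YARP the carrier (and thus the on-the-wire format) is chosen at connection time, e.g. from the CLI (port names here are just illustrative):

```
# same ports, different on-the-wire formats
yarp connect /grabber/img:o /yarpview/img:i udp
yarp connect /grabber/img:o /yarpview/img:i mjpeg
```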

traversaro commented 2 years ago

We can take inspiration from https://github.com/basler/pylon-ros-camera. Something I would like to understand is whether that kind of device streams an encoded stream or not, i.e. whether the only way to use the published images is to decode them from H.264.

If you check, the node stores the image in a sensor_msgs::Image instance (https://github.com/basler/pylon-ros-camera/blob/caff7fe26095dfb0bb803873b2fbaf32f1daf64b/pylon_camera/include/pylon_camera/pylon_camera_node.h#L1237), and that class stores uncompressed images (see http://docs.ros.org/en/noetic/api/sensor_msgs/html/msg/Image.html).

pattacini commented 2 years ago

Aiming to deliver 2x 4K streams at 30 fps might be difficult given the tight timeline we have. As discussed, the idea is to deliver "something less" – to ease development at this stage, where we still need to learn a lot about the GPU – and then improve incrementally.

A couple of notes:

cc @triccyx @Nicogene @sgiraz @mfussi66 @S-Dafarra @maggia80

pattacini commented 2 years ago

Dev kicked off in https://github.com/robotology/yarp-device-basler.

triccyx commented 2 years ago

For today we have

triccyx commented 2 years ago

Thanks to @Nicogene and @mfussi66, we have completed some crucial steps:

triccyx commented 2 years ago

NOTE: an uncompressed dual stream requires

$$ 1024 \cdot 768 \cdot 3 \cdot 8 \cdot 30 \cdot 2 = 1132462080 \ \text{bit/s} \approx 1.13 \ \text{Gbit/s} $$

with fps = 30 Hz, resolution = 1024×768 pixels, color depth = 3×8 bit, and 2 cameras.

Note that this figure doesn't include the UDP header overhead, which also depends on the MTU size (not sure it can be increased). If our board has a 1 Gbit link, that is not enough: without compression we can estimate about 22-25 fps with UDP, less with TCP.

mfussi66 commented 2 years ago

An improvement that could come for free (we just need to enable the setting: https://docs.baslerweb.com/embedded-vision/pixel-format#__tabbed_1_1) is converting the RGB frames to YUV 4:2:2; that should reduce the bandwidth since, instead of encoding a pixel with 24 bits, you need 16 bits on average (full-resolution luma plus chroma subsampled by 2 horizontally: 8 + 8/2 + 8/2 = 16 bits/pixel).
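A minimal sketch of how the pixel format could be selected through the GenICam node map, using the parameter classes of recent Pylon versions (the exact enum entry name, "YCbCr422_8" here, is an assumption to be checked against the camera's feature list):

```cpp
#include <pylon/PylonIncludes.h>

// Ask the camera to output YUV 4:2:2 instead of RGB8.
// The camera must already be Open()'d when this is called.
void setYUV422(Pylon::CInstantCamera& camera)
{
    GenApi::INodeMap& nodemap = camera.GetNodeMap();
    Pylon::CEnumParameter(nodemap, "PixelFormat").SetValue("YCbCr422_8"); // assumed entry name
}
```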

pattacini commented 2 years ago

Let's state that:

$$ 1024 \cdot 768 \cdot 2 \cdot r \cdot b = 800 \text{Mbps} $$

where $r$ is the fps in Hz and $b$ is the color depth in bits.

I've used here 800 Mbps instead of 1 Gbps to take some safety margin and to account for the UDP overhead (we will certainly use UDP and not TCP).

It comes out that:

$$ r \cdot b \approx 510 \ \text{bit/s} $$

(per pixel, per stream).

If we set $b = 16 \text{bit}$, then $r$ can be $30 \text{fps}$. If we set $b = 24 \text{bit}$, then $r$ can be $20 \text{fps}$.

This is a good starting point for our first implementation indeed 👍🏻

The layout is such that the Xavier NX is connected through an internal 1 Gbps Ethernet link to the COM-EXP Type 10, which in turn could do the compression for streaming the data out over WiFi.

triccyx commented 2 years ago

Sadly, YUV in YARP is not completely supported.

S-Dafarra commented 2 years ago

Sadly, YUV in YARP is not completely supported.

The cameras currently installed on iCub3 stream in YUV, if I am not mistaken: https://github.com/icub-tech-iit/tickets/issues/132#issuecomment-801883299

Nicogene commented 2 years ago

In YARP, YUV is defined only in terms of pixel code; see:

https://github.com/robotology/yarp/blob/14df562aa253fb8d041ba98eaa7fb96d3281017a/src/libYARP_sig/src/yarp/sig/Image.h#L64-L67

But there is no implementation of PixelYUV and, analogously, the NWS publish RGB or mono images; until now there has been no need to stream anything different:

https://github.com/robotology/yarp/blob/14df562aa253fb8d041ba98eaa7fb96d3281017a/src/libYARP_dev/src/yarp/dev/IFrameGrabberImage.h#L84-L87

Introducing it in YARP means:

Nicogene commented 2 years ago

Sadly, YUV in YARP is not completely supported.

The cameras currently installed on iCub3 stream in YUV, if I am not mistaken: icub-tech-iit/tickets#132 (comment)

Ah, but if it uses frameGrabber_nws_yarp or grabberDual, they are streamed as RGB as far as I know:

https://github.com/robotology/yarp/blob/14df562aa253fb8d041ba98eaa7fb96d3281017a/src/devices/ServerFrameGrabberDual/ServerFrameGrabberDual.h#L210-L213
https://github.com/robotology/yarp/blob/14df562aa253fb8d041ba98eaa7fb96d3281017a/src/devices/ServerFrameGrabberDual/FrameGrabber_nws_yarp.h#L115-L116

But I have been out of the loop for a while as far as YARP is concerned; maybe we should ask Randaz.

pattacini commented 2 years ago

Probably, extending YARP to deal w/ YUV is affordable and is actually an enabler for achieving 30 fps w/ the Pylon cameras (w/o compression).

S-Dafarra commented 2 years ago

Ah but if it uses frameGrabber_nws_yarp or grabberDual they are streamed as RGB for what I know

Ah yes, you are right

Nicogene commented 2 years ago

Together with @triccyx, we managed to correctly visualize images in yarpview thanks to these changes: https://github.com/robotology/yarp-device-pylon/commit/13f248ebc1c3574681a88a19973b8ea91f57c31e

We found out that YUV422 is the only format supported by those cameras:

[screenshot of the supported pixel formats]

From https://docs.baslerweb.com/embedded-vision/pixel-format

Since YARP is not ready to digest YUV, we had to convert the images to RGB8 using the Pylon API; this did not seem to affect performance too much, since we managed to get a double FHD stream on the Nano @ ~25-30 fps.
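A sketch of that conversion with the Pylon image format converter (roughly what the device does before filling the YARP image; the function name is ours):

```cpp
#include <pylon/PylonIncludes.h>

// Convert a grabbed YUV422 frame to packed RGB8 on the CPU.
Pylon::CPylonImage toRGB8(const Pylon::CGrabResultPtr& grabResult)
{
    Pylon::CImageFormatConverter converter;
    converter.OutputPixelFormat = Pylon::PixelType_RGB8packed;

    Pylon::CPylonImage rgb;
    converter.Convert(rgb, grabResult); // rgb.GetBuffer() now holds packed RGB8 pixels
    return rgb;
}
```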

[screenshot: double FHD stream in yarpview]

Note that both yarpview and the device were running locally and we used plain TCP; we can gain something by using mjpeg or udp when streaming outside.

Nicogene commented 2 years ago

Ah, a strange thing: for some reason the conversion has problems on the first frame, so we are discarding it: https://github.com/robotology/yarp-device-pylon/blob/13f248ebc1c3574681a88a19973b8ea91f57c31e/src/devices/pylon/pylonDriver.cpp#L292-L297

triccyx commented 2 years ago

@Nicogene I would prefer the workaround to be deeply hidden in the code rather than telling @pattacini about it (hahaha)

pattacini commented 2 years ago

we managed to have a double FHD stream on the nano @ ~25-30 fps.

This is a great step-0 result! 🚀 Well done @triccyx @Nicogene 🥇

So, in practice, we can send out from the GPU 2 streams in Full HD (1920 x 1080) at $\approx$ 30 fps w/o compression. Compression can then be carried out on the COM-EXP Type 10.

cc @S-Dafarra @DanielePucci @maggia80

pattacini commented 2 years ago

So, in practice, we can send out from the GPU 2 streams in Full HD (1920 x 1080) at 30 fps w/o compression. Compression can then be carried out on the COM-EXP Type 10.

To be precise, these tests have been carried out in loopback aboard the Nano (we have the Nano and not the NX yet).

Nicogene commented 2 years ago

Today I discovered that by default the cameras start with exposure set to auto, and in low-light conditions this caps the camera's fps at 15, no matter the resolution.
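A minimal sketch of how the auto-exposure could be switched off via the GenICam node map ("ExposureAuto"/"Off" are the standard GenICam SFNC names; the camera must already be open):

```cpp
#include <pylon/PylonIncludes.h>

// Disable auto-exposure so low light does not cap the frame rate at 15 fps.
void disableAutoExposure(Pylon::CInstantCamera& camera)
{
    Pylon::CEnumParameter(camera.GetNodeMap(), "ExposureAuto").SetValue("Off");
}
```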

Nicogene commented 2 years ago

Dual stream over a gigabit Ethernet connection

⚠️ Note that these preliminary tests have been conducted with a sensor acquisition rate of 30 fps, since reaching it at the resolutions we want to use would already be a great achievement.

The setup consists of my i7 laptop connected to the Jetson Nano via a gigabit Ethernet connection. The pylon device runs on the Jetson Nano; the yarpviews run on my laptop.

tcp

- 640x480 [screenshot]
- 1024x768 [screenshot]
- 1920x1080 ⚠️ A lot of artifacts! [screenshot]

udp

- 640x480 [screenshot]
- 1024x768 ⚠️ A lot of artifacts! [screenshot]
- 1920x1080 ⚠️ A lot of artifacts! [screenshot]

mjpeg

- 640x480 [screenshot]
- 1024x768 [screenshot]
- 1920x1080 ⚠️ Artifacts [screenshot]

Nicogene commented 2 years ago

This is the load on the Nano when sending 1920x1080 images compressed in mjpeg:

[screenshot: CPU load on the Nano]

Nicogene commented 2 years ago

Summing up:

|       | 640x480 | 1024x768 | 1920x1080 |
|-------|---------|----------|-----------|
| tcp   | 30 fps  | 15 fps   | 8 fps ⚠️   |
| udp   | 30 fps  | 15 fps   | 4 fps ⚠️   |
| mjpeg | 30 fps  | 30 fps   | 17 fps ⚠️  |

⚠️ : artifacts

In terms of load on the Nano, even in FHD it seems fine; I think the bottleneck is the amount of data we send through the network.
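As a quick sanity check (our arithmetic, based on the table above), a dual uncompressed FHD stream at the ~8 fps measured over tcp already amounts to

$$ 2 \cdot 1920 \cdot 1080 \cdot 24 \ \text{bit} \cdot 8 \ \text{Hz} \approx 0.8 \ \text{Gbps}, $$

i.e. roughly the usable capacity of a 1 Gbps link, which is consistent with the network being the bottleneck for the uncompressed carriers.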

We decided to start with mjpeg compression on the Type 10 CPU before eventually moving to H.264 GPU-based compression.

Nicogene commented 2 years ago

This PR makes width, height, and period configurable: https://github.com/robotology/yarp-device-pylon/pull/2
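A hypothetical sketch of how such options are typically read in a YARP device's open() (parameter names and defaults here are illustrative, not necessarily those of the PR):

```cpp
#include <yarp/os/Searchable.h>
#include <yarp/os/Value.h>

// Read the streaming options, falling back to defaults when absent.
void parseOptions(yarp::os::Searchable& config, int& width, int& height, double& period)
{
    width  = config.check("width",  yarp::os::Value(640)).asInt32();
    height = config.check("height", yarp::os::Value(480)).asInt32();
    period = config.check("period", yarp::os::Value(0.033)).asFloat64();
}
```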

I am facing issues with the period: in theory I manage to change the acquisition rate of the camera, but I cannot publish over 30 fps; there should be a bug somewhere 🐛

triccyx commented 2 years ago

The artifacts may be due to the fact that the buffer is not protected by a lock_guard.

Nicogene commented 2 years ago

As pointed out by @triccyx, the artifacts were due to concurrent access to the buffer: while the port was still writing, we were overwriting it with the next upcoming frame.

We fixed it by changing from setExternal to a bare memcpy for now. Actually, the other device does memcpy as well, probably because of this buffer-overwriting problem:
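A minimal sketch of the idea (assuming a BufferedPort and a tightly packed RGB8 grab buffer with no row padding; names are ours): copy the pixels into the port's image instead of aliasing the driver buffer with setExternal(), so the next frame cannot overwrite data the port is still serializing.

```cpp
#include <cstring>
#include <yarp/os/BufferedPort.h>
#include <yarp/sig/Image.h>

void publishFrame(yarp::os::BufferedPort<yarp::sig::ImageOf<yarp::sig::PixelRgb>>& port,
                  const unsigned char* grabBuffer, size_t width, size_t height)
{
    auto& img = port.prepare();
    img.resize(width, height);
    // Deep copy: the driver can safely reuse grabBuffer for the next frame.
    std::memcpy(img.getRawImage(), grabBuffer, img.getRawImageSize());
    port.write();
}
```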

We repeated the test (p2p connection nano <-> laptop, yarpview on the laptop) after adding the memcpy, and here are the results:

|       | 640x480 | 1024x768 | 1920x1080 |
|-------|---------|----------|-----------|
| tcp   | 30 fps  | 20 fps   | 9 fps     |
| udp   | 30 fps  | 15 fps   | 4 fps     |
| mjpeg | 30 fps  | 30 fps   | 20 fps    |

The worst-case scenario (1920x1080 + mjpeg compression) has this CPU usage on the Nano:

[screenshot: CPU usage on the Nano]

pattacini commented 2 years ago

1024x768 @ 30fps, or 1920x1080 @ 20fps in mjpeg is a super good delivery already 🚀

Consider that:

cc @maggia80 @DanielePucci @S-Dafarra

Nicogene commented 2 years ago

I am facing issues with the period: in theory I manage to change the acquisition rate of the camera, but I cannot publish over 30 fps; there should be a bug somewhere 🐛

If we solve this issue maybe we can go over 30 fps for 1024x768. I forgot to say that I kept the auto exposure off; the fps also depends on the exposure, but unless we use the camera in low-light conditions, it should not be a problem.

pattacini commented 2 years ago

If we solve this issue maybe we can go over 30 fps for 1024x768.

Going above 30 fps is not strictly required for our first delivery (and it's not even necessary unless one has specific needs).

Nicogene commented 2 years ago

We also analyzed the bandwidth usage, and it seems that when we use mjpeg the bottleneck is not the network but the CPU capabilities of the Nano, which performs the compression and the publishing:

This is the bandwidth usage for the double 1920x1080 stream in mjpeg:

[screenshot: bandwidth usage with mjpeg]

This is instead the bandwidth usage when using tcp:

[screenshot: bandwidth usage with tcp]

pattacini commented 2 years ago

Nice analysis!

We also analyzed the bandwidth usage, and it seems that when we use mjpeg the bottleneck is not the network but the CPU capabilities of the Nano, which performs the compression and the publishing.

This is kind of expected given that we still have a Nano. With a Xavier NX, we should be able to reach higher rates w/ mjpeg at FHD.

Nicogene commented 2 years ago

The next steps would be:

- ~Investigate and fix the 30 fps issue.~

If we are not happy with the resolution + framerate, we can follow these two paths:

pattacini commented 2 years ago

Nice summary writeup 👍🏻

Regarding

Install Xavier NX on the carrier and test the camera on it (@mfussi66 had issues with the carrier in the past).

We could make other attempts, but bear in mind that the carrier board we currently have is not the one we are going to use; the latter is expected to arrive in Sep.

triccyx commented 2 years ago

For now, I leave this task in the hands of @Nicogene :) We decided to have a sync meeting on the afternoon of the 19th.

Nicogene commented 2 years ago
  • Investigate and fix the 30 fps issue.

It is actually not a bug but a limit of the camera itself, which cannot provide more than 30 fps:

https://www.baslerweb.com/en/products/cameras/area-scan-cameras/dart/daa4200-30mci-no-mount-tray-40pcs/