During the meeting we agreed on the following points:

- … (NiTE, openNI, …).
- The IFrameGrabber interface will be used for extracting RGB images.
- A Framegrabber device will be instantiated by the depthCamera wrapper in order to reuse an existing device and have access to RGB camera image customization like saturation, brightness and so on (if available from the device); see the sketch after this list.
- An IDepth interface will be created to get depth images.
- The IDepth interface must have a way to get the RGB and depth cameras synchronized; how to achieve this could be either by doing some magic on the client side or by asking the server to open a special port sending both together.
- float values will be used to express depth in [meters].
- max and min depth will be set to indicate the preferred range of use of the device and to convert the float data to int16 or int8 data types. This can be useful to compare HW depth sensors with the result of a disparity map algorithm, or different HW sensors with each other.
- The RGB port has to be compliant with yarpview.
- An interface for human/skeleton tracking will be defined (IHumanTracking); this interface may or may not be implemented by the device driver, since it is concerned with software computation and is usually not implemented directly by the HW.
- An IPointCloud interface can be created.

As a first step, the new interface for depth cameras will be created, together with a new pair of Wrapper/Client, with the idea that, in the long run, this will be the official YARP depth camera wrapper and all the others will be discontinued. Other interfaces like IPointCloud will be added later on, or a new wrapper/client will be created integrating them.
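A minimal sketch of the reuse pattern from the list above, assuming the depth wrapper opens an already-available RGB device through PolyDriver; the device name and the error handling are only illustrative, not the actual wrapper code.

```cpp
// Minimal sketch: open an existing frame grabber device and access its RGB
// stream through IFrameGrabberImage. The device name below is only an example;
// a real wrapper would receive it from its configuration.
#include <yarp/os/Property.h>
#include <yarp/dev/PolyDriver.h>
#include <yarp/dev/FrameGrabberInterfaces.h>
#include <yarp/sig/Image.h>

int main()
{
    yarp::os::Property config;
    config.put("device", "fakeFrameGrabber");   // placeholder RGB device

    yarp::dev::PolyDriver driver(config);
    if (!driver.isValid()) {
        return 1;                               // device could not be opened
    }

    yarp::dev::IFrameGrabberImage* grabber = nullptr;
    if (!driver.view(grabber) || grabber == nullptr) {
        return 1;                               // device does not expose RGB images
    }

    yarp::sig::ImageOf<yarp::sig::PixelRgb> rgb;
    grabber->getImage(rgb);                     // one RGB frame; saturation/brightness
                                                // would go through IFrameGrabberControls
    return 0;
}
```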
Ideally, the IHumanTracking interface should not be implemented by a device driver if the skeleton identification and tracking are not provided by the HW. Similarly, it should not be done by the wrapper, because it is not the purpose of the wrapper to do any elaboration on the data, but just to make it available through the network to a remote client.
As long as the identification and tracking are done by software, a separate software module should implement the algorithm based on the data read from the wrapper. This is for code re-usability, to allow many identification libraries to be used on the same data, and to avoid adding unnecessary dependencies to the wrapper code.
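As a rough illustration of that separation, a standalone tracking module could look like the sketch below; the port names and the trackSkeleton() call are placeholders, not existing YARP entities.

```cpp
// Sketch of a separate tracking module: it only reads depth frames published by
// the wrapper and feeds them to whatever identification library is chosen.
#include <yarp/os/Network.h>
#include <yarp/os/BufferedPort.h>
#include <yarp/sig/Image.h>

int main()
{
    yarp::os::Network yarp;

    yarp::os::BufferedPort<yarp::sig::ImageOf<yarp::sig::PixelFloat>> depthIn;
    depthIn.open("/humanTracker/depth:i");
    yarp::os::Network::connect("/depthCamera/depth:o", "/humanTracker/depth:i");

    while (true) {
        auto* depth = depthIn.read();   // blocking read from the wrapper
        if (depth == nullptr) {
            break;
        }
        // trackSkeleton(*depth);       // placeholder: plug in any identification library
    }
    return 0;
}
```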
About the skeleton, one issue is still pending: we should find a YARP definition of what a skeleton is, if possible, to allow code written by different users to share data and work together. From the code I saw, it looks like the basic definition of a skeleton includes a mix of the following information:
Question to anyone working with human identification and skeletons: what could be a shared definition of a skeleton?
@tanismar @Tobias-Fischer @kt10aan
Hi @barbalberto, First of all thanks for taking the initiative of defining a standard for depth cameras in yarp. I think it's long overdue. In our project, we use https://github.com/robotology/kinect-wrapper. There, a skeleton is defined by exactly the above-mentioned attributes, except that the confidence is defined per joint rather than as a general confidence value. I personally would purely work with the real-world coordinates, and provide an interface to retrieve the image coordinates from world coordinates if needed. Furthermore, the standard should probably provide a method to get a list of all possible skeleton nodes, and edges between the nodes. Then they can be properly visualized, and things like angles between the joints can be calculated.
Best, Tobias PS: The point cloud interface would be a huge enhancement to yarp, and is probably the biggest chunk missing compared to ROS when it comes to robot perception. And an alternative to rviz ..
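To make the proposal concrete, a hypothetical data structure matching the attributes listed above could look like the sketch below; this is not an existing YARP type, just an illustration.

```cpp
// Hypothetical skeleton representation: named joints in real-world coordinates,
// per-joint confidence, and explicit edges so the skeleton can be drawn and
// joint angles computed.
#include <string>
#include <utility>
#include <vector>

struct SkeletonJoint {
    std::string name;   // e.g. "head", "left_elbow"
    double x, y, z;     // real-world coordinates [m]
    double confidence;  // per-joint confidence, 0..1
};

struct Skeleton {
    std::vector<SkeletonJoint> joints;
    std::vector<std::pair<int, int>> edges;   // pairs of indices into 'joints'
};
```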
IDepth is a bit ambiguous, why not IRGBD or IRgbd?
A skeleton interface could also be used to interface with mocap systems, perhaps @claudia-lat could be interested in reading about this discussion.
One comment on the discussion: it could be a good idea to avoid saying things are expressed in the "world" or "real world" frame. If you have multiple sensors (IMU, Kinect, Mocap) having multiple "real worlds" could be confusing. : )
Sensor frame could be an alternative, but even that is prone to confusion (for an IMU, its "world" frame is totally different from its "sensor" frame).
IDepth is a bit ambiguous, why not IRGBD or IRgbd?
What about IDepthMap? I believe that IRGBD is not a good name here, since if I understand correctly, @barbalberto is talking about a depth-only interface, not RGBD.
RGB port has to be compliant with yarpview.
IMO also the depth map port should be compatible
Ideally, the IHumanTracking interface should not be implemented by a device driver if the skeleton identification and tracking are not provided by the HW.
What about drivers that implement skeleton for the hand? IHumanTracking does not sound good here.
From what I have understood, the proposal is for an interface that groups methods to get RGB and depth (in an atomic call); therefore it is an RGBD sensor. Depth images can come from many sensors that do not have RGB.
All names of interfaces are just tentative, so open to discussion. Actually my proposal was not clear enough, hence the misunderstanding, sorry... I'll try to explain better.
My idea at first was to have an interface just for depth, so that it could be used also for other kinds of sensors like lasers (now hokuyo uses genericSensor). Then an RGBD device would simply implement both rgb (the current framegrabber) + depth.
Problem: in this case there will be no way for the client to get the rgb+depth image pair synchronized, because they are acquired through 2 different interfaces.
Since synchronization seems to be a good plus, we thought that the depth interface would be a fitting place for adding a method to get both of them; so the depth interface will have both getDepth() and getRGBD() (like @lornat75 says), which is not so clean and not so reusable anymore for other sensors like lasers.
The alternative would be to have yet another interface like iDepth and iRGBD, so that iDepth will only have getDepth(), while iRGBD will inherit from rgb and iDepth and add the getRGBD() method.
This will create more granularity, allowing each device to reflect exactly what it does; on the other hand, it'll create a pollution of interfaces with just one or two methods each, which I don't like too much (but I admit it is clearer and probably avoids confusion).
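A minimal sketch of that finer-grained split; the class and method names are only illustrative, not existing YARP interfaces.

```cpp
// iDepth-style interface: depth only, reusable by non-RGB range sensors.
// iRGBD-style interface: adds the atomic RGB+depth call on top of it.
#include <yarp/sig/Image.h>

class IDepthSensor
{
public:
    virtual ~IDepthSensor() = default;
    // Depth in meters, one float per pixel.
    virtual bool getDepth(yarp::sig::ImageOf<yarp::sig::PixelFloat>& depth) = 0;
    virtual double getDepthMin() = 0;   // preferred working range [m]
    virtual double getDepthMax() = 0;
};

class IRGBDSensor : public IDepthSensor
{
public:
    // Atomic call so the client receives a consistent rgb/depth pair.
    virtual bool getRGBD(yarp::sig::ImageOf<yarp::sig::PixelRgb>& rgb,
                         yarp::sig::ImageOf<yarp::sig::PixelFloat>& depth) = 0;
};
```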
In any case, since the rgb image and the depth image will be broadcast through 2 different ports, synchronization is a problem of the client, which has to check somehow whether the images are in sync or not.
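A minimal sketch of such a client-side check, assuming the wrapper stamps both streams with an envelope; the port names and the 15 ms threshold are arbitrary assumptions.

```cpp
// Client-side synchronization check: read one frame from each port and compare
// the envelope timestamps before accepting the pair.
#include <cmath>
#include <yarp/os/Network.h>
#include <yarp/os/BufferedPort.h>
#include <yarp/os/Stamp.h>
#include <yarp/sig/Image.h>

int main()
{
    yarp::os::Network yarp;

    yarp::os::BufferedPort<yarp::sig::ImageOf<yarp::sig::PixelRgb>>   rgbPort;
    yarp::os::BufferedPort<yarp::sig::ImageOf<yarp::sig::PixelFloat>> depthPort;
    rgbPort.open("/client/rgb:i");
    depthPort.open("/client/depth:i");
    yarp::os::Network::connect("/depthCamera/rgb:o",   "/client/rgb:i");
    yarp::os::Network::connect("/depthCamera/depth:o", "/client/depth:i");

    while (true) {
        auto* rgb   = rgbPort.read();    // blocking reads
        auto* depth = depthPort.read();
        if (rgb == nullptr || depth == nullptr) {
            break;
        }

        yarp::os::Stamp rgbStamp, depthStamp;
        rgbPort.getEnvelope(rgbStamp);
        depthPort.getEnvelope(depthStamp);

        // Accept the pair only if the acquisition times are close enough (15 ms here).
        if (std::fabs(rgbStamp.getTime() - depthStamp.getTime()) < 0.015) {
            // process the synchronized rgb/depth pair
        }
    }
    return 0;
}
```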
IMO also the depth map port should be compatible
Yes, I agree; the point here is that depth images will be in float: is yarpview able to plot a float image? The idea of adding a max/min range was to rescale the data into an integer range also for this purpose. If this is the case, who shall carry out this conversion: the wrapper, yarpview or a carrier?
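Whoever ends up doing it, the conversion itself is straightforward; a minimal sketch using the declared min/max range (the function name and clamping policy are my own choices):

```cpp
// Rescale a float depth image [m] into an 8-bit image that yarpview can display.
#include <algorithm>
#include <yarp/sig/Image.h>

yarp::sig::ImageOf<yarp::sig::PixelMono>
depthToMono(yarp::sig::ImageOf<yarp::sig::PixelFloat>& depth, float minDepth, float maxDepth)
{
    yarp::sig::ImageOf<yarp::sig::PixelMono> out;
    out.resize(depth.width(), depth.height());

    const float range = std::max(maxDepth - minDepth, 1e-6f);
    const int w = static_cast<int>(depth.width());
    const int h = static_cast<int>(depth.height());

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            // clamp to [minDepth, maxDepth], then map linearly to 0..255
            float d = std::min(std::max(depth.pixel(x, y), minDepth), maxDepth);
            out.pixel(x, y) = static_cast<unsigned char>(255.0f * (d - minDepth) / range);
        }
    }
    return out;
}
```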
What about drivers that implement skeleton for the hand? IHumanTracking does not sound good here.
Why not? The hand is still a piece of the human body :-P The idea was to differentiate between tracking of the body and of objects. If the definition of skeleton is good enough, there will be no ambiguity while receiving data.
I think we should split data representation from the device, and add the required classes to yarp::sig. Depending on what we get, we should think about extending yarp::dev::FrameGrabber, or creating a new interface (yarp::dev::DepthGrabber? yarp::dev::PointCloudGrabber? yarp::dev::RGBDGrabber? We can think about this later).
What are the ways to represent 3D data? Maybe I'm totally wrong here, since I've never worked with depth cameras, but I suppose something like this is required (together with a set of functions to switch from one format to another):

- Image
- DepthMap
- RGBD
- PointCloud (xyz, xyzrgb)

An important thing (that I believe is currently missing in yarp::sig::Image) is that each of them should have a viewpoint associated (translation + rotation). As an alternative, we could use the envelope to transmit this information, like we do for the oculus, but we should spend some more time to define a class that contains both the timestamp and the viewpoint and that can be read as a timestamp in order not to break compatibility.
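For reference, the envelope mechanism mentioned above already works for the timestamp alone; a minimal sketch is below, where the proposed timestamp-plus-viewpoint class would take the place of the plain Stamp.

```cpp
// Attach a Stamp (sequence number + time) as the envelope of each depth frame.
#include <yarp/os/BufferedPort.h>
#include <yarp/os/Stamp.h>
#include <yarp/sig/Image.h>

void publishWithStamp(yarp::os::BufferedPort<yarp::sig::ImageOf<yarp::sig::PixelFloat>>& port,
                      const yarp::sig::ImageOf<yarp::sig::PixelFloat>& depth)
{
    yarp::os::Stamp stamp;
    stamp.update();              // sequence number + current time
    port.prepare() = depth;      // copy the frame into the output buffer
    port.setEnvelope(stamp);     // the envelope travels alongside the image
    port.write();
}
```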
yarp::sig::ImageOf should be easy to extend to support both DepthMap and RGBD. I don't see a good reason here for forcing float data at the data representation level. Also, RGB automatically forces the format for the color information; maybe ColorDepthMap is a more appropriate name.
The yarp::sig::PointCloud class should map to pcd files and pcl::PointCloud classes, see http://www.pointclouds.org/documentation/tutorials/pcd_file_format.php and http://docs.pointclouds.org/trunk/classpcl_1_1_point_cloud.html. Probably ImageOf can still be used here with a meaning similar to PCL, i.e. height = 1 => unorganized point cloud.
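As an example of the format conversions mentioned earlier, back-projecting an organized float depth map into an unorganized xyz cloud only needs the pinhole intrinsics; fx, fy, cx, cy are assumed to come from the camera calibration, and PointXYZ is a local helper, not a YARP or PCL type.

```cpp
// Convert an organized depth image [m] into an unorganized xyz point cloud
// (the PCL "height = 1" case) using pinhole back-projection.
#include <cstddef>
#include <vector>
#include <yarp/sig/Image.h>

struct PointXYZ { float x, y, z; };

std::vector<PointXYZ> depthToCloud(yarp::sig::ImageOf<yarp::sig::PixelFloat>& depth,
                                   float fx, float fy, float cx, float cy)
{
    std::vector<PointXYZ> cloud;
    const int w = static_cast<int>(depth.width());
    const int h = static_cast<int>(depth.height());
    cloud.reserve(static_cast<std::size_t>(w) * h);

    for (int v = 0; v < h; ++v) {
        for (int u = 0; u < w; ++u) {
            const float z = depth.pixel(u, v);   // depth in meters
            if (z <= 0.0f) {
                continue;                        // skip invalid measurements
            }
            cloud.push_back({(u - cx) * z / fx, (v - cy) * z / fy, z});
        }
    }
    return cloud;
}
```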
Hi! I'm not so knowledgeable at the low level on how to define the interfaces to properly and compatibly read depth data from different devices, but I've done some work to ease working with pointclouds on YARP. On the one hand, there is a module initially developed by Ugo and further enhanced by myself which reads data from color + depth images (rgb+d) and, given a desired crop, sends a list of xyzrgb points, and optionally saves it also as .ply (for pcl) or .off (for meshlab). This module does NOT depend on the pcl library, and so the output is a bottle of bottles, where each sub-bottle is made up of the 6 values (xyzrgb) as doubles. The module might require some extra work, but the working code can be found here: https://github.com/tanismar/obj3Drec
On the other hand, I did a small library to transform between those bottles and the pcl format, whose functions might be useful for the PointCloudGrabber (or whatever the final name is). The library is here: https://github.com/tanismar/objects3DModeler/tree/master/libYarpCloud
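For readers unfamiliar with that format, a minimal sketch of how one point would be appended in the bottle-of-bottles layout described above (the helper name is mine):

```cpp
// Append one xyzrgb point as a sub-bottle of six doubles, matching the
// "bottle of bottles" format described above.
#include <yarp/os/Bottle.h>

void addPoint(yarp::os::Bottle& cloud,
              double x, double y, double z, double r, double g, double b)
{
    yarp::os::Bottle& pt = cloud.addList();
    pt.addDouble(x);
    pt.addDouble(y);
    pt.addDouble(z);
    pt.addDouble(r);
    pt.addDouble(g);
    pt.addDouble(b);
}
```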
1) There are new actors in the field we shouldn't forget about: Intel Realsense F200/SR300/R200, with recent more official linux/osx support (apart from previously existing Windows SDK): https://software.intel.com/en-us/blogs/2016/01/26/realsense-linux-osx-drivers. I still do not physically have this type of device, but expect to have one soon. I'd issue the driver as a YARP enhancement once the OpenNI2 or similar drivers are more or less stable.
2) There is extra information that must be transmitted:
The envelope solution proposed by @drdanz seems like a good idea; it's a matter of taking the right design decisions to not induce too much overhead. Am I missing any other information that may prove valuable?
3) Regarding overhead/efficiency, I'd vote for vector types (as opposed to the bottles used by @tanismar, which were a good first approximation). cc @drdanz, @lornat75, @pattacini: could we have your opinion?
I agree that vectors are better than bottles, assuming vectors can store all the required information. We can also come up with a specific type and write it as an (efficient) portable.
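A minimal sketch of the vector alternative, packing six values per point into a flat yarp::sig::Vector; the layout and helper name are my own assumptions.

```cpp
// Flat layout: cloud[6*i .. 6*i+5] = x, y, z, r, g, b of point i.
// The caller is expected to resize the vector to 6 * numPoints beforehand.
#include <cstddef>
#include <yarp/sig/Vector.h>

void setPoint(yarp::sig::Vector& cloud, std::size_t index,
              double x, double y, double z, double r, double g, double b)
{
    const std::size_t base = 6 * index;
    cloud[base + 0] = x;
    cloud[base + 1] = y;
    cloud[base + 2] = z;
    cloud[base + 3] = r;
    cloud[base + 4] = g;
    cloud[base + 5] = b;
}
```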
@jgvictores
1) Good to know: better to complete the migration before buying one of that kind, so if some work needs to be done for a 'custom' driver it'll start off on the right foot :smile:
2) How to insert additional information in the depth/rgb image is open to discussion. We could either send a data type pairOf<image, something> or use a different port for the extra data. If we start using server/client the additional port will be transparent, but synchronization issues will increase.
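On point 2, YARP already offers yarp::os::PortablePair, so the pairOf<image, something> option could be sketched as below; the port name and the extra fields are just examples.

```cpp
// Send an RGB frame together with a bottle of extra information on one port.
#include <yarp/os/Network.h>
#include <yarp/os/BufferedPort.h>
#include <yarp/os/PortablePair.h>
#include <yarp/os/Bottle.h>
#include <yarp/sig/Image.h>

using RgbPlusInfo = yarp::os::PortablePair<yarp::sig::ImageOf<yarp::sig::PixelRgb>,
                                           yarp::os::Bottle>;

int main()
{
    yarp::os::Network yarp;

    yarp::os::BufferedPort<RgbPlusInfo> port;
    port.open("/depthCamera/rgbExtra:o");

    RgbPlusInfo& msg = port.prepare();
    msg.head.resize(640, 480);          // the RGB image
    msg.body.clear();
    msg.body.addString("frame_id");     // example extra information
    msg.body.addString("depth_camera");
    port.write();
    return 0;
}
```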
3) When you say vector type instead of bottles, are you talking about depth/rgb images or point clouds?
The new version of the interface is defined in the IRGBD_interface_v2 branch.
@barbalberto #974 was merged. Can we close this now?
Yes :+1:
Working with depth cameras and Kinect-like devices, we found that a clear and shared definition of the data types for those sensors is lacking.
An internal meeting will be held on Monday 23/11/15 afternoon to define one, and a tentative schedule for code refactoring will be set.
This issue will follow the updates on the topic and host feedback.