Closed athackst closed 2 years ago
PointCloud2 performance has been bad because the data is sent base64-encoded in a JSON string, which is large over the wire, slow to decode, and expensive to convert back to binary data. It is greatly improved as of this ros3djs commit on the develop branch, which switches to the new CBOR binary encoding.
We will also need to use roslibjs of at least this commit on develop branch for binary decoding.
We will need to run it against rosbridge_suite of at least 0.10.1 for binary encoding, which is released to melodic and set to be included in the next kinetic sync.
I will work out a PR to switch to the new client code once the kinetic sync is released.
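To put a rough number on the base64-in-JSON overhead described above, here is a minimal sketch that compares the size of a binary payload wrapped as a base64 string in JSON against the same payload as a CBOR byte string. It does not use rosbridge or any CBOR library: the CBOR size is computed from the byte-string header rule (1 to 9 header bytes plus the raw payload), the `{"data": ...}` wrapper is a simplified stand-in for a real message, and the 30000-byte payload is an arbitrary assumption.

```python
import base64
import json


def json_base64_size(payload: bytes) -> int:
    """Bytes on the wire when the payload is a base64 string inside JSON."""
    text = base64.b64encode(payload).decode("ascii")
    return len(json.dumps({"data": text}).encode("utf-8"))


def cbor_byte_string_size(payload: bytes) -> int:
    """Size of a CBOR byte string: length header plus the raw payload."""
    n = len(payload)
    if n < 24:
        header = 1  # length fits in the initial byte
    elif n < 2 ** 8:
        header = 2  # initial byte + 1 length byte
    elif n < 2 ** 16:
        header = 3  # initial byte + 2 length bytes
    elif n < 2 ** 32:
        header = 5  # initial byte + 4 length bytes
    else:
        header = 9  # initial byte + 8 length bytes
    return header + n


# Arbitrary stand-in for the data field of a PointCloud2 message.
payload = bytes(30000)
json_size = json_base64_size(payload)
cbor_size = cbor_byte_string_size(payload)
print(json_size, cbor_size)
```

Base64 alone inflates the payload by a factor of 4/3 before any decoding cost on the client is counted.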
Hi @mvollrath,
I've been trying to use the latest ros3djs#develop together with the latest rosbridge_suite to check if this solved the issue, but I couldn't get the basic pointcloud2 example to work properly using a simulated Turtlebot / PR2. Here's the description of the issue: https://github.com/RobotWebTools/ros3djs/issues/243 (note that I checked the pointcloud and I see no NaNs in the message).
Am I missing something obvious? Do I need to do anything specific to use CBOR besides using the latest rosbridge_suite from source?
You need to use roslibjs from develop branch, things will probably be broken until they release it.
I think that turtlebot cloud you're trying to use has a different point format, ros3djs seems like it only supports one. Also it's epic huge.
Note that I could see the pointcloud with ros3djs@master (with a huge delay, and streaming only a few secs to prevent overflowing my PC's RAM), so that's why I thought it was not a format issue in this case. Did that change in the develop branch as well?
And yes, the amount of data is really big, but it's also a pretty common example to work on.
I'll give the roslibjs@develop branch a shot as you propose.
There's a different code path for decoding binary vs. base64 PointCloud2 data, for whatever reason. That's probably why it works on master but not over CBOR.
The PointCloud2 data in the turtlebot thing is malformed. Either the fields are mislabeled or 50% of the data is unused bloat.
Yes, when I print the data on the console I see a large number of 0s everywhere.
Today I tried the latest ros3djs examples and rosbridge_suite with a different pointcloud source (tango ros streamer) and it worked fine.
The pointcloud coming from Turtlebot Gazebo (not ok) has the following metadata:
juan@Sullust:~$ rostopic echo /camera/depth/points
header:
  seq: 0
  stamp:
    secs: 15
    nsecs: 430000000
  frame_id: "camera_depth_optical_frame"
height: 480
width: 640
fields:
  -
    name: "x"
    offset: 0
    datatype: 7
    count: 1
  -
    name: "y"
    offset: 4
    datatype: 7
    count: 1
  -
    name: "z"
    offset: 8
    datatype: 7
    count: 1
  -
    name: "rgb"
    offset: 16
    datatype: 7
    count: 1
is_bigendian: False
point_step: 32
row_step: 20480
while the one coming from the Tango device (ok) has the following:
juan@Sullust:~$ rostopic echo /tango/point_cloud
header:
  seq: 0
  stamp:
    secs: 1546889935
    nsecs: 391690969
  frame_id: "camera_depth"
height: 1
width: 32987
fields:
  -
    name: "x"
    offset: 0
    datatype: 7
    count: 1
  -
    name: "y"
    offset: 4
    datatype: 7
    count: 1
  -
    name: "z"
    offset: 8
    datatype: 7
    count: 1
  -
    name: "c"
    offset: 12
    datatype: 7
    count: 1
is_bigendian: False
point_step: 16
row_step: 527792
So the one coming from Gazebo has twice the point step. datatype 7 is FLOAT32, so yes, considering the offsets and the point steps there are 4 'bloat' bytes at offset 12 and 12 'bloat' bytes at offset 20. Basically half of the bytes in the message are useless.
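The padding arithmetic above can be reproduced with a short sketch. It takes the field lists and point_step values from the two rostopic echo outputs and reports the byte ranges inside each point that no field covers. The `unused_ranges` helper and the tuple layout are hypothetical names for illustration; the datatype sizes follow the sensor_msgs/PointField constants (7 is FLOAT32, 4 bytes).

```python
# Size in bytes of each sensor_msgs/PointField datatype code.
DATATYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4, 7: 4, 8: 8}


def unused_ranges(fields, point_step):
    """Return (offset, length) gaps in one point that no field covers."""
    gaps = []
    cursor = 0  # first byte not yet accounted for
    for _name, offset, datatype, count in sorted(fields, key=lambda f: f[1]):
        if offset > cursor:
            gaps.append((cursor, offset - cursor))
        cursor = max(cursor, offset + DATATYPE_SIZES[datatype] * count)
    if cursor < point_step:
        gaps.append((cursor, point_step - cursor))
    return gaps


# (name, offset, datatype, count), copied from the two messages.
gazebo_fields = [("x", 0, 7, 1), ("y", 4, 7, 1), ("z", 8, 7, 1), ("rgb", 16, 7, 1)]
tango_fields = [("x", 0, 7, 1), ("y", 4, 7, 1), ("z", 8, 7, 1), ("c", 12, 7, 1)]

gazebo_gaps = unused_ranges(gazebo_fields, point_step=32)
tango_gaps = unused_ranges(tango_fields, point_step=16)
print(gazebo_gaps, tango_gaps)
```

For the Gazebo cloud the gaps add up to 16 of the 32 bytes per point, while the Tango cloud has none.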
Other than that, the row_step vs width and height ratios are different, but I'm not sure if that's a problem or not in this case; both messages seem to be consistent at least.
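As a quick check of that consistency, the sketch below verifies row_step against width and point_step for both messages and computes the size of the data array. It assumes rows carry no extra padding, so that row_step should equal width * point_step; `check_layout` is a hypothetical helper name.

```python
def check_layout(width, height, point_step, row_step):
    """Return (is_consistent, total_data_bytes) for a PointCloud2 layout."""
    consistent = row_step == width * point_step  # assumes no row padding
    total_bytes = row_step * height
    return consistent, total_bytes


# Values copied from the two rostopic echo outputs.
gazebo = check_layout(width=640, height=480, point_step=32, row_step=20480)
tango = check_layout(width=32987, height=1, point_step=16, row_step=527792)
print(gazebo, tango)
```

Both layouts are internally consistent; the difference is that the Gazebo cloud is organized (640 x 480) and carries about 9.8 MB of data per message, against roughly 0.5 MB for the unorganized Tango cloud.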
I will try viewing the pointcloud in rvizweb as well to see what happens using ros3djs@develop; if the issue with the pointcloud coming from Gazebo is a format problem we can tackle that afterwards.
Update: I could see the (well formed) pointcloud using ros3djs@develop in rvizweb, but now I don't see much difference with the version used in the rvizweb#master branch.
I tried the pointcloud2 example again with the latest rosbridge; the example already uses roslibjs 0.20.0. When I look at the packets in wireshark I still see messages coming from port 9090 as JSON (or at least I cannot notice an important difference). @mvollrath am I missing something obvious?
roslibjs 0.20.0 doesn't have support for "cbor" decompression, for some reason they made a release last week but only included commits from a year ago. However, ros3djs should have requested "cbor" compression from rosbridge and gotten binary messages back (even though roslibjs couldn't decompress them). In ros3djs develop, "cbor" is the default compression for PointCloud2. So it sounds like it's either the old rosbridge (which would ignore "cbor" compression and use JSON) or the old ros3djs (which would not request "cbor" compression) or both, and when both of those are right, roslibjs 0.20.0 can't decompress it.
Since you're looking at the packets over wireshark, see if the client is requesting "cbor" compression in the subscription message and that should help to narrow it down.
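One way to do that check, sketched here under some assumptions: copy the text of the client's subscribe frame out of the capture and inspect its "compression" field. The frame below is hand-written for illustration rather than taken from a real capture, `requested_compression` is a hypothetical helper, and the fallback to "none" assumes the rosbridge protocol's default when the field is omitted.

```python
import json


def requested_compression(frame_text: str):
    """Return the compression a rosbridge subscribe frame asks for.

    Returns None when the frame is not a subscribe operation.
    """
    msg = json.loads(frame_text)
    if msg.get("op") != "subscribe":
        return None
    # Assumption: an omitted field means no compression was requested.
    return msg.get("compression", "none")


# Hand-written example frame, not a real capture.
captured = (
    '{"op": "subscribe", "topic": "/tango/point_cloud", '
    '"type": "sensor_msgs/PointCloud2", "compression": "cbor"}'
)
print(requested_compression(captured))
```

If this reports "none" for the PointCloud2 topic, the client library is the old one; if it reports "cbor" but the replies are still JSON text, the server side is the likelier culprit.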
@mvollrath thanks for those hints!
I hadn't noticed the pre-built version wasn't updated with the latest changes. I rebuilt the latest ros3djs and roslibjs and now I can see the cbor request (after switching roslibjs to the latest one I don't see it complaining about BSON headers anymore).
Now I'm getting another error when receiving messages:
RangeError: invalid array length
But at least it seems I got the latest versions of all the tools to play along. I'll try debugging this now.
Where is that RangeError happening?
I dug through a bunch of dependencies, forked six repos, and ended up with a working Bower distribution with CBOR point clouds. There is a PR to follow this chain of forks, but will take some more work to get all of the deps released and updated.
After some debugging I found that the problem is here: https://github.com/RobotWebTools/ros3djs/blob/e3fb0ad8d971d666357c7b49fbeea9e87c06d463/src/sensors/PointCloud2.js#L99
In my case points.buffer doesn't have the size of the message (I didn't set the optional max size values; I'm following the pointcloud2 example as it is).
The problem is partially solved by https://github.com/RobotWebTools/ros3djs/pull/244, but then the size of points.positions in the for loop that is below the modifications here: https://github.com/RobotWebTools/ros3djs/blob/e3fb0ad8d971d666357c7b49fbeea9e87c06d463/src/sensors/PointCloud2.js#L112-L124 doesn't match the dimensions for the DataView. One option could be adjusting the limit n to match the size that you have in the buffer; I'll comment that on the PR.
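The idea of bounding n can be sketched as follows. This is not the ros3djs code, only an illustration of the clamp: the loop limit is the smaller of the number of whole points actually present in the received data and the capacity of the pre-allocated buffer. `points_to_read` is a hypothetical helper and the max_pts value of 100000 is an arbitrary assumption.

```python
def points_to_read(data_len: int, point_step: int, max_pts: int) -> int:
    """Number of points that fit both the received data and the buffer."""
    available = data_len // point_step  # whole points present in the message
    return min(available, max_pts)


# Data sizes taken from the two clouds discussed in this thread.
n_tango = points_to_read(data_len=527792, point_step=16, max_pts=100000)
n_gazebo = points_to_read(data_len=9830400, point_step=32, max_pts=100000)
print(n_tango, n_gazebo)
```

With this bound the smaller cloud is read in full and the larger one is clipped to the buffer capacity instead of reading or writing past the end of either array.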
I dug through a bunch of dependencies, forked six repos, and ended up with a working Bower distribution with CBOR point clouds. There is a PR to follow this chain of forks, but will take some more work to get all of the deps released and updated.
Cool! I gave it a quick shot and it worked for the pointclouds I was testing. I'll take a deeper look; it would be awesome to start sending PRs to the upstream repos so as not to depend on the chain of forks on specific commits as you point out.
I didn't have the issue with rvizweb basically because max_pts is set to something larger than the pointcloud I was using for testing purposes: https://github.com/EndPointCorp/polymer-ros-rviz/blob/9c4061476657724fa8900a1ee973748e3781035d/ros-rviz-point-cloud-2.html#L81.
If the pointcloud is larger than max_pts and you need ros3djs to clip it, the upper bound mentioned in here https://github.com/RobotWebTools/ros3djs/pull/244/files#r246909707 has to be fixed.
Glad it's working! I'll be working on getting the upstreams fixed, and I'll test ros3djs some more against max_pts.
For the record, the tests I ran with a pointcloud coming from a Tango device, applying the changes mentioned in #13 and rosbridge 0.10.1, went well. This should be ready to be tested with the Velodyne data input.
we're seeing a really long lag between the display and true data
@athackst, let us know if the latest branch works better for you and we can close this issue. Thanks!
I'm running TF and points (from a velodyne) and we're seeing a really long lag between the display and true data