osrf / rvizweb

RVizWeb: RViz on the browser
290 stars 59 forks source link

Point cloud delayed #9

Closed athackst closed 2 years ago

athackst commented 5 years ago

I'm running TF and points (from a velodyne) and we're seeing a really long lag between the display and true data

mvollrath commented 5 years ago

PointCloud2 performance has been bad because it is sending base64-encoded data in a JSON string, which is large over the wire, slow to decode, and expensive to convert back to binary data. It is greatly improved as of this ros3djs commit on develop branch which switches to the new CBOR binary encoding.

We will also need to use roslibjs of at least this commit on develop branch for binary decoding.

We will need to run it against rosbridge_suite of at least 0.10.1 for binary encoding, which is released to melodic and set to be included in the next kinetic sync.

I will work out a PR to switch to the new client code once the kinetic sync is released.

jubeira commented 5 years ago

Hi @mvollrath,

I've been trying to use the latest ros3djs#develop together with the latest rosbridge_suite to check if this solved the issue, but I couldn't get the basic pointcloud2 example to work properly using a simulated Turtlebot / PR2. Here's the description of the issue: https://github.com/RobotWebTools/ros3djs/issues/243 (note that I checked the pointcloud and I see no NaNs in the message).

Am I missing something obvious? Do I need to do anything specific to use CBOR besides using the latest rosbridge_suite from source?

mvollrath commented 5 years ago

You need to use roslibjs from develop branch, things will probably be broken until they release it.

mvollrath commented 5 years ago

I think that turtlebot cloud you're trying to use has a different point format, ros3djs seems like it only supports one. Also it's epic huge.

jubeira commented 5 years ago

I think that turtlebot cloud you're trying to use has a different point format, ros3djs seems like it only supports one. Also it's epic huge.

Note that I could see the pointcloud with ros3djs@master (with a huge delay, and streaming only a few secs to prevent overflowing my PC's RAM), so that's why I thought it was not a format issue in this case. Did that change in develop branch as well? And yes, the amount of data is really big, but it's also a pretty common example to work on.

I'll give roslibjs@develop branch a shot as you propose.

mvollrath commented 5 years ago

There's a different code path for decoding binary vs. base64 PointCloud2 data, for whatever reason. That's probably it works on master but not over CBOR.

The PointCloud2 data in the turtlebot thing is malformed. Either the fields are mislabeled or 50% of the data is unused bloat.

jubeira commented 5 years ago

The PointCloud2 data in the turtlebot thing is malformed. Either the fields are mislabeled or 50% of the data is unused bloat.

Yes, when I print the data on the console I see a large amount of 0s everywhere. Today I tried the latest ros3djs examples and rosbridge_suite with a different pointcloud source (tango ros streamer) and it worked fine.

The pointcloud coming from Turtlebot Gazebo (not ok) has the following metadata:

juan@Sullust:~$ rostopic echo /camera/depth/points 
header: 
  seq: 0
  stamp: 
    secs: 15
    nsecs: 430000000
  frame_id: "camera_depth_optical_frame"
height: 480
width: 640
fields: 
  - 
    name: "x"
    offset: 0
    datatype: 7
    count: 1
  - 
    name: "y"
    offset: 4
    datatype: 7
    count: 1
  - 
    name: "z"
    offset: 8
    datatype: 7
    count: 1
  - 
    name: "rgb"
    offset: 16
    datatype: 7
    count: 1
is_bigendian: False
point_step: 32
row_step: 20480

while the one coming from the Tango device (ok) has the following:

juan@Sullust:~$ rostopic echo /tango/point_cloud 
header: 
  seq: 0
  stamp: 
    secs: 1546889935
    nsecs: 391690969
  frame_id: "camera_depth"
height: 1
width: 32987
fields: 
  - 
    name: "x"
    offset: 0
    datatype: 7
    count: 1
  - 
    name: "y"
    offset: 4
    datatype: 7
    count: 1
  - 
    name: "z"
    offset: 8
    datatype: 7
    count: 1
  - 
    name: "c"
    offset: 12
    datatype: 7
    count: 1
is_bigendian: False
point_step: 16
row_step: 527792

Then, the one coming from Gazebo has twice the point step. datatype 7 is FLOAT32, so yes, considering the offsets and the point steps there are 4 'bloat' bytes at offset 12 and 12 'bloat; bytes at offset 20. Basically half of the bytes in the message are useless. Other than that row_step vs width and height ratios are different, but I'm not sure if that's a problem or not in this case; both messages seem to be consistent at least.

I will try viewing the pointcloud in rvizweb as well to see what happens using ros3djs@develop; if the issue with the pointcloud coming from Gazebo a format problem we can tackle that afterwards.

jubeira commented 5 years ago

Update: I could see the (well formed) pointcloud using ros3djs@develop in rvizweb, but now I don't see much difference with the version used in rvizweb#master branch.

I tried again pointcloud2 example with the latest rosbridge; the example already uses roslibjs 0.20.0. When I see the packets over wireshark I still see messages coming from port 9090 as JSONs (or at least I cannot notice an important difference). @mvollrath am I missing something obvious?

mvollrath commented 5 years ago

roslibjs 0.20.0 doesn't have support for "cbor" decompression, for some reason they made a release last week but only included commits from a year ago. However, ros3djs should have requested "cbor" compression from rosbridge and gotten binary messages back (even though roslibjs couldn't decompress them). In ros3djs develop, "cbor" is the default compression for PointCloud2. So it sounds like it's either the old rosbridge (which would ignore "cbor" compression and use JSON) or the old ros3djs (which would not request "cbor" compression) or both, and when both of those are right, roslibjs 0.20.0 can't decompress it.

Since you're looking at the packets over wireshark, see if the client is requesting "cbor" compression in the subscription message and that should help to narrow it down.

jubeira commented 5 years ago

@mvollrath thanks for those hints! I hadn't noticed the pre-built version wasn't updated with the latest changes. I rebuilt the latest ros3djs and roslibjs and now I can see the cbor request (after switching roslibjs to the latest one I don't see it complaining about BSON headers anymore).

Now I'm getting another error when receiving messages:

RangeError: invalid array length

But at least it seems I got the latest versions of all the tools to play along. I'll try debugging this now.

mvollrath commented 5 years ago

Where is that RangeError happening?

mvollrath commented 5 years ago

I dug through a bunch of dependencies, forked six repos, and ended up with a working Bower distribution with CBOR point clouds. There is a PR to follow this chain of forks, but will take some more work to get all of the deps released and updated.

jubeira commented 5 years ago

Where is that RangeError happening?

After some debugging I found that the problem is here: https://github.com/RobotWebTools/ros3djs/blob/e3fb0ad8d971d666357c7b49fbeea9e87c06d463/src/sensors/PointCloud2.js#L99

In my case points.buffer doesn't have the size of the message (I didn't set max sizes optional values; I'm following pointcloud2 example as it is).

The problem is partially solved by https://github.com/RobotWebTools/ros3djs/pull/244, but then the size of points.positions in the for loop that is below the modifications here: https://github.com/RobotWebTools/ros3djs/blob/e3fb0ad8d971d666357c7b49fbeea9e87c06d463/src/sensors/PointCloud2.js#L112-L124

doesn't match the dimensions for the DataView. One option could be adjusting the limit n to match the size that you have in the buffer; I'll comment that on the PR.

I dug through a bunch of dependencies, forked six repos, and ended up with a working Bower distribution with CBOR point clouds. There is a PR to follow this chain of forks, but will take some more work to get all of the deps released and updated.

Cool! I gave it a quick shot and it worked for the pointclouds I was testing. I'll take a deeper look; it would be awesome to start sending PRs to the upstream repos so as not to depend on the chain of forks on specific commits as you point out.

jubeira commented 5 years ago

Cool! I gave it a quick shot and it worked for the pointclouds I was testing.

I didn't have the issue with rvizweb basically because max_pts is set to something larger than the pointcloud I was using for testing purposes: https://github.com/EndPointCorp/polymer-ros-rviz/blob/9c4061476657724fa8900a1ee973748e3781035d/ros-rviz-point-cloud-2.html#L81.

If the pointcloud is larger than max_pts and you need ros3djs to clip it, the upper bound mentioned in here https://github.com/RobotWebTools/ros3djs/pull/244/files#r246909707 has to be fixed.

mvollrath commented 5 years ago

Glad it's working! I'll be working on getting the upstreams fixed, and I'll test ros3djs some more against max_pts.

jubeira commented 5 years ago

For the record, the tests I ran with a pointcloud coming from a Tango device applying changes mentioned in #13 an rosbridge 0.10.1 went well. This should be ready to be tested with the Velodyne data input.

chapulina commented 5 years ago

we're seeing a really long lag between the display and true data

@athackst , let us know if the latest branch works better for you and we can close this issue. Thanks!