ros-industrial-attic / yak_ros

Example ROS frontend node for the Yak TSDF package
Apache License 2.0

Running demo as-is, fails with TF lookupTransform error #43

Closed Beshario closed 4 years ago

Beshario commented 4 years ago

Hi, I cloned the package, built it with catkin build --cmake-args -DBUILD_DEMO=True, and sourced the workspace. When I launch the demo as-is, it fails with this output:

 source devel/setup.bash 
asp@asp-MS-7B84:~/stitch_ws$ roslaunch yak_ros demo.launch
... logging to /home/asp/.ros/log/75a731ba-ec9b-11ea-804a-d037451d0501/roslaunch-asp-MS-7B84-7694.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://asp-MS-7B84:38869/

SUMMARY
========

PARAMETERS
 * /image_sim_node/base_frame: world
 * /image_sim_node/framerate: 30
 * /image_sim_node/mesh: /home/asp/stitch_...
 * /image_sim_node/orbit_speed: 1.0
 * /rosdistro: melodic
 * /rosversion: 1.14.9
 * /tsdf_node/camera_matrix: [550.0, 0.0, 320....
 * /tsdf_node/cols: 640
 * /tsdf_node/rows: 480
 * /tsdf_node/tsdf_frame: tsdf_origin
 * /tsdf_node/volume_resolution: 0.001
 * /tsdf_node/volume_x: 640
 * /tsdf_node/volume_y: 640
 * /tsdf_node/volume_z: 192

NODES
  /
    image_sim_node (yak_ros/yak_ros_image_simulator)
    link1_broadcaster (tf/static_transform_publisher)
    tsdf_node (yak_ros/yak_ros_node)

auto-starting new master
process[master]: started with pid [7704]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 75a731ba-ec9b-11ea-804a-d037451d0501
process[rosout-1]: started with pid [7715]
started core service [/rosout]
process[image_sim_node-2]: started with pid [7722]
process[tsdf_node-3]: started with pid [7723]
process[link1_broadcaster-4]: started with pid [7724]
[ WARN] [1598996303.698697205]: TF lookupTransform error: Lookup would require extrapolation at time 1598996303.697581838, but only time 1598996303.735359645 is in the buffer, when looking up transform from frame [camera] to frame [tsdf_origin]
[ WARN] [1598996303.731441946]: TF lookupTransform error: Lookup would require extrapolation at time 1598996303.697581838, but only time 1598996303.735359645 is in the buffer, when looking up transform from frame [camera] to frame [tsdf_origin]
[ WARN] [1598996303.760431768]: TF lookupTransform error: Lookup would require extrapolation into the past.  Requested time 1598996303.697581838 but the earliest data is at time 1598996303.735359645, when looking up transform from frame [camera] to frame [tsdf_origin]
[ WARN] [1598996303.797623405]: TF lookupTransform error: Lookup would require extrapolation into the past.  Requested time 1598996303.730902085 but the earliest data is at time 1598996303.735359645, when looking up transform from frame [camera] to frame [tsdf_origin]
[ERROR] [1598996303.910701637]: Failed to fuse image.

Blank result: blankYak

this is my rviz configuration. yak_ros

I have done all I can to make sure Cuda is working, so it is out of the way. I could run the cuda examples with no problem.

this is an example of CUDA working: cuda

When I tried to integrate the package with our industrial robot, I got similar errors. So I went back to square one to make sure I can first get a mesh out of the demo.

Please advise, thank you very much. :D

Beshario commented 4 years ago

I'll investigate more. I was able to get the demo working on a Jetson TX2.

schornakj commented 4 years ago

Can you provide some more info about the hardware and environment where it isn't working? The most important data is:

It's definitely good that you were able to get CUDA running independently, since it's probably the main source of weird issues related to getting KinectFusion-type applications up and running. The fact that your Rviz displays the depth image correctly shows that the simulated depth image pipeline is working too.

I'll investigate more. I was able to get the demo working on a Jetson TX2.

This is a little encouraging, since I think it narrows down the problem to something specific to your first computer.

Beshario commented 4 years ago

Hi @schornakj Thank you for your response.

Operating system: Ubuntu: 18.04.5 LTS (Bionic Beaver)

NVIDIA Corporation GK107 [GeForce GT 740]

asp@asp-MS-7B84:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

I realized when I run /usr/local/cuda/bin/nvcc --version it gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

which is different from the output of the previous nvcc --version command. I will look into this more.

Lastly, the NVIDIA driver version is 450.51.06.

schornakj commented 4 years ago

If you have multiple versions of CUDA installed, the problem might be that your workspace is building against one version but trying to run using the other version and failing. Do you have anything in your .bashrc or elsewhere that adds CUDA paths to LD_LIBRARY_PATH or other environment variables? If so, which version of CUDA do they point to?

The GK107 GPU is listed as Compute Capability 3.0 (Kepler) which means it should be compatible with either CUDA 9.1 or 10.2 as well as Nvidia Driver 450.51, so there shouldn't be anything fundamentally preventing this software from running on your computer. This compatibility info is listed in CUDA's documentation.
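
One way to see whether two toolkits are competing, as with the nvcc 9.1 and 10.2 outputs earlier in this thread, is to list every nvcc reachable through PATH and every CUDA-related entry in LD_LIBRARY_PATH. The following is only a sketch; the helper name cuda_entries is made up for illustration and is not part of CUDA or ROS.

```shell
#!/bin/sh
# cuda_entries LIST: print the entries of a colon-separated path list that
# mention "cuda", one per line. Hypothetical helper, not part of CUDA or ROS.
cuda_entries() (
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'cuda' || true
)

# Walk PATH by hand so that every nvcc is shown, not just the first one found.
echo "nvcc executables on PATH:"
old_ifs=$IFS
IFS=:
for dir in $PATH; do
  if [ -n "$dir" ] && [ -x "$dir/nvcc" ]; then
    echo "  $dir/nvcc"
  fi
done
IFS=$old_ifs

echo "CUDA entries in LD_LIBRARY_PATH:"
cuda_entries "${LD_LIBRARY_PATH:-}"
```

If more than one nvcc shows up, or the library path points at a different release than the nvcc that comes first, the workspace may be building against one toolkit and running against another.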

Beshario commented 4 years ago

It seems that after installing CUDA, we also installed the CUDA tools from apt-get at a different version.

I uninstalled CUDA completely, then reinstalled CUDA 10.2 with driver 390.

I didn't install the CUDA tools, so nvcc --version gives:

Command 'nvcc' not found, but can be installed with:

sudo apt install nvidia-cuda-toolkit

However, /usr/local/cuda/bin/nvcc --version gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

File ~/.bashrc includes:

export PATH=:...:/usr/local/cuda-10.2
#cuda environment variables
export LD_LIBRARY_PATH=:usr/local/cuda-10.2/lib64:

I built and tested the demo with this setup and received the same errors:

[ WARN] [1599576968.271444562]: TF lookupTransform error: Lookup would require extrapolation at time 1599576968.270284721, but only time 1599576968.273002273 is in the buffer, when looking up transform from frame [camera] to frame [tsdf_origin]
[ WARN] [1599576968.300546820]: TF lookupTransform error: Lookup would require extrapolation into the past.  Requested time 1599576968.270284721 but the earliest data is at time 1599576968.273002273, when looking up transform from frame [camera] to frame [tsdf_origin]
[ERROR] [1599576968.442203459]: Failed to fuse image.

I increased the frame rate in the demo.launch file:

<param name="framerate" value="90"/>

The image was still not fused in the output window.

Lastly, we upgraded the GPU to an NVIDIA Corporation GM204 [GeForce GTX 970], which has the Maxwell architecture, and the demo worked right away (with the previous CUDA and driver configuration).

The demo still gave warnings and an error message, and I am not sure how severe they are:

[ WARN] [1599576968.271444562]: TF lookupTransform error: Lookup would require extrapolation at time 1599576968.270284721, but only time 1599576968.273002273 is in the buffer, when looking up transform from frame [camera] to frame [tsdf_origin]
[ WARN] [1599576968.300546820]: TF lookupTransform error: Lookup would require extrapolation into the past.  Requested time 1599576968.270284721 but the earliest data is at time 1599576968.273002273, when looking up transform from frame [camera] to frame [tsdf_origin]
[ERROR] [1599576968.442203459]: Failed to fuse image.

However, we can now see the fused image in the output window, which is definitely a good sign.

schornakj commented 4 years ago

I uninstalled Cuda completely then reinstalled cuda 10.2, with driver 390

How did you pick which version of the Nvidia driver to install? I don't think that driver 390 is compatible with CUDA 10.2, since the compatibility chart says you need at least driver 440.33. There's some info on CUDA setup in the Yak readme. If you run ubuntu-drivers devices it will list the possible Nvidia driver versions that are compatible with your GPU.
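
As a rough illustration of the "at least driver 440.33" check, dotted version strings can be compared with a version-aware sort. This is a minimal sketch that assumes GNU coreutils (for sort -V); the helper name version_ge and the sample driver version 390.138 are chosen for illustration only.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Relies on 'sort -V' (version sort) from GNU coreutils.
version_ge() (
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
)

min_driver="440.33"   # minimum quoted in this thread for CUDA 10.2

# 390.138 stands in for "a 390-series driver"; 450.51.06 is the version
# reported earlier in the thread.
for driver in 390.138 450.51.06; do
  if version_ge "$driver" "$min_driver"; then
    echo "driver $driver: new enough for CUDA 10.2"
  else
    echo "driver $driver: too old for CUDA 10.2"
  fi
done
```

On a machine with the driver installed, the installed version can usually be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to the same function.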

I didn't install the CUDA tools, so nvcc --version gives:

Command 'nvcc' not found, but can be installed with:

sudo apt install nvidia-cuda-toolkit

However, /usr/local/cuda/bin/nvcc --version gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

File ~/.bashrc includes:

export PATH=:...:/usr/local/cuda-10.2
#cuda environment variables
export LD_LIBRARY_PATH=:usr/local/cuda-10.2/lib64:

I think this is just an environment setup issue. Make sure you follow the post-installation environment setup instructions in section 6.1.1 of the CUDA installation guide, and make sure that the paths you're adding to the environment variable are correct.
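
To make "make sure the paths are correct" concrete, here is a small sketch that flags two common mistakes in a colon-separated variable: empty entries (from a leading, trailing, or doubled colon), which are treated as the current directory, and entries that are not absolute paths. The helper name check_path_list is hypothetical.

```shell
#!/bin/sh
# check_path_list LIST: print a note for each suspicious entry in a
# colon-separated path list. Hypothetical helper for illustration only.
check_path_list() (
  list=$1
  # A leading, trailing, or doubled colon creates an empty entry.
  case "$list" in
    :*|*:|*::*) echo "empty entry (treated as the current directory)" ;;
  esac
  IFS=:
  set -f   # do not expand wildcards while splitting
  for entry in $list; do
    if [ -n "$entry" ]; then
      case "$entry" in
        /*) ;;                              # absolute path, fine
        *)  echo "relative entry: $entry" ;;
      esac
    fi
  done
)

# The LD_LIBRARY_PATH value quoted above, checked as-is.
check_path_list ":usr/local/cuda-10.2/lib64:"
```

Run against the quoted value, the check reports both an empty entry and a relative one, since usr/local/cuda-10.2/lib64 lacks its leading slash. The CUDA installation guide also expects the bin subdirectory of the toolkit on PATH, not the toolkit root.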

I built and tested the demo with this setup and received the same errors:

[ WARN] [1599576968.271444562]: TF lookupTransform error: Lookup would require extrapolation at time 1599576968.270284721, but only time 1599576968.273002273 is in the buffer, when looking up transform from frame [camera] to frame [tsdf_origin]
[ WARN] [1599576968.300546820]: TF lookupTransform error: Lookup would require extrapolation into the past.  Requested time 1599576968.270284721 but the earliest data is at time 1599576968.273002273, when looking up transform from frame [camera] to frame [tsdf_origin]
[ERROR] [1599576968.442203459]: Failed to fuse image.

I increased the frame rate in the demo.launch file:

<param name="framerate" value="90"/>

The image was still not fused in the output window.

Lastly, we upgraded the GPU to an NVIDIA Corporation GM204 [GeForce GTX 970], which has the Maxwell architecture, and the demo worked right away (with the previous CUDA and driver configuration).

The demo still gave warnings and an error message, and I am not sure how severe they are:

[ WARN] [1599576968.271444562]: TF lookupTransform error: Lookup would require extrapolation at time 1599576968.270284721, but only time 1599576968.273002273 is in the buffer, when looking up transform from frame [camera] to frame [tsdf_origin]
[ WARN] [1599576968.300546820]: TF lookupTransform error: Lookup would require extrapolation into the past.  Requested time 1599576968.270284721 but the earliest data is at time 1599576968.273002273, when looking up transform from frame [camera] to frame [tsdf_origin]
[ERROR] [1599576968.442203459]: Failed to fuse image.

However, we can now see the fused image in the output window, which is definitely a good sign.

The warnings aren't a big deal; they just mean that the TSDF node received an image before TF data was available for the camera frame. The error can indicate a problem with the CUDA portion of the fusion algorithm, but if you're getting correct output from the demo node then it's also not something to worry about in this case.

Since you were able to get it working by swapping out your GPU, I think your original issues were caused by having an incompatible combination of GPU hardware, Nvidia driver, and CUDA version (going back to my first comment about checking for the recommended Nvidia driver for your GPU). The CUDA-dependent packages will still compile correctly as long as compatible CUDA libraries are available, but runtime errors like the ones you've encountered will still appear if the GPU hardware or Nvidia driver isn't compatible with the CUDA libraries used to build the packages.

Beshario commented 4 years ago

Thank you for your help and explanations, @schornakj. Indeed, I think some of the run-time errors came from not following the post-installation instructions exactly and not setting the environment variables correctly. There may also have been an incompatible combination of GPU hardware, Nvidia driver, and CUDA version during debugging.

I cd'ed around to make sure all the CUDA folders are available and found /usr/local/cuda/bin and /usr/local/cuda/lib64. Here the cuda folder is a symlink to cuda-10.2.

Now the ~/.bashrc includes:

export PATH=/home/asp/.local/bin:/usr/local/cuda/bin:...
#cuda environment variables
export LD_LIBRARY_PATH=/usr/local/cuda/lib64

(Note that there is no colon right after the first equals sign.) Based on the ROS logs, at some point KinFu2 was pointing at a copy under my Downloads folder. I also refreshed my understanding of environment variables.

asp@asp-MS-7B84:~/catkin_ws$ cat  /home/asp/.ros/log/c42f3ebc-f2a0-11ea-88b2-d037451d0501/tsdf_node-1*.log
[ INFO] [1599658305.640960651]: Camera Intr Params: 456.886719 457.386719 336.316406 239.046875

[ INFO] [1599658305.642846082]: TSDF Volume Dimensions (Voxels): [640, 640, 192]
[ INFO] [1599658305.642880891]: TSDF Volume Resolution (m): 0.01
KinFu2 error: no kernel image is available for execution on the device  /home/asp/Downloads/yak/yak/src/cuda/tsdf_volume.cu:39

I then ran catkin clean followed by catkin build, and things seem to be working.
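
For context, the "no kernel image is available for execution on the device" line in the log above is CUDA reporting that the binary contains no kernel code the GPU can run. As a rough sketch of the rule for precompiled machine code: code built for compute capability X.Y runs on a device of capability X.Z only when Z is at least Y. The helper name cubin_runs_on is hypothetical, and the sketch ignores embedded PTX, which can be JIT-compiled for newer GPUs. Which architectures the yak build actually targets is not stated in this thread.

```shell
#!/bin/sh
# cubin_runs_on BUILD_CC DEVICE_CC: succeed when machine code compiled for
# compute capability BUILD_CC can run on a GPU of capability DEVICE_CC.
# Hypothetical helper; it deliberately ignores embedded PTX.
cubin_runs_on() (
  build_major=${1%%.*}; build_minor=${1#*.}
  dev_major=${2%%.*};   dev_minor=${2#*.}
  # Same major version, and the device minor version is not older.
  [ "$build_major" -eq "$dev_major" ] && [ "$dev_minor" -ge "$build_minor" ]
)

# 3.0 is the GK107 capability quoted in this thread; 5.2 is the GTX 970 (GM204).
for pair in "3.0 3.0" "5.2 3.0" "5.0 5.2"; do
  set -- $pair
  if cubin_runs_on "$1" "$2"; then
    echo "built for $1, device $2: runs"
  else
    echo "built for $1, device $2: no kernel image"
  fi
done
```

A clean rebuild after changing the toolkit or the GPU regenerates these binaries, which may be why catkin clean followed by catkin build helped here.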

Afterwards, I noticed that nvcc --version works fine in a new terminal tab.

For clarity, I rechecked the driver and it is now at 450.51.06. It was a bit confusing to read (more than or less than) the driver version compatibility with CUDA versions in the CUDA documentation.

Warmest Regards

schornakj commented 4 years ago

I'm really glad to hear that it's running now! Feel free to post another issue if you have other questions. We would definitely be interested in hearing more about your project in general as well.


It was a bit confusing to read (more than or less than) the driver versions compatibility with CUDA versions in the cuda documentation.

I think this is probably the hardest part of setting up this type of system. The first time I had to install CUDA on a computer it took me an entire week to get it working!