ros-industrial-attic / ur_modern_driver

(deprecated) ROS 1 driver for CB1 and CB2 controllers with UR5 or UR10 robots from Universal Robots
Apache License 2.0
305 stars 338 forks source link

Wrong/random joint positions published by the driver after loosing the connectionto the realtime port #128

Closed cthorey closed 7 years ago

cthorey commented 7 years ago

I got weird things happening when I bring up my UR5 using the ur_modern_driver. For some reasons that I don't understand, the transformation from the tool0_controller to the base start returning random values as well as NaN. As a result, I am not able to use moveit for path planning.

Here is a sample of the output when I run

rosrun tf tf_echo tool0_controller base

screenshot from 2017-08-08 17-32-10

Sometimes, if I run the same command right after I bring up the UR5, I got the right output (display the right frame in Rviz as well) but it starts to display wrong values after a moment. Usually, bringing up moveIt! trigger the problem but not all the times.

As recommended in the README.md file, I add

<!-- Connect tool0_controller to base using floating joint -->
    <link name="tool0_controller"/>
    <joint name="base_tool0_controller_floating_joint" type="floating">
      <origin xyz="0 0 0" rpy="0 0 0"/>
      <parent link="${prefix}base"/>
      <child link="tool0_controller"/>
    </joint>

to the urdf.

I still got some warning from MoveIt! that seems related to the tool0_controller as well as Error when the transformation starts getting weird.

screenshot from 2017-08-08 18-05-22

I am using ROS kinetic and the last version of the ur_modern_driver. Does anyone encunter the same problem before?

cthorey commented 7 years ago

After digging a little deeper, I think I have precisely locate the problem. The transition from working properly to random positions brodcasted for all the joints by the modern_driver actually happens when the driver loose the connection

[ WARN] [1502277420.858224890]: Realtime port: No connection. Is controller crashed? Will try to reconnect│
 in 10 seconds...                                                                                         
[ INFO] [1502277430.859379565]: Realtime port: Reconnected

Right after it reconnected, the published joint states start to show random/wrong values. Any guess on how to fix it ?

cthorey commented 7 years ago

Still digging deeper, the problem was coming from the ur_realtime_communication.cpp file, especially the reconnection attempt in the run method. For those who found themselves in similar situation, i fork the repo and add a fix in the package.

gavanderhoorn commented 7 years ago

If this is (still) an issue with the code in this repository, why close the issue?

happygaoxiao commented 7 years ago

Hello @cthorey . I have the same problem sometimes. I clone your package of folder /src, but problem still exists. Could you tell me how to fix this problem? Thanks a lot.

SrdjanStevanetic commented 7 years ago

Hi guys, I have already tested it with the version @cthorey has fixed but the problem still remains. Could you please check it? Many thanks.

happygaoxiao commented 7 years ago

Hello,@SrdjanStevanetic , I have also tested it with the version of @cthorey , problem still remains. It appears randomly. Is there anyone who have a solution? Thanks.

SrdjanStevanetic commented 7 years ago

Hi, I have seen one strange thing. In the tool0_controller defined in the urdf as @cthorey showed above the name of the joint is base_tool0_controller_floating_joint. However please note that the name of the joint written here: https://github.com/cthorey/ur_modern_driver is with hyphen (-), i.e. base-tool0_controller_floatingjoint, and not with underscore (), i.e. base_tool0_controller_floating_joint. I tried with hyphen but I got an error that it cannot be parsed (wrong symbol) in moveit.

cthorey commented 7 years ago

Do you still observe the random moves punctually or after a certain time, it appears totally random ?

SrdjanStevanetic commented 7 years ago

Hi, the robot randomly jumps to arbitrary positions. I see it in rviz, and also when i run: rostopic echo /joint_states I get totally random values including very high values, all 0, etc. In my rqt graph i see only ur_driver publishing to joint_states. I am wondering why it publishes these states? Do you have any idea? Many thanks.

gavanderhoorn commented 7 years ago

I am wondering why it publishes these states?

This is most likely a desync in the real-time interface deserialiser. That would cause it to interprete the wrong byte sequences for certain fields.

As there are no sync-bytes in the protocol UR uses, it's not trivial to resync after that.

gavanderhoorn commented 7 years ago

@cthorey wrote:

Do you still observe the random moves punctually or after a certain time, it appears totally random ?

just to make this absolutely clear: the robot is not moving physically, these are just visualisation 'jumps' that you are describing, correct?

cthorey commented 7 years ago

@gavanderhoorn correct. @SrdjanStevanetic sounds like exactly what I was facing before changing the ur_realtime_communication.cpp file. Not sure why it is not working for you...

SrdjanStevanetic commented 7 years ago

@cthorey I have tested your code from: https://github.com/cthorey/ur_modern_driver/commit/12796faadbaebc6edc78b2ed5ff21b8d2d2c8fea, and also compared it to the current official version of the driver in the master branch (2 files are changed with regard to that: ur_ros_wrapper.cpp and ur_real_communication.cpp, package.xml) and tested with just changing the given 2 files. In both cases I got jumps of the robot. But to be more precise, jumps happen more often after I execute one move using Moveit! I also cannot succeed in moving the robot smoothly from Moveit! The moves are not continual. Thanks a lot!!!

cthorey commented 7 years ago

So before you execute any move with moveit!, things looks stable? What is the last link of your movegroup chain (you can see it in ur5_moveit_config/config/ur5.srdf) ? Are you using the tool0_controller published by the ur_modern_driver ?

SrdjanStevanetic commented 7 years ago

Btw guys, I am NOT using ros_control but ur3bring... launcher.

SrdjanStevanetic commented 7 years ago

I am not using tool0_controller. My last link is ee_link, from the ur3.srdf:

< group name="manipulator" > < chain base_link="base_link" tip_link="ee_link" > < group >

Before doing any move with moveit! the robot also jumps but rearly. For example I caught the following joint_states that somehow got published (the real robot position starts with 3.5, in the code below I tried to catch the positions that differ from the one of a real robot): user@CPPU3:~$ rostopic echo /ur3/joint_states | grep position | grep -v "position: [3.5" position: [-2.000000478931106, 2.6815622281295424e+154, -5.349025035e-315, 5.299808824e-315, 0.0, 0.0] position: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] position: [0.24975182264220644, 0.0009210309320050001, 0.0, -8.865937695100947e-17, 5.538842268455751, 7.089115591447643]

SrdjanStevanetic commented 7 years ago

Do I maybe need to somehow synchronize times on the UR machine and the VM where I run my software. Can that be a problem?

ThomasTimm commented 7 years ago

As I have mentioned in several issues; your shouldn't use a virtual machine for anything requiring near-realtime network control. If your host OS (in which you are running your VM) decides to allocate your network interface for a few hundred milliseconds for something else than your VM, control will crash. This is also what you see here. Use a dedicated linux machine (even a cheap raspberry pi is more than sufficient) for running the ur_modern_driver.

I'll suggest closing this issue, as it is not a problem with the driver, but a problem with NIC priority between the VM and host OS.

SrdjanStevanetic commented 7 years ago

@ThomasTimm Thanks a lot! Do you maybe have any idea why the movements of the real robot run from the moveit! are uncontinuous (i.e. jerky...)? Is it related to the same issue? Many thanks!

ThomasTimm commented 7 years ago

Most likely, if you by uncontinous means that it sort of stutters. The driver streams new target pose each 8ms. Imagine if your host OS blocks communication for just 50ms (bcs of OS update or the like), then for 6 cycles the robot doesn't recieve a new target and thus have to decelerate rapidly before it then have to accelerate when communication is reestablished.

SrdjanStevanetic commented 7 years ago

ok. Is it maybe possible to ease these real time requirements so that the communication is a bit more stable? For example, what would happen if I change the the publish rate of joint_state_controller in the config/ur3_controllers.yaml file to 50, so that it sends the data each 20ms?

gavanderhoorn commented 7 years ago

The joint_state_controller does not send any data, and IIRC correctly, you're not using ros_control (https://github.com/ThomasTimm/ur_modern_driver/issues/128#issuecomment-335261290), so that file is not involved.

gavanderhoorn commented 7 years ago

Also: could I please ask you to not 'hijack' this issue? Jerky motion execution is probably not connected to the issue described by the OP (incorrect joint states being published).

SrdjanStevanetic commented 7 years ago

Ok, maybe it is related to this issue directly that is why we discuss it here.

gavanderhoorn commented 7 years ago

If you're not using ros_control, there is no loop, and the current joint states are not (directly) used in trajectory execution. It's open-loop.

I think the hypothesis posted by @ThomasTimm (missing setpoints between the VM and the robot) is a good one. See #56 for an older issue about what I believe could be the exact problem you describe.

It would help if you can try to run things natively (ie: not in a VM). If you then still observe the same issue, we can try to figure out what is going on.

SrdjanStevanetic commented 7 years ago

@gavanderhoorn Thanks for the info. I just wonder if there is an option for improving the communication as I mentioned above so that it is a bit more stable? If yes, where can it be configured or changed in the code? Many thanks!

happygaoxiao commented 7 years ago

Thanks a lot. I have installed the ursim 3.4.4-97 directly in ubuntu14.04 at my laptop today. The problem haven't happened again. Everything works fine! But I used to use the VMplayer in my Desktop PC without this problem.(there was something wrong about the old ursim directly using in ubuntu) . Anyway, problem have been solved. Thanks.

ThomasTimm commented 7 years ago

@SrdjanStevanetic Unless you can instruct your host OS prioritize the VM's network connection, there isn't... Either way, it is way outside the scope of the driver.

SrdjanStevanetic commented 7 years ago

Dear Thomas and all, I have done some additional testing and I am still facing the problem I explained above. Let me explain in a bit more detail again what is going on. Setup: We use 2 robots (KUKA and UR3) together with some other IoT devices like conveyor, Tablet, etc. We have a windows PC with VM-Ware server hosting some VMs (e.g. for controlling KUKA). That windows PC is connected over a gigabit router and then a switch to the real robot controllers from KUKA and UR3. Beside the windows PC I have installed the ur modern driver in a separate linux 14.04 machine and connected it to the switch using a cable connection. Since we discussed above that running the ur driver in a VM might cause a problem I installed it in a separate linux machine. In a scenario that we use, the KUKA robot transports some object to the ur3 robot which should do screwing. During the communication with the UR3, I got some messages that have strange length (like 10736546524, 0, -10736546356, etc.) and their content is not the right one, meaning that they are not valid. I am printing these messages in the robot_state_RT.cpp file in the part where the messages are unpacked. For example:

Message length: 1060.000000 Time: 3871959.560000 Actual joint positions: -3.203949,-1.570904,-1.570652,-1.570832,1.570674,0.571769,0.571769 Protocol version: 3.300000

Message length: 0.000000 Protocol version: 3.300000

Message length: 1073310141.000000 Protocol version: 3.300000

Message length: 1081057280.000000 Protocol version: 3.300000

The number of wrong messages in the test that we did is 2237 out of totally 93991 messages. In another test is was 776 out of 88732. So these bad messages are filtered out in the part that checks the message length in the robot_state_RT.cpp file. The order in which bad messages appear is random and not cyclic.

So, do you have any idea what can be the problem we are facing with? Btw, I also did tests with only the UR3 robot, without another robot and devices and the same problem happens. Could it be that some packets are lost in the communication among the real ur3 and the linux machine where the driver is running? Since I checked in the unpack(...) function in the robot_state_RT.cpp file that the messages are wrong, it means that those wrong messages are also sent from the real robot, right? Many thanks.

Regards, Srdjan

WwYyFan commented 6 years ago

Hello, I also encountered this bug. After the modification by the method you gave, the problem still exists. Finally, I found that it may be caused by the instability of the wireless connection. When I use the network cable to connect the robot, this bug will not appear.

gavanderhoorn commented 6 years ago

Please refrain from using a wireless connection. It's just not a sane thing to do when controlling an industrial robot remotely.

WwYyFan commented 6 years ago

@gavanderhoorn Thanks for your advice

raequin commented 6 years ago

This issue occurred with me, too, but it seems to have been a network problem that is hopefully resolved now. I'm commenting to see what you think about this quote

After the query is because the connection to the real time port, the tf information is lost, and after the connection is re-established, the end pose information is randomly released, which seriously affects the calibration process. Solution: Modify ur_communication.cpp and ur_realtime_communication.cpp in ur_modern_driver to clear the buffer every time you reconnect (Reconnect), etc., mainly add the following: bzero(buf, 2048 ) ; FD_ZERO(&readfds) ; FDSET(sockfd, &readfds) ;

ozilxu commented 2 years ago

This issue occurred with me, too, but it seems to have been a network problem that is hopefully resolved now. I'm commenting to see what you think about this quote

After the query is because the connection to the real time port, the tf information is lost, and after the connection is re-established, the end pose information is randomly released, which seriously affects the calibration process. Solution: Modify ur_communication.cpp and ur_realtime_communication.cpp in ur_modern_driver to clear the buffer every time you reconnect (Reconnect), etc., mainly add the following: bzero(buf, 2048 ) ; FD_ZERO(&readfds) ; FDSET(sockfd, &readfds) ;

@raequin Hi, may I ask which function of the .cpp file should add these 3 lines code? Because I have seen 3 line codes in the original file.

gavanderhoorn commented 2 years ago

@ozilxu: please confirm you have a CB1 or CB2 Universal Robot controller.

If you don't, you should be using ur_robot_driver, not ur_modern_driver.

ozilxu commented 2 years ago

@gavanderhoorn Hi, I am using the CB3 controller with Ur5 robot. So should I use ur_robot_driver instead of ur_modern_driver? Besides, will the virtual machine interfere with the connection between the robot and the PC? Thanks a lot!

gavanderhoorn commented 2 years ago

You should not be using ur_modern_driver with a CB3 controller.

Please ask your questions on the ur_robot_driver issue tracker / discussion forum instead.

ozilxu commented 2 years ago

@gavanderhoorn Hi, my URsoftware version is 3.1.0.0.76181. In this case, I guess I have to use UR modern driver only. So UR ROS driver cannot be deployed for my UR5. Is that right? Thanks a lot!

gavanderhoorn commented 2 years ago

No, I would suggest you update your CB3 controller.

You really do not want to use this deprecated driver.