ros-industrial / ur_modern_driver

(deprecated) ROS 1 driver for CB1 and CB2 controllers with UR5 or UR10 robots from Universal Robots
Apache License 2.0

"Low Bandwidth Trajectory Follower" cannot connect to robot #336

Closed xarthurx closed 4 years ago

xarthurx commented 5 years ago

I'm using a UR5 robot for Cartesian planning under ROS+MoveIt and experienced a time "catch-up" issue:

When using the UR5 for Cartesian planning, the robot sometimes executes behind the planned time and misses some of the necessary points in order to "catch up".

So I tried to use the Low Bandwidth Trajectory Follower from the readme.

However, when sending several paths (each contains a series of frames to be planned by MoveIt and sent to the robot) together, the robot only executes 1-2 of them and then stops, with the information shown below:

UR panel (a lot of these messages when using the low-bandwidth follower, none without it): [image]

ROS (I guess this is the reason why the robot stops): [image]

Any suggestions to fix?

gavanderhoorn commented 4 years ago

However, when sending several paths (each contains a series of frames to be planned by MoveIt and sent to the robot) together

it's unclear to me what you're doing here exactly.

How do you "send[ing] several paths"? And to what/where/who?

Trajectory replacement is not supported.

xarthurx commented 4 years ago

Sorry for the confusion.

I'm sending a series of trajectory paths for the robot to follow the Cartesian trajectory, one after another.

Not replacement, but just one after another. In the normal mode, these operations are fine, but when using low_bandwidth, only the first one will be followed and then the robot will stop.

gavanderhoorn commented 4 years ago

Not replacement, but just one after another.

Ok. Thanks for the clarification.

In the normal mode, these operations are fine, but when using low_bandwidth, only the first one will be followed and then the robot will stop.

Does the action server indicate that trajectory execution was complete? Or does it hang, waiting for the motion to complete?

xarthurx commented 4 years ago

It doesn't hang. I can continue to do other planning jobs/actions.

gavanderhoorn commented 4 years ago

I believe this is going to be difficult to diagnose unless we have a way to reproduce it.

Could you provide a bit more information about how you have things set up, which versions of the components you are using, where you are submitting the trajectories (to MoveIt's execute trajectory action? To the driver directly? Somewhere else?) and other pertinent details?

gavanderhoorn commented 4 years ago

If this is still a problem and you'd like to diagnose this it would be good if you could provide the information requested in my previous comment.

xarthurx commented 4 years ago

I'm not using the low_bandwidth currently but can provide more info, if you can keep this issue open. Just a little busy recently.

xarthurx commented 4 years ago

Hi, coming back to this issue.

I'm using compas_fab to send trajectories from Windows. Previously, ROS was running in a virtual machine and we experienced many issues, including this one as well as others such as the robot lagging and skipping some of the points during execution of a trajectory.

We recently moved ROS to a real, separate computer, and things seem to be fine now, with or without the low-bandwidth follower.

If I experience further issues, I'll keep posting here. But it seems there's no more problem at the moment. Sorry for the inconvenience.

gavanderhoorn commented 4 years ago

It's unfortunate that the problems are now no longer there.

Performance in VMs has always been difficult, and VMs on Windows even more so.

The network infrastructure in VMs is sometimes not up to the task.

Just to learn something from this: which VM software were you using? VirtualBox, VMware, something else?

gavanderhoorn commented 4 years ago

As the OP has reported that he is now using a different system configuration which does not exhibit the problems initially reported, I'm going to close this.

Please feel free to keep commenting on the issue of course.

xarthurx commented 4 years ago

@gavanderhoorn I'm using a licensed copy of VMware Workstation Pro. The main reason I switched to a separate Linux machine is the skipping issue, where the robot tries to catch up and skips some of the points along the trajectory.

gavanderhoorn commented 4 years ago

It would be interesting to see whether increasing the priority of the VMware process would help with the catch-up behaviour.

The lbtf not being able to connect is something else, but probably no longer a priority for you.

xarthurx commented 4 years ago

I can try changing the process priority when I have time. FYI, ROS on WSL is also experiencing the same issue.

gavanderhoorn commented 4 years ago

FYI, ROS on WSL is also experiencing the same issue.

I'm not surprised.

WSL essentially runs Linux user-space processes as a task parallel to the regular Windows kernel.

gonzalocasas commented 4 years ago

As a side note, WSL 2 might improve on this area significantly (it is a real Linux kernel, not an emulation layer).

Perhaps stupid question but would it be theoretically possible to reduce the control rate? (it's 500Hz, right?) Or does that fundamentally screw up the control algorithm beyond repair? 😊

xarthurx commented 4 years ago

I'm not sure whether WSL2 would be better than a VM on this issue, as I don't know where the bottleneck is. But yes, reducing the control rate might be a better solution if possible.

gavanderhoorn commented 4 years ago

As a side note, WSL 2 might improve on this area significantly (it is a real Linux kernel, not an emulation layer).

IIRC, WSL2 will run a full Linux kernel in a Hypervisor+VM setup. That may improve performance, but it remains to be seen whether it'll be sufficient.

Perhaps stupid question but would it be theoretically possible to reduce the control rate? (it's 500Hz, right?) Or does that fundamentally screw up the control algorithm beyond repair?

No, it can be run at a lower rate (see @ThomasTimm's report), but it will reduce motion performance for dynamic motions.

gonzalocasas commented 4 years ago

No, it can be run at a lower rate (see @ThomasTimm's report)

@gavanderhoorn and what would that entail, roughly? I assume there's no parameter ur_control_rate_in_hz somewhere lying around, so, would it mean tweaking the code (possibly in a multitude of places) and testing if it still behaves?

gavanderhoorn commented 4 years ago

You would have to change these parameters:

https://github.com/ros-industrial/ur_modern_driver/blob/fd2e38af745b9d471a4f9532be5f39e5d95fe405/launch/ur10_bringup.launch#L20-L25

and make the configuration for ros_control match.

Note: that's all for the non-lbtf implementation though. The LBTF does not use those iirc.

It's non-trivial to tweak those values to something other than what they are now though, and I wouldn't be able to provide assistance (not because I wouldn't want to, but because it's been too long since I've done anything with those values).

gonzalocasas commented 4 years ago

Understood! Thanks!

omar-enein-nist commented 2 years ago

It's unfortunate that the problems are now no longer there.

Performance in VMs has always been difficult, and VMs on Windows even more so.

The network infrastructure in VMs is sometimes not up to the task.

Just to learn something from this: which VM software were you using? VirtualBox, VMware, something else?

Hello,

I recognize that this thread has been inactive for some time and that ur_modern_driver has since been deprecated; however, I have encountered a similar error when using the Low Bandwidth Trajectory Follower, and I believe I have uncovered more information about this specific problem. I am currently using the latest kinetic-devel branch of ur_modern_driver with a UR5 CB3 robot through ROS Noetic on WSL1 (I also had to communicate with the UR5 over Wi-Fi). Like the original poster, I experienced "catch up" issues with Cartesian planning using the default trajectory follower and subsequently tried enabling the Low Bandwidth Trajectory Follower (LBTF). The first trajectory attempted by my program would execute correctly on the robot, but subsequent trajectories would fail with the same error messages shown in the original poster's ROS console output (though I did not encounter the errors the original poster observed on the UR5 controller).

Adding additional error output inside the if statement starting at line 77 of server.cpp, I discovered that the error was caused by the client socket file descriptor (client_.getSocketFD()) still being valid when the accept() method was called to initialize a new socket. Generally, it seemed that each time a trajectory is executed, a new client TCP socket would be created, the trajectory would then be sent to the controller over the socket, and then the socket would be closed (this is in trajectoryThread() starting at line 280 of action_server.cpp). I also discovered that when the first trajectory was executed, and the stop() method was called inside trajectoryThread() (line 356 of action_server.cpp), the disconnectClient() method was not being called because the client socket state had already been set to "Disconnected".

I determined the problem was that, in the read() method starting at line 143 of tcp_socket.cpp, there is an if statement checking whether the call to recv() returns 0, which indicates the client socket had an orderly disconnect (https://man7.org/linux/man-pages/man2/recv.2.html). The client socket state is then updated to "Disconnected", but the file descriptor is not also reset to an invalid value. Therefore, to fix the problem on my end, I added an additional line below 154 of tcp_socket.cpp to set the file descriptor to -1 (i.e., "socketfd = -1"). I am not sure whether or not there is a better general fix, but I hope this helps!
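For anyone trying to follow the description above without the driver source at hand, here is a minimal, self-contained sketch of the pattern. It is not the driver's code: `SketchSocket`, `canAcceptNewClient()` and `demoOrderlyDisconnect()` are hypothetical names invented for illustration, and a `socketpair()` stands in for the robot's TCP connection. It only shows the idea that when `recv()` returns 0, the descriptor should be invalidated along with the state so that a later "is a client still connected?" check passes. Note one assumption that goes beyond the one-line change described above: the sketch also calls `close()` before setting the descriptor to -1, to avoid leaking it.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a socket wrapper; not the driver's actual class.
enum class SocketState { Connected, Disconnected };

class SketchSocket {
public:
  explicit SketchSocket(int fd) : fd_(fd), state_(SocketState::Connected) {}
  SketchSocket(const SketchSocket&) = delete;
  SketchSocket& operator=(const SketchSocket&) = delete;
  ~SketchSocket() { closeIfOpen(); }

  int getSocketFD() const { return fd_; }
  SocketState getState() const { return state_; }

  // Returns false on error or when the peer has closed the connection.
  bool read(char* buf, std::size_t len, std::size_t& got) {
    got = 0;
    if (state_ != SocketState::Connected) return false;
    ssize_t res = ::recv(fd_, buf, len, 0);
    if (res == 0) {
      // Orderly shutdown by the peer: update the state AND invalidate
      // the descriptor, so later validity checks see "no client".
      state_ = SocketState::Disconnected;
      closeIfOpen();  // assumption: also close, not only set to -1
      return false;
    }
    if (res < 0) return false;
    got = static_cast<std::size_t>(res);
    return true;
  }

private:
  void closeIfOpen() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

  int fd_;
  SocketState state_;
};

// Mirrors the kind of guard described above: a new client is only
// accepted when the previous client's descriptor is no longer valid.
bool canAcceptNewClient(const SketchSocket& client) {
  return client.getSocketFD() < 0;
}

// Simulates one "trajectory": the peer connects and then disconnects cleanly.
bool demoOrderlyDisconnect() {
  int fds[2];
  if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
  SketchSocket client(fds[0]);
  ::close(fds[1]);  // peer closes its end, so recv() will return 0

  char buf[16];
  std::size_t got = 0;
  bool readOk = client.read(buf, sizeof(buf), got);
  return !readOk && client.getState() == SocketState::Disconnected &&
         canAcceptNewClient(client);
}
```

Without the descriptor reset in `read()`, `canAcceptNewClient()` would keep returning false after the first disconnect, which matches the symptom of only the first trajectory being executed. Whether closing the descriptor at that point is appropriate for the real driver depends on how `disconnectClient()` and `stop()` are expected to clean up, so treat that part as a guess.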