ros-industrial / ros_canopen

CANopen driver framework for ROS (http://wiki.ros.org/ros_canopen)
GNU Lesser General Public License v3.0

canopen_master "TimeoutException" with Schneider LXM28A (CiA 402) and RevPi Connect #425

Closed riv-robot closed 3 years ago

riv-robot commented 3 years ago

I've got a blocker on canopen_master (CANopen communication). The current state is that all the plumbing is done and both the controller and the CAN bus seem to be OK, but ros_canopen fails on launch. I have just one device, a Schneider LXM28A, and my CAN master is a RevPi Connect with a ConCAN module. Attached are the following:


- **[positioner_driver.yaml](https://github.com/additiveautomations/aa-canopen/blob/master/positioner_driver.yaml)** & **[EDS](https://github.com/additiveautomations/aa-canopen)** (config files)
    - Note I have mapped the drive operation modes (ID 1 only) and the other mappings that were recommended [here](http://wiki.ros.org/canopen_402?distro=noetic)
- Vector **CANeds** EDS check  
![Vector CANeds check](https://user-images.githubusercontent.com/44445100/115671857-da442480-a342-11eb-9160-ec7f3685b299.png)

gavanderhoorn commented 3 years ago

The link was more meant as an example of a potential issue, not specific to this case.

mathias-luedtke commented 3 years ago

> Nothing else is running on the CAN bus other than the RevPi ConCAN module which acts as my hardware interface.

@robertjbush: And no other process that accesses the can0 (or whatever it is called in your system) interface?

riv-robot commented 3 years ago

> @robertjbush: And no other process that accesses the can0 (or whatever it is called in your system) interface?

Nope. I ran `find /proc/* -type f | grep can0$ | head`, `netstat -ap -i can0`, and `sudo ss -tulpn`. Output from `sudo netstat -tulpn`: [image]

mathias-luedtke commented 3 years ago

> Nope. I ran `find /proc/* -type f | grep can0$ | head`, `netstat -ap -i can0`, and `sudo ss -tulpn`. Output from `sudo netstat -tulpn`.

Do any of them show the canopen_motor_node process?

riv-robot commented 3 years ago

@ipa-mdl See below for output (it shows that the canopen_motor_node process does indeed access the interface).
I am still stuck with the "halt" error....

[image] [image]

mathias-luedtke commented 3 years ago

> See below for output (it shows that the canopen_motor_node process does indeed access the interface).

This only lists the TCP ports, which are used by ROS, but not CAN. Perhaps we have to log all CAN messages processed and sent by canopen_motor_node.
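One way to act on the logging idea from outside the node is sketched below. This is only an illustration, not something ros_canopen provides. It assumes Linux SocketCAN, an interface named can0, and a device that still uses the default CANopen predefined connection set (remapped COB-IDs would be mislabelled). The names `classify`, `describe`, and `log_bus` are hypothetical.

```python
import socket
import struct

# Classic SocketCAN frame: 32-bit CAN ID, 1-byte length, 3 pad bytes, 8 data bytes.
CAN_FRAME = struct.Struct("=IB3x8s")
CAN_SFF_MASK = 0x7FF  # 11-bit standard identifier

# Default CANopen predefined connection set (function-code base -> label).
# Assumption: the drive has not remapped its COB-IDs.
FUNCTIONS = {
    0x180: "TPDO1", 0x200: "RPDO1", 0x280: "TPDO2", 0x300: "RPDO2",
    0x380: "TPDO3", 0x400: "RPDO3", 0x480: "TPDO4", 0x500: "RPDO4",
    0x580: "SDO response", 0x600: "SDO request", 0x700: "heartbeat",
}


def classify(cob_id):
    """Return (label, node_id) for an 11-bit COB-ID."""
    if cob_id == 0x000:
        return ("NMT", 0)
    if cob_id == 0x080:
        return ("SYNC", 0)
    node = cob_id & 0x7F
    base = cob_id & 0x780
    if base == 0x080:
        return ("EMCY", node)
    return (FUNCTIONS.get(base, "unknown"), node)


def describe(raw):
    """Format one 16-byte SocketCAN frame as a log line."""
    can_id, length, data = CAN_FRAME.unpack(raw)
    cob_id = can_id & CAN_SFF_MASK
    label, node = classify(cob_id)
    payload = " ".join("%02X" % b for b in data[:length])
    return "%03X [%d] %s  %s node %d" % (cob_id, length, payload, label, node)


def log_bus(interface="can0"):
    """Print every frame seen on the interface (Linux only, runs until interrupted)."""
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((interface,))
        while True:
            print(describe(sock.recv(CAN_FRAME.size)))
```

Running `log_bus()` in a second terminal while the node starts would show whether each SDO request (0x600 + node ID) is followed by a response (0x580 + node ID) from the drive.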

> I am still stuck with the "halt" error....

Have you tried another switching state?

riv-robot commented 3 years ago

Success! 🟢🟢🟢
It has initialised successfully, finally...
I decided to go back to one of the 30 EDS files I had tried (one I had labelled _BEST.dcf from last weekend) and to a bare-bones configuration (i.e. no heartbeat, no 'advanced' flags, not even a switching state).

⛔ Still not moving because MoveIt! can't recognise my ROS controller, but that's off topic.

I'd like to get to the bottom of this issue as I feel it's never good to leave errors in the background. If there's anything you need @ipa-mdl to be able to debug it further, just write it down here and I'll do it for you. The team doesn't have the expertise with C++ to debug further (not easy to find expert C++ engineers!) so I can't do much more until we hire someone.

mathias-luedtke commented 3 years ago

Great to hear!

> If there's anything you need @ipa-mdl to be able to debug it further, just write it down here and I'll do it for you.

I might have some time this weekend to design a test case for the SDO communication on ARM, including a trace option. I will keep you posted!

gavanderhoorn commented 3 years ago

So what's the conclusion here (for now)?

@robertjbush: what was it that made it suddenly work?

riv-robot commented 3 years ago

@gavanderhoorn

Conclusion

mathias-luedtke commented 3 years ago

> Root Cause: TBC, but most likely to be source code of canopen_master rather than external factors

I have added a test case (#428) and it passed on ARM64 as well. Please test this branch locally as well (catkin_make run_tests), without commenting out anything.

> Corrective Action: in sdo.cpp comment out line 429

Commenting out the error detection is not a solution.

riv-robot commented 3 years ago

> Commenting out the error detection is not a solution.

You can read the FMEA definitions here (see books). Corrective Action is not a solution by definition.

> Please test this branch locally as well (catkin_make run_tests), without commenting out anything.

Will do as soon as I get my motor moving.

mathias-luedtke commented 3 years ago

> You can read the FMEA definitions here (see books). Corrective Action is not a solution by definition.

Let me rephrase then: it is not a proper corrective action either. Instead of mitigating the failure (or its effect), you just comment out the single line that would report it to the caller. This is like painting the error light black.

> Will do as soon as I get my motor moving.

I truly understand that this is your priority, but if SDOs are not working properly this might affect the motion signals. Especially since we cannot rule out (yet) that there is some issue with memory alignment, which would be present for PDOs as well.
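To make the alignment concern concrete, here is a minimal sketch of the 8-byte layout of an expedited SDO frame: command byte, 16-bit little-endian index, sub-index, and four data bytes. It is not the canopen_master implementation. The helper names are hypothetical, and object 0x6041 (the CiA 402 statusword) is used only as an example. Note that the 16-bit index sits at byte offset 1, which is why code that maps a naturally aligned struct onto the frame can go wrong.

```python
import struct

# "<" selects little-endian with no padding, so the layout is exactly 8 bytes.
# With native alignment the 16-bit index would typically be padded to offset 2.
SDO_FRAME = struct.Struct("<BHB4s")  # command, index, sub-index, data

ABORT = 0x80           # SDO abort transfer
UPLOAD_REQUEST = 0x40  # client asks the server to send an object


def upload_request(index, subindex):
    """Build the 8-byte payload of an SDO upload (read) request."""
    return SDO_FRAME.pack(UPLOAD_REQUEST, index, subindex, b"\x00" * 4)


def parse_response(payload):
    """Decode an expedited upload response or an abort frame."""
    cmd, index, subindex, data = SDO_FRAME.unpack(payload)
    if cmd == ABORT:
        # The four data bytes carry a 32-bit little-endian abort code.
        return ("abort", index, subindex, struct.unpack("<I", data)[0])
    if cmd & 0xE0 == 0x40 and cmd & 0x02:
        # Bit 0 set: bits 2..3 give the number of unused data bytes.
        size = 4 - ((cmd >> 2) & 0x03) if cmd & 0x01 else 4
        return ("value", index, subindex, int.from_bytes(data[:size], "little"))
    raise ValueError("not an expedited upload response: 0x%02X" % cmd)
```

For example, `parse_response(bytes.fromhex("8041600000000405"))` yields abort code 0x05040000, which CiA 301 defines as "SDO protocol timed out".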

riv-robot commented 3 years ago

Ok understood, so:

  1. Installed the branch from #428.
  2. The error persists in normal operation.
  3. catkin_make run_tests for ros_canopen: no errors.
  4. catkin_make_isolated --catkin-make-args run_tests does give a worrying error and fails before completion (in tf2):
     [image: ros_test_failure_1]

My plan of action is:

  1. Wipe RevPi completely
  2. Install the very first release (last week) of Buster for the RevPi
  3. Reinstall everything
  4. Try again

As a general note, I wish I had had these tests earlier, or had thought of running them earlier, because they help in debugging both the system installation and the package installation. A note on the ros_canopen wiki page would help those with less experience, like myself.

riv-robot commented 3 years ago

💚 Updated the RevPi image to Buster and the issue disappeared. For those who own a RevPi (Connect, Core, Compact, Flat) and want to use ROS, make sure you take advantage of last week's much-overdue release of Buster.

The test suite was very helpful in debugging, thank you @ipa-mdl; please add it to the package and wiki for future users. Likewise, the catkin workspace advice from @gavanderhoorn was very helpful in setting up my fresh install. It's way quicker and more robust!

To finish on a negative note, publishing to the command topic /canopen/positioner_controller/command still just hangs and doesn't execute a position-based move... I suppose it's time to open a ROS Answers thread, as this one can be closed.