Closed riv-robert closed 3 years ago
You might want to decode these EMCY message (manual) and the [error frames] to (https://github.com/torvalds/linux/blob/master/include/uapi/linux/can/error.h) know why things fail.
I would guess that something is wrong with the PDO settings (especially 1800sub1
).
The long list of errors and warnings in CANeds supports this as well.
Please make sure that your default PDO settings (DefaultValue
) match the device configuration at boot-up and that your custom settings are specified in ParameterValue
. And not all devices support changing the PDOs at run-time.
You could just try canopen_chain_node
with the default setting and take it from there.
All in all it does not look like a bug in ros_canopen
, so I will close this issue.
If fixing the PDO issue does not help, you might get help with your device configuration on ROS answers.
Hi @ipa-mdl, I took on board your feedback and spent the previous 4 days fully focused on PDO mapping. Lots of hours, not much progress. The result is I still believe there is something wrong with ros_canopen_master
. I'll explain a bit more:
ParameterValue
so ros_canopen knows it needs to go through a remap initialisation using SDO's in the pre-operational state. However as soon as I use ParameterValue
(whether using CANeds, a text editor or dcf_overlay
) I get the error above (i.e. timeout in object key 1800sub1
).Note I have fixed all CiA 301 and 402 errors using CANeds. Please could you offer some information on how the remap initialisation works and why it is throwing the error shown above? I don't believe the candump is much use as I've been decoding it in wireshark but here it is anyway (note below I've used a heartbeat, but that was just to test for pre-operational state):
can0 000 [2] 82 01
can0 701 [1] 00
can0 081 [8] 00 00 00 00 00 00 00 00
can0 601 [8] 2B 17 10 00 64 00 00 00
can0 581 [8] 60 17 10 00 00 00 00 00
can0 601 [8] 01 00 00 01 00 00 00 00
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 701 [1] 7F
can0 601 [8] 00 00 00 01 98 50 33 01
can0 601 [8] 2B 17 10 00 00 00 00 00
can0 581 [8] 60 17 10 00 00 00 00 00
can0 000 [2] 02 01
Please could you offer some information on how the remap initialisation
Roughly like this (see https://github.com/ros-industrial/ros_canopen/blob/melodic-devel/canopen_master/src/pdo.cpp#L86):
For each PDO (nr_of_tx_pdo, nr_of_tx_pdo in EDS):
why it is throwing the error shown above?
Did you decode the error frames? Wireshark has CANopen support as well.
I don't believe the candump is much use as I've been decoding it in wireshark
It would be more helpful to have the decoding as well ;) But as far as I can tell this snippet just configures the hearbeat (0x1017).
I get an error on the Schneider controller that says the PDO packet is too long (check PDO mapping)
How many bytes did you configure? How long is the PDO in wireshark/candump?
Ok thanks, for now I'll assume it's my end that has the bug and not this package. I've therefore transitioned to ROS answers here. If I find more conclusive evidence that this package has a bug I'll post here.
If I find more conclusive evidence that this package has a bug I'll post here.
Yes, there might be some cases, which are not handled properly yet..
For example sigle bit boolean cannot be mapped (https://github.com/ros-industrial/ros_canopen/issues/268, not an issue for you) and canopen_master
cannot deal with doubled PDO mappings (https://github.com/ros-industrial/ros_canopen/issues/183).
I have some more evidence now that Ive been able to use CANOpenNode to read objects on the Schneider motor controller. Ive confirmed 100% the problem lies with ros_canopen, whether that be compilation, implementation or source code. So to be able to debug further could you clarify a few things regarding my environment:
Is my implementation known to cause any problems and is it supported?
I have some more evidence now that Ive been able to use CANOpenNode to read objects on the Schneider motor controller
This does not prove that it is working with PDOs..
Ive confirmed 100% the problem lies with ros_canopen, whether that be compilation, implementation or source code.
or the EDS/DCF is still wrong mismatching..
m running my ros master on another machine in a docker and launching the canopen node via remote machine tags. The ros_canopen package is on both machines.
That's fine.
the canopen machine is a raspberry pi running noetic on stretch.
You mean Debian buster? It should work, but I haven't tested it under noetic (at all). I can just report that it works on Raspberry PI with melodic.
Again, please decode the error frames and EMCY messages:
[ERROR] [1619040861.136742926]: EMCY received: 81#10ffa10000001500 [ERROR] [1619040861.144943677]: EMCY received: 81#9170a10000001300 [ERROR] [1619040862.143727648]: Did not receive a response message [ERROR] [1619040862.145883505]: Received error frame: 20000004#0008000000006501 [ERROR] [1619040862.146420374]: Received error frame: 20000004#0020000000008501
This does not prove that it is working with PDO
I mentioned earlier that I can use the default PDO's on the device, I just can't remap them through ros_canopen but the manual and Schneider engineers are very clear that dynamic remapping is supported in pre-operational mode
or the EDS/DCF is still wrong mismatching...
I did go through a sample of Object Dictionaries using CANOpenNode and they did match my default values in EDS, but I can go through every single one if necessary. I assume the SDOClient
in sdo.cpp
is checking the EDS and real device match?
You mean Debian buster?
No because the RevPi Connect doesn't have Buster in production (beta still) so I got Noetic working on Stretch using Buster repos. Haven't had a problem with other packages. Turns out Buster got released 4 days ago so if all else fails I will wipe the pi and try again with Buster OS.
please decode the error frames and EMCY messages
I have decoded them and hundreds others but I can assure you at the moment there are no useful error messages except for the one from roscanopen regarding 1800sub1
. My latest progress is that when I set 1800sub1
to ro
access it actually gets through the error but looking at the source code that is expected because sdo.cpp
isn't set up to write. **I still don't know why it is going into sdo.cpp
and not doing the remap in pdo.cpp
as it isn't clear where sdo gets called.**
I still don't know why it is going into sdo.cpp and not doing the remap in pdo.cpp as it isn't clear where sdo_ gets called.
You have to use SDOs to change the COB-IDs and the mapping of the PDOs.
so I got Noetic working on Stretch using Buster repos
Might work (if you build everything from source), but this setup is not really supported..
or the EDS/DCF is still wrong mismatching...
As already commented on ROS answers, i am not sure if RTR PDO are working. As far as I know RTR frames should not be used anymore (?).
Thank you that clears up a few doubts I had but a search on different canopen sites doesn't make it clear what you mean by RTR other than an extra bit in the COB-ID. Are you saying my device may not be compatible or that I need to change the default value so ros_canopen doesn't through an error due to COB-ID having an extra bit?
I just looked it up again, this bit means "NO RTR", perhaps it this is not handled properly, but your candump does not show this part of the remapping. I still don't understand why the EMCY is coming and what it means.
Thank you so much for your help, I'm really running out of ideas and we have no budget left to change hardware...Here is the wireshark decoded with EMCY packet showing (taken 1 minute ago).
And the error I get from ros_canopen:
Here is the wireshark decoded with EMCY packet showing
This is the EMCY reset (all 0), I meant the other ones. The screenshots looks a little strange.. the communication should not be reset twice.. And SDO requests should wait for theirs response..
This is the EMCY reset (all 0), I meant the other ones.
The others are from almost a week ago and I have worked solidly (including weekend) on this problem to fix those earlier issues. In other words my work has come on a lot since then and those errors messages are now fixed.
the communication should not be reset twice..
Wireshark tells me the double message are "Sent by us" and the other is a "Broadcast". I could have filtered out the "Sent by us", I don't believe they are causing an issue as the raw candump doesn't include them, just the tcpdump to pcap.
those errors messages are now fixed.
Ah, great.
Wireshark tells me the double message are "Sent by us" and the other is a "Broadcast".
Makes sense. I wonder why the second SDO is a segment download.. Which object is it reading?
There is a way to start without PDOs (as a test):
This should spin up the ROS node and give you the object services. If it does not work, then it not related to the PDOs.
run canopen_chain_node
I'm just recompiling the pi and will do when finished. I haven't run that package on its own before so will look for some examples on ROS answers.
I wonder why the second SDO is a segment download.. Which object is it reading?
Attached below
I'm just recompiling the pi and will do when finished. I haven't run that package on its own before so will look for some examples on ROS answers.
works pretty much like canopen_motor_node (which is a subclass of canopen_chain_node), You can use the same config etc.
Attached below
Look like object 1801sub0
I cannot see anything wrong there..
Normally 4 bytes can be transmitted in expedited mode (directly in the first response).
Can you upload the wireshark dump somewhere?
Wireshark dump is attached:
out.zip
My recompiling failed and it takes around 30 minutes on the pi so I'll report back regarding canopen_chain_node
test.
canopen_chain_node
tested. Note that I recompiled in order to comment out LINE 429
of sdo.cpp
. The interesting packet dump is attached and also a screenshot of the errors which are also interesting but I haven't made sense of it yet. I didn't modify my yaml which can be found here.
canopen_chain_node_debug.zip
Commenting out LINE 429
of sdo.cpp
and recompiling has enabled me to re-map the PDO's finally! This is good news because the device enters operational state. I still have this error:
"Mode switch timed out; Could not enter homing mode; Transition timeout"
And this very promising wiredump. canopen_motor_node_debug.zip
I'm not quite sure what is wrong now, but I believe commenting out that line of code in sdo.cpp
has helped enormously. Any help to get me over the line and get this motor enabled would be awesome!
Commenting out LINE 429 of sdo.cpp and recompiling has enabled me to re-map the PDO's finally!
I am not sure about this.. You basically knocked out the the check if the SDO download (ROS->Device) was successful. And the error message indicate that it did not work.
I am still on the first dump and try to figure out what is wrong with the download of 0x1017
(producer heartbeat time)
And the error message indicate that it did not work.
See below PDO's mapped correctly.
Key: TPDO1: (status word) | (modes of operation display) RPDO1: (control word) | (mode of operation)
Packets from canopen_motor_node_debug.pcap:
TPDO1: 43 31 | 00 ------> ready to switch on, voltage disabled RPDO1: 01 07 | 00 ------> halt TPDO1: 43 33 | 00 ------> switched on RPDO1: 01 0f | 00 ------> enable operation TPDO1: 43 37 | 00 ------> operation enabled, voltage disabled RPDO1: 01 0f | 06 ------> enable operation, halt and go into homing mode RPDO1: 01 0d | 00 ------> enable voltage, operation and halt TPDO1: 43 50 | 00 ------> switch on disabled TPDO1: 43 31 | 00 ------> ready to switch on, voltage disabled
@robertjbush: Could you please add some debug output around sdo.cpp#L305?
Especially offset
and total
.
I am still on the first dump and try to figure out what is wrong with the download of 0x1017 (producer heartbeat time)
These "Download segment request" are wrong and should not be present. I guess that offset and total should both be 2, but they are not.
These "Download segment request" are wrong and should be present.
You mean shouldn't be present?
Could you please add some debug output around sdo.cpp#L305?
Doing it now but compiling takes a while.... With all this recompiling I wish I'd implemented ROS on the RevPi inside a docker container...
You mean shouldn't be present?
Yes, sry. updated the post. After the first "Initial download response" total
and count
should be 2 (hearbeat time is 2 bytes).
Even in the new dump, this extra "Download segment request" is present (and shouldn't).
In addition there are some "Upload segment request", which should only be sent after UploadInitiateResponse
.
This looks like an ABI issue.
Could you please run rosrun socketcan_interface socketcan_dump can0
and compare it to the the wireshark (or candump) data? Then we can at least rule out any issue in the message passing.
Doing it now but compiling takes a while
It is only a change in one CPP file, this should not take too long (?).
It is only a change in one CPP file, this should not take too long (?).
Noetic for the pi (buster) is not yet on Jenkins build farm as pre-built binaries so I have to recompile everything each time I make a package update, hence why I'd be better off implementing a docker container....
This looks like an ABI issue.
I am new to CANopen and I'm not the C++ expert in our team so I'm not sure what you mean by ABI.
I'll do what you ask and hopefully after a full 7 days on this I can get this motor moving. Hopefully this will prove useful to others trying to use a similar environment to me.
EDIT: ABI seems to mean application binary interface.
It is only a change in one CPP file, this should not take too long (?).
Noetic for the pi (buster) is not yet on Jenkins build farm as pre-built binaries so I have to recompile everything each time I make a package update, hence why I'd be better off implementing a docker container....
this doesn't really make sense to me.
after you've built everything, you should have binaries and object files of your build.
Changing something in a single .cpp
should not require rebuilds of everything else.
Do you perhaps only have a single Catkin workspace, in which you have both the basic ROS build and the ros_canopen
packages, and are then (re)building it with catkin_make_isolated
?
Do you perhaps only have a single Catkin workspace, in which you have both the basic ROS build and the ros_canopen packages, and are then (re)building it with catkin_make_isolated?
I have a separate workspace for my custom packages which is a lot quicker to build but my base ROS installation was ros_comm and I was missing so many dependencies I decided to use the following to install it in the same workspace as my ROS installation:
cd ~{user}/ros_catkin_ws
sudo apt-get install -y python-rosdep python-rosinstall-generator python-wstool python-rosinstall build-essential cmake
sudo rosdep init
rosdep update
rosinstall_generator ros_canopen --rosdistro noetic --deps --wet-only --tar > noetic-ros_canopen-wet.rosinstall
wstool merge noetic-ros_canopen-wet.rosinstall -t src
wstool update -t src
rosdep install -y --from-paths src --ignore-src --rosdistro noetic -r --os=debian:buster
sudo src/catkin/bin/catkin_make_isolated --install -DCMAKE_BUILD_TYPE=Release --install-space /opt/ros/noetic -j1 -DPYTHON_EXECUTABLE=/usr/bin/python3
Please let me know if there's a method to have installed ros_canopen and all dependencies in my other workspace.
Your setup is a bit of a mess.
I'd suggest to:
ccache
in any case (don't forget to update your PATH
)To populate the overlay, use rosinstall_generator
while having source
d your main ROS install's setup.bash
and add the --exclude RPP
option (or --exclude-path
):
--exclude [EXCLUDE [EXCLUDE ...]]
Exclude a set of packages (also skips further
recursive dependencies). Use 'RPP' to specify all
packages available in the current environment.
--exclude-path [EXCLUDE_PATH [EXCLUDE_PATH ...]]
Exclude a set of catkin packages found recursively
under the given path (also skips further recursive
dependencies).
But disregarding all of this: building a single package to process a single .cpp
still doesn't take that long. Even catkin_make_isolated
supports the --from-pkg PKGNAME
or --only-pkg-with-deps ONLY_PKG_WITH_DEPS
options.
There's no need to rebuild everything.
Edit: and finally: if you're setting up a new system, why not figure out the configuration side of things on a more capable platform? Why go through the hassle of suffering slow turn-arounds with an RPi?
if you're setting up a new system, why not figure out the configuration side of things on a more capable platform? Why go through the hassle of suffering slow turn-arounds with an RPi?
I could purchase a USB to CAN adapter and use my PC yes. I suppose I didn't think it would be this laborious and we have the RevPi setup with CAN bus already. That's why I was remote launching the canopen node on the pi from my master (which is a Ubuntu PC or WSL one running ROS Noetic in a docker container). A docker container would be my preferred method, however I have to map through the can device to the container and that bit of 'learning' is on my backlog of tickets. I'm going to add your suggestion to that ticket and decide which setup to go for in the very near future (once I fix this issue!).
@ipa-mdl rosrun_socketcan_dump.txt wireshark_dump.zip
Now I've commented out sdo.cpp line 429
and got the mapping done correctly I feel there's something else missing (like transmission type=255
for example)?
I could purchase a USB to CAN adapter and use my PC yes
Or you could try https://github.com/linux-can/socketcand
A docker container would be my preferred method, however I have to map through the can device to the container and that bit of 'learning' is on my backlog of tickets.
My trick was to just use the host network, which includes access to SocketCAN as well.
Now I've commented out sdo.cpp line 429 and got the mapping done correctly I feel there's something else missing
This is still about the SDO, transmission type is available for PDOs. We have used transmission type 1 (send PDO on every sync) for all our robots.
rosrun_socketcan_dump.txt wireshark_dump.zip
Looks identical..
Yes they are identical! I've got no idea what was going on...
Almost got the motor moving, my latest error is:
message: "homing error before start; Homing failed; Transition timeout"
I used switching_state = 6
as the Schneider manual and CiA 402 standard gave me the impression this was the best state to switch from (as per my config file). I haven't changed transmission type
yet.
By the way thanks for the tips above.
I've got no idea what was going on...
I really don't know how this could happen..
At first I would like to get an idea how this show normally work. Please send the message manually and monitor the response (candump/wirsehark):
cansend can0 000#8201 # restart node 1
# wait for can0 701 [1] 00
cansend can0 601#2B17100000000000 # set heartbeat to 0
And then I would like to know, what happens with CANopenNode
1 w 0x1017 0 u16 0
At please add more logging to Line 308 (interface_->send(last_msg = DownloadSegmentRequest(client_id, false, buffer, offset));
) and the corresponding else case.
I don't see why we ros_canopen would send out the segment request.
message: "homing error before start; Homing failed; Transition timeout"
Do you need homing? Which mode are you using?
I used switching_state = 6 as the Schneider manual and CiA 402 standard gave me the impression this was the best state to switch from (as per my config file).
Why State 6/"Quick stop active"?
We are using Switched_On/4
if the default `Operation_Enable/5
does not work.
Why State 6/"Quick stop active"?
I just saw that the Schneider manual lists the states and numbers (starting from 1). However, these state numbers are not standardized (?) and as computer scientist I start from 0 ;)
before line 308: (interface_->send()):
ROSCANOPEN_ERROR("canopen_master", "before interface_->send() buffer = " << buffer <<", offset = " << offset);
after line 308 (interface_->send()):
ROSCANOPEN_ERROR("canopen_master", "after interface_->send() buffer = " << buffer <<", offset = " << offset);
after line 310 (done = true;):
ROSCANOPEN_ERROR("canopen_master", "Line 310, if offset <= total, then buffer = " << buffer <<", offset = " << offset << ", total = " << total);
after line 429 (commented section):
ROSCANOPEN_ERROR("canopen_master", "At muted error, offset: " << offset << " total: " << total);
The reasoning here is, as we are sharing addresses, the interface send could be changing the value of offset.
candump_test.zip canopenode_test.zip
EDIT: Updated results .png with debug messages (easier to read)
@ipa-mdl Homing deactivated. Profile Position mode (1) set.
Success: False
message: "Unknown exception"
Key: RPD01 : status word | mode of operation TRPDO01 : controlword | modes of operation display Final RPDO1 : 01 0d | 00 -------> enable operation and voltage, halt. Final TPDO1 : 43 50 | 01 -------> mode profile position. manufacturer specific error = "a halt request is active"
Why is ros_canopen sending a halt message? How can I remedy this?
Success: False message: "Unknown exception"
Any other warnings?
Why is ros_canopen sending a halt message?
The motion is halted until ros_control
takes over
How can I remedy this?
Not so easily.
I've added the following logging:
Thanks!
I cannot see any "before/after interface_->send()" line. So ros_canopen
does not send these "Download segment request" messages. Is there anything else running on that CAN bus?
The motion is halted until ros_control takes over
To be more precise: The motion will be halted until the mode got confirmed.
https://github.com/ros-industrial/ros_canopen/blob/melodic-devel/canopen_402/src/motor.cpp#L414
Not so easily.
You could comment that line out, but then the motor might move unexpectedly.
@ipa-mdl: are there any bit shifting, casting or other byte/bit-level operations in ros_canopen
?
I haven't checked, but seeing as @robertjbush is running this on an arm
arch, perhaps some endianess conflict is causing data to get mangled?
Although you did write:
I can just report that it works on Raspberry PI with melodic.
are there any bit shifting, casting or other byte/bit-level operations in ros_canopen?
Lots ;) Even some crude raw pointer casts..
perhaps some endianess conflict is causing data to get mangled?
Since there is no support for byte reordering implemented (yet?) ros_canopen
assumes a little-endian architecture, which is the default mode for ARM/Raspberry Pi.
CANopen uses little-endian encoding as well..
So far the candump looks correct at byte/bit-level, except for these extra message, which I cannot explain.
ros_canopen
assumes a little-endian architecture, which is the default mode for ARM/Raspberry Pi.
true. I forgot about that.
What about unaligned memory access?
Edit: I'd expect a crash sooner than corruption or the strange behaviour described in the linked doc, but still.
@gavanderhoorn I have a Kunbus RevPi which is little endian and a Kunbus ConCAN which bolts on through their conbridge interface. I couldn't find any information regarding the format across that interface.
@ipa-mdl Nothing else is running on the CAN bus other than the RevPi ConCAN module which acts as my hardware interface.
Success: False message: "Unknown exception"
Any other warnings?
None other than the persistent "Did not receive a response message" which I get at certain intervals.
You could comment that line out, but then the motor might move unexpectedly.
Definitely not desirable. I have managed to get ros_control to kick in when I was getting the PDO packet too long
error. The only change I've made is some defaults configurations and slowing the communication down.
What about unaligned memory access?
@gavanderhoorn: This documentation is about Keil. I am not sure, if this applies to all compilers. The code is using unaligned memory access to read/write the CAN frames, so it could be an issue. But I doubt it for now, because the CAN messages and even these logging message are correct.
I've got a blocker on canopen_master (CANopen communication). The current state is that all plumbing is done, both controller and CAN bus seems to be ok but the ros_canopen fails on launch. I have just one device, a Schneider LXM28A and my CAN master is a RevPi Connect with ConCAN module. Attached are the following:
sdo.cpp (canopen_master)
can::BufferedReader::ScopedEnabler enabler(reader_);
if(result){ interface_->send(last_msg = UploadInitiateRequest(clientid, entry)); }else{ interface->send(last_msg = DownloadInitiateRequest(client_id, entry, buffer, offset)); }
boost::this_thread::disable_interruption di; can::Frame msg;
while(!done){ if(!reader_.read(&msg,boost::chrono::seconds(1))) { abort(0x05040000); // SDO protocol timed out. ROSCANOPEN_ERROR("canopen_master", "Did not receive a response message"); break; } if(!processFrame(msg)){ ROSCANOPEN_ERROR("canopen_master", "Could not process message"); break; } } if(offset == 0 || offset != total){ THROW_WITH_KEY(TimeoutException("SDO"), ObjectDict::Key(*current_entry)); }
if(result) *result=buffer;
}