Closed: floweisshardt closed this issue 7 years ago
@ipa-mdl: any idea?
@ipa-uhr: FYI
Quickstop means that the chain halted because of a failure with one module (or as a result of a service call to halt).
message: Internal limit active
Module reached an internal (end) limit: if the hardware was aligned correctly then the (MTS) config is broken.
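For reference, the "internal limit active" flag is bit 11 of the CiA 402 statusword (object 0x6041). A minimal sketch of how to check the relevant flags from a raw statusword value (the example value is hypothetical, not taken from the logs):

```python
# Decode the CiA 402 statusword (object 0x6041) flags relevant here.
# Bit layout per CiA 402: bit 3 = fault, bit 5 = quick stop (active-low),
# bit 11 = internal limit active.

def decode_statusword(sw: int) -> dict:
    return {
        "fault": bool(sw & (1 << 3)),
        "quick_stop_active": not (sw & (1 << 5)),  # 0 means quick stop IS active
        "internal_limit_active": bool(sw & (1 << 11)),
    }

# Hypothetical example value with the internal-limit bit (bit 11) set:
print(decode_statusword(0x0A37))
# -> {'fault': False, 'quick_stop_active': False, 'internal_limit_active': True}
```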
Not sure about the SDO errors, do you use the latest overlay?
Update: Both relate to velocity_demand_value
When updating the reference position we did not modify anything except the baud rate and CAN ID, as these were reset by the MTS tool. Can I simply copy the config from the left arm?
Can you see which joint throws the internal limit? We only reset module 1 (ID: 51).
here's another series of calls to init
test@cob4-5-b1:~$ rosservice call /arm_right/driver/init
success: False
message: /u/robot/git/care-o-bot/src/ros_canopen/canopen_master/src/sdo.cpp(429): Throw in function void canopen::SDOClient::transmitAndWait(const canopen::ObjectDict::Entry&, const canopen::String&, canopen::String*)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<canopen::TimeoutException> >
std::exception::what: SDO
[canopen::tag_objectdict_key*] = 1600sub1
test@cob4-5-b1:~$ rosservice call /arm_right/driver/init
success: False
message: /u/robot/git/care-o-bot/src/ros_canopen/canopen_master/src/sdo.cpp(429): Throw in function void canopen::SDOClient::transmitAndWait(const canopen::ObjectDict::Entry&, const canopen::String&, canopen::String*)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<canopen::TimeoutException> >
std::exception::what: SDO
[canopen::tag_objectdict_key*] = 6061sub0
test@cob4-5-b1:~$ rosservice call /arm_right/driver/init
success: False
message: /u/robot/git/care-o-bot/src/ros_canopen/canopen_master/src/sdo.cpp(429): Throw in function void canopen::SDOClient::transmitAndWait(const canopen::ObjectDict::Entry&, const canopen::String&, canopen::String*)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<canopen::TimeoutException> >
std::exception::what: SDO
[canopen::tag_objectdict_key*] = 607asub0
test@cob4-5-b1:~$ rosservice call /arm_right/driver/init
success: True
message: ''
and recover says:
test@cob4-5-b1:~$ rosservice call /arm_right/driver/recover
success: False
message: Could not set transition; Could not enable motor
test@cob4-5-b1:~$ rosservice call /arm_right/driver/recover
success: True
message: ''
Can I simply copy the config from the left arm?
I don't know, I have never configured the arms, ask @ipa-tif
Can you see which joint throws the internal limit?
Node 53. But it is just a single message which looks like corrupt data. I believe it did not stop because of the internal limit flag (it is ignored during initialization), but because of a sudden "mode switch" to 255/-1 (instead of 0)
Can you provide me with a dump of an SDO error at initialization?
@ipa-mdl: it seems as if I cannot narrow it down to only one error message (quick stop, emergency stop, transition, enable motor, internal limit). Also the SDO errors are quite numerous (607asub0, 6061sub0, 1600sub1, 606bsub0, 1a02sub0, ...). I think I need a good hint on how to narrow down the error, which module might have the wrong configuration, or which module is broken...
candump with SDO error at init is here: https://gist.github.com/ipa-fmw/13ec36be17b7af4bebf7f9862bf394d3
Is this one single candump or multiple files? I cannot open the website properly.
it is only for can2 (arm_right) on cob4-5-t1
I meant: did you upload multiple files, one per failure, or just one? I managed to download it as a zip file and it is just one file.
this is just one file per gist link.
I missed the second link ;)
it seems as if I cannot narrow it down to only one error message
Node 53 is sending an unexpected response. I am not sure which part of it forces the driver to quick-stop - either internal limit or the "mode switch". This might happen while the driver expects a response (in another thread) from another node, but was interrupted by the quick-stop. Do not hunt the SDO errors, they are just a symptom (=last error message, they do not accumulate).
It looks like a bug in the Schunk firmware: apparently the actual position was sent as the status word.
(1473066397.980450) can2 3B5 [8] 61 CD FE FF 00 00 00 00
(1473066397.981448) can2 1B5 [3] 61 CD FE
['(1473066397.980450)', 'can2', 'TPDO3', 53, 'actual_position', 'FFFECD61', 'actual_velocity', '00000000']
['(1473066397.981448)', 'can2', 'TPDO1', 53, 'status_word', 'CD61', 'op_mode_display', 'FE']
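The two frames above can be decoded by hand. A minimal parser sketch (my own, not the project's tooling), assuming the PDO mapping shown in schunk_mapping.py (TPDO1 = statusword u16 + op_mode_display s8, TPDO3 = actual_position s32 + actual_velocity s32), shows that the TPDO1 payload is indeed the low position bytes:

```python
import struct

# Sketch of a decoder for the two frames above. Assumes the Schunk PDO
# mapping: TPDO1 (0x180+node) = statusword (u16) + op_mode_display (s8),
#          TPDO3 (0x380+node) = actual_position (s32) + actual_velocity (s32).

def parse_frame(cob_id: int, data: bytes):
    node = cob_id & 0x7F          # node ID is the low 7 bits of the COB-ID
    base = cob_id & 0x780         # function code selects the PDO
    if base == 0x180:             # TPDO1
        sw, mode = struct.unpack("<Hb", data[:3])
        return ("TPDO1", node, {"status_word": sw, "op_mode_display": mode})
    if base == 0x380:             # TPDO3
        pos, vel = struct.unpack("<ii", data[:8])
        return ("TPDO3", node, {"actual_position": pos, "actual_velocity": vel})
    return ("other", node, {})

print(parse_frame(0x3B5, bytes.fromhex("61CDFEFF00000000")))
print(parse_frame(0x1B5, bytes.fromhex("61CDFE")))
# Note: the "statusword" 0xCD61 is exactly the low word of actual_position
# 0xFFFECD61, which is what makes the firmware bug visible.
```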
ok, so what do you recommend? Can we do anything on the ROS/canopen level, or should I reset the firmware?
are you sure it is module 54 which causes all the trouble?
No! Node 53.
ok, so what do you recommend? Can we do anything on the ROS/canopen level, or should I reset the firmware?
Ask Schunk if this is a known issue with the old firmware (or whatever version the arm is running)
I contacted Schunk and sent the candumps to F. Friedrichs. But unfortunately their firewall prevents them from viewing GitHub pages!
Hello, this is Dirk Osswald from SCHUNK (some people here are allowed to access github...)
I'm going to analyze your log files, but that will take some time (preparing the files for CANoe, finding the interesting parts, ...). Right now I cannot reliably guess how long that will take, but I fear it will be more than an hour. In the meantime you could provide me with some general information:
@ipa-mdl: do you have any more CANopen-related info for Dirk?
Readable PDO mapping: https://github.com/ipa320/canopen_test_utils/blob/indigo-devel/scripts/schunk_mapping.py
Readable candumps: https://gist.github.com/ipa-mdl/fa95d202dd67eca9a4d32a446109eda1
@SCHUNK-Osswald: please note we use a velocity scaling of 1/250 deg/s to fit into the 16-bit command
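For illustration, assuming this means 1 LSB = 1/250 deg/s (i.e. 250 counts per deg/s, my reading of the comment), the scaling and its 16-bit range limit look like this:

```python
# Back-of-the-envelope check of the 16-bit velocity scaling mentioned above.
# Assumption (not confirmed by the thread): 1 LSB = 1/250 deg/s.
SCALE = 250  # counts per deg/s

def degps_to_counts(v: float) -> int:
    c = round(v * SCALE)
    if not -32768 <= c <= 32767:
        raise ValueError("velocity does not fit into a signed 16-bit command")
    return c

def counts_to_degps(c: int) -> float:
    return c / SCALE

print(degps_to_counts(90.0))    # -> 22500
print(counts_to_degps(32767))   # max representable speed, about 131 deg/s
```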
Just a quick query: I got 2 separate log files from Frank Friedrichs: arm_right_quick_stop and arm_right_quick_stop2.
What's the difference / story / symptom behind them? Is one file the "OK" case and the other the "failure" case?
I just saw that I get the following error while being connected through USB to the PRL in the MTS software:
Module 10 is being initialized
Module 10 responds to GET_CONFIG -> module information
-> Module type: PRL+ 100
-> Order number: 306922
-> Software version: 0.63
-> Protocol version: 3
-> Board version: 6.11
-> 2013-07-04 09:53:54 PTAo
Module 10 responds to GET_CONFIG -> configuration: wrong data CRC!
Is this OK, or might that be a hint at the error?
@SCHUNK-Osswald: the two files are related to the output shown in the original post of this ticket (links to gist). In both cases it was not possible to move the modules.
@SCHUNK-Osswald: just different tries, the SDO errors are just consequential, the major problem is:
(1473066397.980450) can2 3B5 [8] 61 CD FE FF 00 00 00 00 # actual position and velocity
(1473066397.981448) can2 1B5 [3] 61 CD FE # status word and operation mode
OK, thx - I will eventually have to read this longish thread then ;-)
Your question regarding the wrong data CRC is most likely not a good hint at the error. But I will take it into consideration anyway.
@ipa-fmw: You could test the arm without node 53, just to make sure
running without 53 doesn't solve the problem. Here's the candump with one "init" and one "recover" call. Both calls succeeded, but after approximately 1 sec all modules report an error again:
Could not read error error_register
If I randomly uncomment just one module, I cannot initialize the chain. If I randomly uncomment 3-4 modules (that means 2-4 are active), the chances of being able to initialize are better. Even when the chain is working in principle, it mostly does not run for longer than 10 sec before you hear the brakes engage and recovering is needed.
already midnight in Korea, so I'll give up for today. Thanks to @ipa-mdl and @SCHUNK-Osswald for helping to track down this issue.
As this is not resolved, we unfortunately have to solve it remotely.
(https://gist.github.com/ipa-fmw/c6494f033767df99a19aa6822fef56a9)
running without 53 doesn't solve the problem
All nodes report EMCY 8250 "RPDO Timeout", and the candump confirms that no data was sent out. Did you see a high CPU/IO load from other ROS nodes?
All these candumps look a little suspicious, as all incoming messages have the same timestamp. Not sure what's going on. I would test it on another PC with a different CAN dongle.
CPU load is <5% on all 4 cores. The left arm is running on the same PC, the same CAN card, and the same ROS configuration, and works without problems.
I have some observations / further queries to make:
1. The candumps do look suspicious to me as well:
(1473066364.289796) can2 3B3 [8] D9 D6 00 00 00 00 00 00
(1473066364.289796) can2 3B4 [8] E3 53 00 00 00 00 00 00
(1473066364.289796) can2 1B4 [3] 60 92 00
(1473066364.289797) can2 3B5 [8] 61 CD FE FF 00 00 00 00
(1473066364.298044) can2 080 0
can2 1B4 [3] 60 92 00
(1473066364.299549) can2 1B3 [3] 60 92 00
(1473066364.299549) can2 3B3 [8] D9 D6 00 00 00 00 00 00
(1473066364.299549) can2 3B4 [8] E3 53 00 00 00 00 00 00
(1473066364.299550) can2 3B5 [8] 61 CD FE FF 00 00 00 00
(1473066364.300527) can2 1B5 [3] 60 92 00
(1473066364.307671) can2 5B6 [8] 60 17 10 00 00 00 00 00
(1473066364.307773) can2 636 [8] 40 00 18 01 00 00 00 00
(1473066364.307912) can2 080 0
can2 5B6 [8] 43 00 18 01 B6 01 00 00
(1473066364.308899) can2 636 [8] 23 00 18 01 B6 01 00 80
(1473066364.309796) can2 1B3 [3] 60 92 00
(1473066364.309796) can2 3B3 [8] 1C DC D9 D6 00 00 00 00
(1473066364.309796) can2 3B4 [8] E3 53 00 00 00 00 00 00
The 3B3 frames are the actual_position of the PRL100 module. It is at 55001 (0x0000d6d9) all the time, except for single spikes to 3604601884 (0xd6d9dc1c). The hex bytes seem to indicate that the low word of the position got shifted into the high word, and the low word was filled with random bytes / garbage.
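The shift described above is easy to test for programmatically. A sketch (frame parsing omitted; raw 32-bit position values assumed as input) that flags samples whose high word equals the previous sample's low word:

```python
# Sketch of the byte-shift check described above: flag TPDO3 position
# samples whose high word equals the previous sample's low word, i.e. the
# stable position word (0xd6d9) shifted up with garbage filled in below.
# Input values are raw unsigned 32-bit positions, an assumption of mine.

def looks_shifted(prev_raw: int, cur_raw: int) -> bool:
    high_word = (cur_raw >> 16) & 0xFFFF
    return high_word == (prev_raw & 0xFFFF) and cur_raw != prev_raw

print(looks_shifted(0x0000D6D9, 0xD6D9DC1C))  # the spike from the log -> True
print(looks_shifted(0x0000D6D9, 0x0000D6D9))  # the steady value -> False
```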
2. The statusword 0x6041/0 and mode of operation display 0x6061/0 have weird outliers as well:
(1473066417.489351) can2 1B3 [3] 60 92 00
(1473066417.489474) can2 633 [8] 40 01 10 00 00 00 00 00
(1473066417.490497) can2 1B4 [3] 07 92 00
(1473066417.490497) can2 3B3 [8] D9 D6 00 00 00 00 00 00
(1473066417.490497) can2 3B4 [8] E3 53 00 00 00 00 00 00
(1473066417.490498) can2 3B5 [8] 61 CD FE FF 00 00 00 00
(1473066417.491496) can2 1B5 [3] 27 92 00
(1473066417.491496) can2 3B6 [8] E0 8D FE FF 00 00 00 00
(1473066417.491497) can2 1B6 [3] 07 82 00
(1473066417.491497) can2 3B7 [8] 33 15 FF FF 00 00 00 00
(1473066417.491497) can2 1B7 [3] 07 82 00
(1473066417.492476) can2 3B8 [8] 16 64 FF FF 00 00 00 00
(1473066417.492477) can2 1B8 [3] 07 82 00
(1473066417.492477) can2 3B9 [8] 78 EF 00 00 00 00 00 00
(1473066417.492477) can2 1B9 [3] 07 82 00
(1473066417.493350) can2 5B3 [8] 4F 01 10 00 00 00 00 00
(1473066417.493496) can2 634 [8] 40 01 10 00 00 00 00 00
(1473066417.494492) can2 5B4 [8] 4F 01 10 00 00 00 00 00
(1473066417.494612) can2 635 [8] 40 01 10 00 00 00 00 00
(1473066417.495496) can2 5B5 [8] 4F 01 10 00 00 00 00 00
(1473066417.495598) can2 636 [8] 40 01 10 00 00 00 00 00
(1473066417.496496) can2 5B6 [8] 4F 01 10 00 00 00 00 00
(1473066417.496611) can2 637 [8] 40 01 10 00 00 00 00 00
(1473066417.497495) can2 5B7 [8] 4F 01 10 00 00 00 00 00
(1473066417.497611) can2 638 [8] 40 01 10 00 00 00 00 00
(1473066417.498495) can2 5B8 [8] 4F 01 10 00 00 00 00 00
(1473066417.498611) can2 639 [8] 40 01 10 00 00 00 00 00
(1473066417.498847) can2 080 0
can2 5B9 [8] 4F 01 10 00 00 00 00 00
(1473066417.500496) can2 1B3 [3] 4F 01 10
The obviously wrong data from the TPDO1 of node 51 in the last line is a copy of the SDO request (!) sent to all modules just before.
3. Where is the mode of operation set for the modules? I see that you get the current mode of operation via TPDO1 (mode of operation display 0x6061/0), but I cannot find where the mode of operation (0x6060/0) is set (e.g. via SDO). Is this just not contained in the logs due to the error?
Where is the mode of operation set for the modules?
The driver starts in the "no mode" state by default. The mode will be set later, but the driver did not get that far. (For newer firmware versions the mode can be set beforehand.)
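For completeness, setting the mode later is a single expedited SDO download to 0x6060/0. A sketch of the raw frame (node 53 and profile position mode 1 are chosen only as an example; this is not what the driver necessarily sends):

```python
# Build the raw expedited-SDO download frame that writes one byte to an
# object, e.g. mode of operation (0x6060/0). The 0x2F command byte means
# "expedited download, 1 data byte" per CiA 301. Node/value are examples.

def sdo_write_u8(node: int, index: int, sub: int, value: int):
    cob_id = 0x600 + node  # client->server SDO COB-ID
    data = bytes([
        0x2F,                         # command specifier: expedited, 1 byte
        index & 0xFF, index >> 8,     # index, little-endian
        sub,                          # sub-index
        value & 0xFF, 0, 0, 0,        # data byte + padding
    ])
    return cob_id, data

cob, data = sdo_write_u8(53, 0x6060, 0, 1)
print(hex(cob), data.hex(" "))  # -> 0x635 2f 60 60 00 01 00 00 00
```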
I don't think that this behaviour results from a genuine bug in the firmware or the ROS driver. In addition, more than one module is affected.
@ipa-fmw: did you touch the termination resistor DIP switches?
I just analyzed the log files a little more and there are numerous of these "outlier" data points for all nodes. Data of node 53, e.g., sometimes has the position bytes in the place of the velocity bytes, while the position bytes are filled with garbage.
The attached image shows the reported positions over time of all nodes from the arm_right_quick_stop2 log as a 2D plot. But actually I do not really trust this log data very much, since if our firmware produced such invalid CAN messages at such a high frequency, it would not be usable in any way.
So at the moment I do not think that I can be of much help any more - unless of course you have further questions.
in another test today we figured out that
I guess we can close this as the arm was broken anyway and is currently being repaired...@ipa-fmw @ipa-nhg @ipa-tif
let's hope that repairing helps with the issue above. I'll reopen if it does not fix the issue.
after resetting the arm reference position with the Schunk MTS tool, we experience strange behaviour of the arm (not sure if it was there before too).
I sometimes can init, but then the modules show a quick stop or emergency error, so it is impossible to move the arm. Recovering says "True" and in the diagnostics you can very briefly see OK, but then the quick stop or emergency stop is active again. See candump https://gist.github.com/ipa-fmw/d2355a5ad93af345ccf23b35637adb63
Most of the time I cannot init. Either it says:
or
see candump https://gist.github.com/ipa-fmw/13ec36be17b7af4bebf7f9862bf394d3
arm config is:
diagnostics screenshot