Working counter mismatch

robert-burger / libethercat

EtherCAT master library. This library is used to build a deterministic fieldbus network with EtherCAT components.

https://www.dlr.de/rm


Working counter mismatch #23

Open marcfir opened 2 weeks ago

marcfir commented 2 weeks ago

Hi Robert, I get the following error: 2024-09-03T13:33:12.017Z ERROR [libethercat_rs] MASTER_RECV_PD_GROUP: group 0: working counter mismatch got 8, expected 9, slave_cnt 5, mismatch_cnt. Does this mean that my PD configuration is wrong and the slave ignores my cyclic process data?

Here is the full log: out_pd_wk.log

marcfir commented 1 week ago

We did some more debugging and got the PDO mapping working correctly for the Sercos drive. However, a tcpdump trace (dump_2.zip) shows that some frames have a working counter (wkc) mismatch. Inspecting those frames shows that whenever a wkc of 8 occurs, the output data of the Sercos drive is 0, so something goes wrong in the slave. The expected wkc for the slave itself is 3 because it has both input and output data. Any ideas what the reason could be? A timing problem?
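The "expected wkc of 3" follows from the usual EtherCAT counting rule for a logical read/write (LRW) datagram: a slave adds 1 when its data is read and 2 when its data is written. A minimal sketch of that arithmetic, assuming the group is exchanged with a single LRW datagram; `pd_slave_t` and `expected_wkc_lrw` are hypothetical names for illustration, not libethercat API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one slave's cyclic process data. */
typedef struct {
    bool has_inputs;   /* slave -> master data, read by the datagram    */
    bool has_outputs;  /* master -> slave data, written by the datagram */
} pd_slave_t;

/* Expected working counter of one LRW datagram:
 * +1 for every slave that is read, +2 for every slave that is written. */
static uint16_t expected_wkc_lrw(const pd_slave_t *slaves, size_t count)
{
    uint16_t wkc = 0;
    for (size_t i = 0; i < count; ++i) {
        if (slaves[i].has_inputs) {
            wkc += 1;
        }
        if (slaves[i].has_outputs) {
            wkc += 2;
        }
    }
    return wkc;
}
```

Under this rule a slave that takes part in only one direction lowers the total by 1 or 2, so a result that is exactly one short (8 instead of 9) would be consistent with a single slave not answering the read part.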

For now we ignore the wrong wkc in the PD callback (https://github.com/robert-burger/libethercat/commit/6b0af7bef297fc8923bf2ad27916efda1365ce0d).

robert-burger commented 1 week ago

I think slave 4 is still not in OP state. The log shows that it reached OP, but in the trace the master still tries to write OP to AL control (FPWR frames in between the usual cyclic frames). Additionally, the AL status read from slave 4 (fixed address 0x3ec) still reports SAFEOP.

It may also be that something goes wrong when setting OP the first time, because it took so long for that slave. I'll check that.
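For reading such a trace, it helps to decode the AL status value by hand. The sketch below uses the standard EtherCAT state codes (the low nibble of the AL status register at 0x0130, with bit 4 as the error indication); the constant and function names are made up for this example and are not libethercat identifiers:

```c
#include <stdint.h>

/* Standard EtherCAT AL state codes (low nibble of AL status, register 0x0130). */
#define AL_STATE_INIT    0x01u
#define AL_STATE_PREOP   0x02u
#define AL_STATE_BOOT    0x03u
#define AL_STATE_SAFEOP  0x04u
#define AL_STATE_OP      0x08u

#define AL_STATUS_STATE_MASK 0x0Fu
#define AL_STATUS_ERROR_BIT  0x10u  /* set when a requested state change was refused */

/* Return a readable name for the state part of an AL status value. */
static const char *al_state_name(uint16_t al_status)
{
    switch (al_status & AL_STATUS_STATE_MASK) {
    case AL_STATE_INIT:   return "INIT";
    case AL_STATE_PREOP:  return "PREOP";
    case AL_STATE_BOOT:   return "BOOT";
    case AL_STATE_SAFEOP: return "SAFEOP";
    case AL_STATE_OP:     return "OP";
    default:              return "UNKNOWN";
    }
}

/* Non-zero if the slave flags an error together with its current state. */
static int al_status_has_error(uint16_t al_status)
{
    return (al_status & AL_STATUS_ERROR_BIT) != 0;
}
```

A value of 0x0004 read back after OP (0x0008) was written to AL control means the slave is still in SAFEOP; 0x0014 would mean SAFEOP with the error bit set.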

Edit: is it possible to move the drive?

marcfir commented 5 days ago

We have now found the problem. libosal uses CLOCK_REALTIME on POSIX, but we used a different clock. I have not analyzed the timing values, but adapting https://github.com/robert-burger/libethercat/blob/9108db144f028f763203957c595d9b39bc04505e/src/ec.c#L1880 solves the problem. Now all drives are movable. Before the DC time fix, reading S-0-0029 returned high, increasing values; with the fix they stay low. So I think it was a timing issue. I can provide a PR that adds a function such as ec_send_distributed_clocks_sync_with_rtc(ec_t *pec, osal_uint64_t rtc).

I think the slaves not going to OP in the log is caused by the intermediate circuit voltage not being enabled. Our electrical system is designed so that the voltage must be enabled manually, and the drives check the voltage status.
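To illustrate why mixing clocks matters: timestamps taken from two different POSIX clocks are not comparable, because each clock has its own epoch and may be adjusted differently. The sketch below only uses `clock_gettime`; the helper names `clock_now_ns` and `clock_offset_ns` are hypothetical and not part of libosal or libethercat:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the given POSIX clock as nanoseconds. Returns 0 on success, -1 on error. */
static int clock_now_ns(clockid_t clock_id, uint64_t *out_ns)
{
    struct timespec ts;
    if (clock_gettime(clock_id, &ts) != 0) {
        return -1;
    }
    *out_ns = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    return 0;
}

/* Offset between CLOCK_REALTIME and CLOCK_MONOTONIC in nanoseconds.
 * It is usually very large and changes whenever the wall clock is stepped,
 * so a timestamp from one clock must not be fed into code that expects the other. */
static int clock_offset_ns(int64_t *out_offset_ns)
{
    uint64_t realtime_ns;
    uint64_t monotonic_ns;
    if (clock_now_ns(CLOCK_REALTIME, &realtime_ns) != 0 ||
        clock_now_ns(CLOCK_MONOTONIC, &monotonic_ns) != 0) {
        return -1;
    }
    *out_offset_ns = (int64_t)realtime_ns - (int64_t)monotonic_ns;
    return 0;
}
```

If the cyclic task is scheduled on one clock while the DC sync timestamp is taken from another, the two time bases do not line up. Passing the application's own timestamp into the sync call, as the proposed function would allow, keeps both on the same clock.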

marcfir commented 5 days ago

Just re-opened to keep track of it until the PR is merged.