Local CAN bus monitor thread

PeterBowman commented 4 years ago

CanBusControlboard invokes ICanBusSharer methods on its wrapped CAN node subdevices on initial configuration and termination in order to perform initialization and finalization of CAN comms on them, respectively (ref1, ref2, ref3). That is, ICanBusSharer::initialize configures a few relevant objects via SDO, start the node via NMT and performs drive state transitions, whereas ICanBusSharer::initialize follows the opposite path.

I want to defer node initialization. CanBusControlboard::open should just open wrapped YARP subdevices and initialize CAN R/W threads. A new, master thread would monitor node state via available protocols, which currently is WIP at https://github.com/roboticslab-uc3m/yarp-devices/issues/223. I'm mostly pointing at NMT bootup and hearbeat protocols, which means initialization would be triggered on each node's availability on the CAN network, and there would also be room for a centralized (i.e. encompassing all opened CAN buses) SYNC protocol running at a fixed, preconfigured rate (see yarp::os::Timer).

This issue relates to https://github.com/roboticslab-uc3m/yarp-devices/issues/225 (which, in turn, depends on https://github.com/roboticslab-uc3m/yarp-devices/issues/160) in that such local master thread could respond to remote RPC request through the YARP network and forward them to CAN nodes, and viceversa.

PeterBowman commented 4 years ago

I wonder if this could be of any use:

Section 5.9.6, Object 2076h: Save current configuration

This object is used in order to enable saving the current configuration of the operating parameters of the drive. These parameters are the ones that are set when doing the setup of the drive. The purpose of this object is to be able to save the new values of these parameters in order to be re-initialized at subsequent system re-starts.

Writing any value in this object will trigger the save in the non-volatile EEPROM memory of the current drive operating parameters.

Or, more interestingly, objects 1010h ("Store parameters") and 1011h ("Restore parameters").

PeterBowman commented 4 years ago

Or, more interestingly, objects 1010h ("Store parameters") and 1011h ("Restore parameters").

Stored (1010h) and cleared (1011h):

1005h: COB-ID of the SYNC message
1006h: Communication cycle period
100Ch: Guard time
100Dh: Lifetime factor
1014h: COB-ID Emergency object
1017h: Producer heartbeat time
1400h-1403h: RPDO communication parameters
1600h-1603h: RPDO mapping parameters
1800h-1803h: TPDO communication parameters
1A00h-1A03h: TPDO mapping parameters
6060h: Modes of operation
6065h: Following error window
6066h: Following error time out
6067h: Position window
6068h: Position window time
607Ch: Home offset
6081h: Profile velocity
6083h: Profile acceleration
6098h: Homing method
6099h: Homing speeds
60FFh: Target velocity

Stored and never (?) cleared:

207Bh: Homing current threshold
207Ch: Homing current threshold time
6007h: Abort connection option code
605Ah: Quick stop option code
605Bh: Shutdown option code
605Ch: Shutdown option code
605Dh: Disable operation option code
605Eh: Fault reaction option code
607Ah: Target position
607Dh: Software position limit
607Eh: Polarity
6085h: Quick stop deceleration
609Ah: Homing acceleration

Bolded out: we set those in ICanBusSharer::initialize (see below). In addition, we set the following objects as well:

2081h: Set actual position
208Eh: Auxiliary Settings Register

https://github.com/roboticslab-uc3m/yarp-devices/blob/f54d3aa4a5b48e3d6dc142172b7e1d1b442016f8/libraries/YarpPlugins/TechnosoftIpos/ICanBusSharerImpl.cpp#L94-L113

PeterBowman commented 4 years ago

On the other hand, object 2076h, Save current configuration stores the following objects in the non-volatile memory:

2083h: Encoder Resolution for step loss protection
2084h: Stepper Resolution for step loss protection
2092h: User Variables (only on firmware F514E or newer)
6007h: Abort connection option code
607Eh: Polarity
607Ch: Home offset
6098h: Homing method
6099h: Homing speeds
609Ah: Homing acceleration
207Bh: Homing current threshold
207Ch: Homing current threshold time
6065h: Following error window
6066h: Following error time out
6067h: Position window
6068h: Position window time
607Bh: Position range limit
60F2h: Positioning option code
60F8h: Max slippage
2005h: Max slippage time out

PeterBowman commented 4 years ago

and there would also be room for a centralized (i.e. encompassing all opened CAN buses) SYNC protocol running at a fixed, preconfigured rate (see yarp::os::Timer).

See https://github.com/roboticslab-uc3m/yarp-devices/issues/232#issuecomment-572623913.

A new, master thread would monitor node state via available protocols, which currently is WIP at #223.

All node devices are CAN masters, just focused on a single CAN node (whereas CanBusControlboard spans every single node on that same CAN bus). Added TechnosoftIpos label accordingly. Also, I noticed that a bus off condition causes all write requests to stack up in the RX buffer until connection is established again. Hence, bus state should be monitored as well: CanBusPeak and CanBusHico devices are targeted.

PeterBowman commented 4 years ago

Hence, bus state should be monitored as well: CanBusPeak and CanBusHico devices are targeted.

I added some pointers regarding bus load at https://github.com/roboticslab-uc3m/yarp-devices/issues/231. Here, though, we need to query CAN bus status. Regarding the PEAK device, see PCAN user manual and the int pcanfd_get_state(int fd, struct pcanfd_state *pfds) entry point:

struct pcanfd_state {
    __u16 ver_major, ver_minor, ver_subminor;
    struct timeval tv_init; /* time the device was initialized */
    enum pcanfd_status bus_state; /* CAN bus state */
    __u32 device_id; /* device ID, ffffffff is unused */
    __u32 open_counter; /* open() counter */
    __u32 filters_counter; /* count of message filters */
    __u16 hw_type; /* PCAN device type */
    __u16 channel_number; /* channel number for the device */
    __u16 can_status; /* same as wCANStatus but NOT CLEARED */
    __u16 bus_load; /* bus load value, ffff if not given */
    __u32 tx_max_msgs; /* Tx fifo size in count of msgs */
    __u32 tx_pending_msgs; /* msgs waiting to be sent */
    __u32 rx_max_msgs; /* Rx fifo size in count of msgs */
    __u32 rx_pending_msgs; /* msgs waiting to be read */
    __u32 tx_frames_counter; /* Tx frames written on device */
    __u32 rx_frames_counter; /* Rx frames read on device */
    __u32 tx_error_counter; /* CAN Tx errors counter */
    __u32 rx_error_counter; /* CAN Rx errors counter */
    __u64 host_time_ns; /* host time in nanoseconds as it was */
    __u64 hw_time_ns; /* when hw_time_ns has been received */
};

Some of these can be mapped into a CanErrors structure queryable from the drive via ICanBusErrors interface. I'm particularly interested in the pcanfd_status enumeration (see pcanfd.h) and the can_status property (see pcan.h). The latter can hold a bit mask as follows :

enum pcanfd_status {
    PCANFD_UNKNOWN,

    /* flags & PCANFD_ERROR_BUS.
     * event id. is the state of the Bus */
    PCANFD_ERROR_ACTIVE,
    PCANFD_ERROR_WARNING,
    PCANFD_ERROR_PASSIVE,       /* receive only state */
    PCANFD_ERROR_BUSOFF,        /* switched off from the bus */

    /* flags & PCANFD_ERROR_CTRLR|PCANFD_ERROR_INTERNAL.
     * event id. is one of the following error: */
    PCANFD_RX_EMPTY,
    PCANFD_RX_OVERFLOW,
    PCANFD_TX_EMPTY,
    PCANFD_TX_OVERFLOW,

    PCANFD_BUS_ERROR,   /* deprecated, see: PCANFD_TYPE_ERROR_MSG */
    PCANFD_BUS_LOAD,

    PCANFD_STATUS_COUNT
};

/* can_status */
#define CAN_ERR_OK             0x0000  // no error
#define CAN_ERR_XMTFULL        0x0001  // transmit buffer full
#define CAN_ERR_OVERRUN        0x0002  // overrun in receive buffer
#define CAN_ERR_BUSLIGHT       0x0004  // bus error, errorcounter limit reached
#define CAN_ERR_BUSHEAVY       0x0008  // bus error, errorcounter limit reached
#define CAN_ERR_BUSOFF         0x0010  // bus error, 'bus off' state entered
#define CAN_ERR_QRCVEMPTY      0x0020  // receive queue is empty
#define CAN_ERR_QOVERRUN       0x0040  // receive queue overrun
#define CAN_ERR_QXMTFULL       0x0080  // transmit queue full
#define CAN_ERR_REGTEST        0x0100  // test of controller registers failed
#define CAN_ERR_NOVXD          0x0200  // Win95/98/ME only
#define CAN_ERR_RESOURCE       0x2000  // can't create resource
#define CAN_ERR_ILLPARAMTYPE   0x4000  // illegal parameter
#define CAN_ERR_ILLPARAMVAL    0x8000  // value out of range
#define CAN_ERRMASK_ILLHANDLE  0x1C00  // wrong handle, handle error

PeterBowman commented 4 years ago

From yarp::dev::CanErrors:

class YARP_dev_API yarp::dev::CanErrors
{
public:
    CanErrors() { /* ... */ }

    int txCanErrors;     //can device tx errors
    int rxCanErrors;     //can device rx errors
    bool busoff;         //bus off
    unsigned int rxCanFifoOvr;   // can device rx overflow
    unsigned int txCanFifoOvr;   // can device tx overflow
    unsigned int txBufferOvr;    // tx buffer overflow
    unsigned int rxBufferOvr;    // rx buffer overflow
};

I have no idea what the difference between "can device X overflow" and "X buffer overflow" could be. (cc @jgvictores ?)

See also CanBusMotionControl.cpp:

yDebug(" Can Errors --  Device Rx:%u, Device Tx:%u, RxOvf: %u, TxOvf: %u, BusOff: %d -- Driver Fifo Rx:%u, Driver Fifo Tx:%u\n", 
        errors.rxCanErrors,
        errors.txCanErrors,
        errors.rxCanFifoOvr,
        errors.txCanFifoOvr,
        errors.busoff,
        errors.rxBufferOvr,
        errors.txBufferOvr);

I guess: xCanFifoOvr -> physical CAN board, xBufferOvr -> e.g. chardev kernel-space module?

See https://github.com/robotology/QA/issues/381.

PeterBowman commented 4 years ago

Or, more interestingly, objects 1010h ("Store parameters") and 1011h ("Restore parameters").

Q: Let's take a closer look at object 607Ah, "Target position", which is stored by 1010h but never cleared by 1011h. The CANopen user manual states this object has no default value. If I change it and store its value once via 1010h, would it mean that there is no way to clear said object? This is especially concerning since a transition to operation enabled in position profile mode, if successful, might make the motor move to the target pointed by object 607Ah instead of awaiting my first (intended) command.

A: By default the "Object 607Ah: Target position" is 0. If you write a value in the object and save the parameters through "Object 1010h: Store parameters" that value will be "default" value on start-up. In other words when the drive is powered on, the "Target position" will be not 0, it will be the value that you save. If you want to change the value of the "Object 607Ah" just write a value in it. If you want to change the default, write a value and use the "Object 1010h".

PeterBowman commented 4 years ago

yarpmotorgui queries a few values upon GUI initialization and never asks for them again. If such queries fail (e.g. the bus is off when yarpmotorgui is started), those bogus values never get updated once CAN connection is successfully estalibshed. Relevant method calls (ref):

IControlLimits::getLimits
IControlLimits::getVelLimits
ICurrentControl::getCurrentRange

Joint limits are available on start via .ini file, I could provide them to callers if device is in VOCAB_CM_NOT_CONFIGURED mode.

PeterBowman commented 4 years ago

I'm going to forget the 0x1010 and 0x1011 objects. It takes one second to return an SDO confirm message from the latter, and CAN overruns are reported via EMCY for some reason.

For the record, the commit I used to test this was https://github.com/roboticslab-uc3m/yarp-devices/commit/005042abf7b8740ec3063d1f32e773a43de27cde (BTW thore static local variables are totally wrong).

Edit: I could assume this data storage persists across runs, pity I cannot ask the drive whether something has been stored in the non-volatile memory (so I can choose to either skip initial configuration or re-run the usual steps - and perhaps store this config (much) later).

Edit2: oh, well, just SDO any of the first bolded objects listed at https://github.com/roboticslab-uc3m/yarp-devices/issues/242#issuecomment-572331534 and compare their defaults. I'd use IRemoteVariables for interfacing (remember the SDO timeout must be substantially increased).

Edit3: not worth the trouble, we really want to depend solely on .ini files (not on our ability to recall previous manual tweaks of the internal memory), the EEPROM has a limited lifespan, etc.

PeterBowman commented 4 years ago

Ready at https://github.com/roboticslab-uc3m/yarp-devices/pull/243/commits/ca1aef8c3454ed662d92ae03864429640870c44e.

PeterBowman commented 4 years ago

Joint limits are available on start via .ini file, I could provide them to callers if device is in VOCAB_CM_NOT_CONFIGURED mode.

Every config option available via .ini is exposed through YARP interfaces if the CAN node is not accessible: https://github.com/roboticslab-uc3m/yarp-devices/commit/e2b4da844c9095e61bfd735c07cbcd89ba0aecdf.

roboticslab-uc3m / yarp-devices

Local CAN bus monitor thread #242