roboticslab-uc3m / yarp-devices

A place for YARP devices
https://robots.uc3m.es/yarp-devices/

New Mbed/CAN-based JR3 driver #263

Closed PeterBowman closed 11 months ago

PeterBowman commented 1 year ago

Our JR3 driver communicates with a very large PCI board that we were unable to fit into any of the computers aboard TEO (https://github.com/roboticslab-uc3m/teo-hardware-issues/issues/32). To this day, this board is connected to an external tower PC. Additionally, it introduces other issues, e.g. forced zeroing on power-up (https://github.com/roboticslab-uc3m/jr3pci-linux/issues/11).

Luckily, the role of the PCI board could be taken over by a cheap, compact and open-source Mbed board, such as the ones we already use to send PWM commands to the grippers. A Master's Thesis by Javier Berrocal (Design and implementation of a data acquisition system for force/torque sensors) has shown that this approach works and is feasible to implement (edit: unfortunately, no code has been derived from this work since the methodology was incorrect; on the other hand, the described hardware connections were very helpful). See also a similar work on a PIC controller by Alberto López Esteban: Diseño y desarrollo de un módulo de conexión a CANopen de un sensor comercial fuerza/par (Design and development of a CANopen connection module for a commercial force/torque sensor). There was also preliminary work by Carlos de la Hoz Najarro that explored the capabilities of the PCI board in conjunction with MATLAB: Puesta en marcha del sensor fuerza/par JR3 (Commissioning of the JR3 force/torque sensor).

So, what's the plan? @smcdiaz and I have agreed to reuse the existing Mbeds on the hands and add two more on the legs. Each Mbed will be connected via CAN bus to one of the PCAN cards. We have two such cards, each accepting up to four CAN channels. Therefore, one PC (presumably "manipulation") will be devoted to controlling the iPOS and CUI nodes, and the JR3 sensors should send their data to the other PC (that would be "locomotion" according to the plan; see rationale in https://github.com/roboticslab-uc3m/teo-hardware-issues/issues/56). Note that each of these buses will have no CAN nodes other than one of the new (or repurposed) Mbed boards, which lets us devote as much bandwidth as possible to sensor data.

As for the existing Mbed boards, they would have to handle two CAN channels/buses: one channel for reading PWM gripper commands, another channel for sending JR3 data. Those buses, as explained before, would connect to different PCAN cards: one managed by launchCanBus, the other managed by the new device we are discussing here. To keep things simple, I'd replicate this architecture on the legs, too, even though the gripper-related logic would remain unused (and the hardware unprepared for handling two CAN buses).

Discarded: it would have been possible to use a single Mbed to communicate with all four JR3 sensors and then send their data through a serial bus (USB). This alternative is not being considered in order to simplify the design of the new electronics for TEO.

Our MBED model is the LPC1768 (schematic), which uses an LQFP100/LPC1768FBD100 microcontroller (datasheet).

PeterBowman commented 1 year ago

Even though the NXP microcontroller offers two CAN channels, I'm afraid the MBED LPC1768 only exposes one of them, specifically through DIP29 and DIP30 (on the NXP: pins 80 and 81 respectively, i.e. CAN2-TD2 and CAN2-RD2). The CAN1 pins 56 and 57 are not mapped between the NXP and the MBED board.

PeterBowman commented 1 year ago

The MBED product search tool lists 6 boards with CAN support (results). Interestingly, our board is not listed there. The DISCO boards have a single CAN channel. The Nucleo boards, on the other hand, look really promising:

See also: product portfolio, Nucleo-144 board family, user manual.

The user manual explicitly refers to a CAN1 channel, but the CAN2 entries that should be there according to the pinout drawings on the MBED site are simply missing. On the other hand, PeripheralPins.c does indeed provide mappings for both CAN1 and CAN2, but so does its counterpart for the LPC176x.

PeterBowman commented 1 year ago

The firmware/LacqueyFetch directory has been reviewed and migrated to https://github.com/roboticslab-uc3m/lacquey-fetch-firmware. This was done so that the project can be easily imported/cloned into https://studio.keil.arm.com/, which has superseded the old Mercurial-based MBED repository and online compiler. The old repo is here.

In case the only dependency of this project ever disappears, here are its sources: Motor-f265e441bcd9.zip.

PeterBowman commented 1 year ago

The new JR3 firmware will be developed here: https://github.com/roboticslab-uc3m/jr3-mbed-firmware.

PeterBowman commented 1 year ago

Quoting my earlier comment: "Even though the NXP microcontroller offers two CAN channels, I'm afraid the MBED LPC1768 only pins to one of them, specifically through DIP29 and DIP30 (on the NXP: pins 80 & 81 correspondingly, i.e. CAN2-TD2 and CAN2-RD2). The CAN1 pins 56 & 57 are not mapped between the NXP and the MBED board."

I probably misread this, and we actually do have access to both CAN channels on the NXP. According to the LPC1768FBD100 datasheet (see link in the issue description), section 7.2 "Pin description", CAN1 is mapped to pins 57/56 and 46/47 (RD/TD), while CAN2 is mapped to 66/65 and 81/80. Now, the LPC1768 schematic (also linked in the issue description) provides the following figure on page 3:

[Figure: screenshot of page 3 of the mbed-005.1 schematic (mbed-005.1.pdf)]

In short: pins 57/56 and 66/65 have no connection to the MBED pinout. On the other hand, pins 81/80 (CAN2) map to DIP30/29 (we already knew that), and pins 46/47 (CAN1) map to DIP9/10 (which I somehow overlooked). The pinout diagram at https://os.mbed.com/platforms/mbed-LPC1768/ does not show this second channel anywhere, and neither does the product description (both suggest there is a single CAN channel). I realized this after revisiting https://os.mbed.com/handbook/CAN:

[Figure: CAN example from the MBED handbook]

Sorry, @smcdiaz, I should have paid more attention. So it looks like a single MBED per TEO's hand is enough for our purposes (commanding the LacqueyFetch and reading from the JR3 sensors).

PeterBowman commented 1 year ago

By the way, Alberto helped me a lot with analyzing the DATA and DCLK signals coming from the serial port of the sensors. The description in serialcomm.pdf appears to be accurate:

[Photos: oscilloscope captures of the DCLK and DATA signals]

There are 20 bits per data frame, as expected, and the clock period is 500 ns (2 MHz). Note that the waveforms are inverted in the first two pictures due to the +/- channel assignment (CH1: DCLK-, CH2: DCLK+, CH3: DATA-).

Edit: sensor frames for channel 7 (calibration) are published at a 1 MHz rate for the data part, using asymmetric clock pulses (25% low, 75% high). The address part, however, is published at 2 MHz like any other channel. The order is kept, i.e. all eight channels are published sequentially (starting from channel 0), and there is an idle interval between frames that is somewhat shorter than a single regular frame.

[Photo: oscilloscope capture]
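For reference, a single 20-bit frame as seen on the serial lines could be split as shown below. This is a minimal Python sketch. It assumes that bits are sampled MSB first and that the 4-bit address precedes the 16-bit two's-complement data word, which is the same interpretation used by the decoding script further down in this thread. `decode_frame` is a hypothetical helper name.

```python
from typing import Tuple

BIT_PERIOD_NS = 500  # 2 MHz clock measured on the oscilloscope
FRAME_BITS = 20      # 4 address bits + 16 data bits

# One regular frame lasts 20 bits * 500 ns = 10 us
FRAME_DURATION_NS = FRAME_BITS * BIT_PERIOD_NS


def decode_frame(bits: str) -> Tuple[int, int]:
    """Return (channel, signed value) for a string of 20 '0'/'1' characters, MSB first."""
    if len(bits) != FRAME_BITS or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 20 binary digits")
    channel = int(bits[:4], 2)  # address part selects the channel
    raw = int(bits[4:], 2)      # 16-bit data part
    value = raw - 0x10000 if raw & 0x8000 else raw  # two's complement
    return channel, value


if __name__ == "__main__":
    print(decode_frame("0011" + "1" * 15 + "0"))  # (3, -2)
    print(FRAME_DURATION_NS)                      # 10000
```

With eight channels per cycle, the 10 µs frame duration plus the idle gaps between frames set the overall cycle time.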

smcdiaz commented 1 year ago

Good job, Alberto and Bartek!!

PeterBowman commented 1 year ago

I'm trying to take advantage of the EventFlags API to combine a semaphore (sort of) with sharing data across threads, the InterruptIn API to attend to rising edges on the clock signal, and the DigitalIn API to retrieve the high-low state of the data pin on said edges.

For now, it looks like the data pin is always high for some reason. Also, I just got this error via serial port:

++ MbedOS Error Info ++

Error Status: 0x80020126 Code: 294 Module: 2

Error Message: CMSIS-RTOS error: ISR Queue overflow

Location: 0x3CCD

Error Value: 0x2

Current Thread: main Id: 0x100019BC Entry: 0x3831 StackSize: 0x1000 StackMem: 0x100003A8 SP: 0x10007EEC 

For more info, visit: https://mbed.com/s/error?error=0x80020126&tgt=LPC1768

-- MbedOS Error Info --

From https://armmbed.github.io/mbedos-error/?error=0x80020126&tgt=LPC1768:

This error originates from Kernel/RTOS layer. This is caused by ISR Queue overflow while inserting object.

Some Google results:

It seems that the continuous set() calls in my interrupt handler, which are not meant to signal anything to the main thread, but only aim to store the frame bits in a uint32, are putting a considerable strain on the OS (I added the dummy bit 21 as a semaphore to be used as the actual signal). Plan B: move away from EventFlags, embrace the Semaphore API instead, store intermediate bits in a thread-local variable, release the semaphore once the full frame is processed.

By the way, I tried #define OS_ISR_FIFO_QUEUE 30 to no avail.

Edit: actually, Thread::flags_set (available in ISR contexts) and ThisThread::flags_wait_all look like a better alternative to Semaphore.
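The per-frame signalling idea (keep intermediate bits in local state and signal the consumer only once per complete frame) can be illustrated off-target. The following is a Python simulation, not Mbed code. `queue.Queue` stands in for `Thread::flags_set`/`ThisThread::flags_wait_all`, and `FrameAssembler`/`on_rising_edge` are hypothetical names.

```python
import queue

FRAME_BITS = 20


class FrameAssembler:
    """Hypothetical helper: collects one bit per rising clock edge into a local
    shift register and publishes only complete 20-bit frames."""

    def __init__(self, frames: "queue.Queue[int]") -> None:
        self._frames = frames  # stands in for the per-frame thread flag
        self._shift = 0        # bits collected so far, MSB first
        self._count = 0

    def on_rising_edge(self, data_bit: int) -> None:
        # Cheap work per edge: no signalling, only local state updates
        self._shift = (self._shift << 1) | (data_bit & 1)
        self._count += 1
        if self._count == FRAME_BITS:
            # A single signal per complete frame
            self._frames.put(self._shift)
            self._shift = 0
            self._count = 0


if __name__ == "__main__":
    frames: "queue.Queue[int]" = queue.Queue()
    assembler = FrameAssembler(frames)
    for bit in ("0011" + "1" * 15 + "0") * 2:
        assembler.on_rising_edge(int(bit))
    print([hex(frames.get_nowait()) for _ in range(frames.qsize())])  # ['0x3fffe', '0x3fffe']
```

The point of this design is that the expensive cross-thread operation happens 20 times less often than the per-bit work.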

PeterBowman commented 1 year ago

Even with the bare metal profile (instructions) enabled, I'm unable to get frame send rates above 2 kHz (it should be 64 kHz or so). The FastIO library (mind this bug) didn't help either. The technique I'm pursuing here is called bit banging and requires squeezing every available resource out of the CPU. Even though the CPU runs at 96 MHz, it may not be able to handle even 400 kHz interrupts (ref).

But still i don't understand, how come a CPU with 100 MHz can handle 400 KHz interrupts. Is this because of the pipeline, memory access cycle, or the library?

Combination of two parts: First the general overhead of the abstraction layer of the library. It simply costs time to walk through the multiple layers, which are needed to have alot of flexibility as user. This is generally the case, those easy to use libraries are not suitable if you want very high performance out of it.

Next is that a general IO interrupt is used here. Advantage is that it can be used on most mbed pins. If however you use one of the dedicated GPIO interrupt pins and only that pin, then you can tie the user function directly to that interrupt and it will be alot faster. Also then you shouldn't expect you reach 50MHz or something similar, switching to and from interrupt context also costs time (registers need to be saved, pointers set to new values, etc).

Next steps: ignore the API, use registers.
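A quick budget shows why per-bit interrupts are so demanding. This is plain arithmetic based on the figures quoted in this comment and the previous ones: a 96 MHz core clock, roughly 64 kHz frames, 20 bits per frame and a 500 ns bit period. It only considers raw cycle counts.

```python
CPU_HZ = 96_000_000           # LPC1768 core clock
CLOCK_HZ = 2_000_000          # JR3 serial clock (500 ns period)
BITS_PER_FRAME = 20
TARGET_FRAMES_PER_S = 64_000  # "should be 64 kHz or so"
OBSERVED_FRAMES_PER_S = 2_000

# Average load: rising edges that must be caught every second
edges_per_s = TARGET_FRAMES_PER_S * BITS_PER_FRAME
avg_cycles_per_edge = CPU_HZ / edges_per_s

# Inside a frame the bits arrive back to back at the serial clock rate
burst_cycles_per_bit = CPU_HZ / CLOCK_HZ

achieved_fraction = OBSERVED_FRAMES_PER_S / TARGET_FRAMES_PER_S

print(f"{edges_per_s} edges/s on average, {avg_cycles_per_edge:.0f} cycles per edge")
print(f"{burst_cycles_per_bit:.0f} cycles per bit within a frame")
print(f"{achieved_fraction:.1%} of the target frame rate reached")
```

Within a frame, about 48 CPU cycles per bit would have to cover interrupt entry and exit as well as the handler body. That leaves very little room for the library abstraction layers mentioned in the quoted answer.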

Even moar links:

See also LPC17xx User Manual (source).

Source code regarding register handling:

A handy guide to interrupt handling on the LPC1768 (source): Resumen Interrupciones LPC1768.pdf (in Spanish: "Summary of LPC1768 interrupts").

Also useful, how to deal with printf in MBED:

Stuff related to overclocking:

PeterBowman commented 1 year ago

Some things I tried, just for the record:

The ISR attempts should be discarded, as they don't meet the required performance. Bit banging is the only way to go, but excessive instructions will have a negative impact, causing bits (and therefore frames) to be dropped.

I have prepared a Python script to decode a sequence of bits as dumped from the clock and data streams (jr3-dumps.zip). It finds the correspondence between both and correctly interprets data frames.

import os
import re
import sys

# Interpret a binary string as a signed two's-complement integer of num_bytes bytes
# https://stackoverflow.com/questions/1604464/twos-complement-in-python
def twos(val_str, num_bytes):
    b = int(val_str, 2).to_bytes(num_bytes, byteorder=sys.byteorder, signed=False)
    return int.from_bytes(b, byteorder=sys.byteorder, signed=True)

base = os.path.dirname(__file__)

with open(os.path.join(base, 'dump.txt'), 'r') as source, open(os.path.join(base, 'dump-parsed.txt'), 'w') as dest:
    clock, data = source.read().splitlines()

    assert len(clock) == len(data), 'size mismatch'
    assert re.match(r'^[01]+$', clock), 'illegal chars in clock stream'
    assert re.match(r'^[01]+$', data), 'illegal chars in data stream'

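    # each match spans 20 clock pulses: a run of 0s followed by a short run of 1s
    for m in re.finditer(r'(0+(?:1{1,5}(?=0)|1|$)){20}', clock):
        # sample the data stream at every rising edge (a 1 preceded by a 0)
        frame = ''.join(data[m.start() + t.start()] for t in re.finditer(r'(?<=0)1', m.group(0)))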

        if len(frame) == 20:
            dest.write(f'{frame} {int(frame[0:4], 2)} 0x{int(frame[4:], 2):04X} {twos(frame[4:], 2)}\n')

On the other hand, this script parses the output of candump, where three JR3 frames are encoded consecutively in the 64-bit payload of a single CAN data frame (using 60 of those bits):

import os
import re
import sys

if len(sys.argv) != 2:
    print("Usage: python decode.py <filename>")
    sys.exit(1)

filename = sys.argv[1]
base = os.path.dirname(__file__)
source_path = os.path.join(base, filename)
dest_path = os.path.join(base, '-decoded'.join(os.path.splitext(filename)))

with open(source_path, 'r') as source, open(dest_path, 'w') as destination:
    for line in source:
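        m = re.match(r'^can\d {2}[0-9A-F]{3} {3}\[8\] ((?: [0-9A-F]{2}){8})$', line.strip())
        if m is None:
            continue  # skip lines that are not 8-byte data frames
        raw = int(''.join(m.group(1).strip().split(' ')[::-1]), 16) # payload bytes are little-endian

        # three 20-bit JR3 frames (4-bit channel + 16-bit data) in the lower 60 bits
        for frame in (raw & 0xFFFFF, (raw >> 20) & 0xFFFFF, (raw >> 40) & 0xFFFFF):
            destination.write(f'[{(frame >> 16) & 0xF}] 0x{(frame & 0xFFFF):04X}\n')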
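To test the decoder without hardware, a matching encoder can generate payloads. This hypothetical sketch assumes the layout that the decoder is meant to read: three 20-bit frames (4-bit channel in the upper bits, 16-bit data in the lower bits) occupying bits 0-59 of the payload, with the payload bytes in little-endian order. `pack_three` and `unpack_three` are illustrative names.

```python
from typing import List, Tuple


def pack_three(frames: List[Tuple[int, int]]) -> str:
    """Pack three (channel, data) pairs into 8 payload bytes, candump-style hex."""
    if len(frames) != 3:
        raise ValueError("expected exactly three JR3 frames")
    raw = 0
    for i, (channel, data) in enumerate(frames):
        word = ((channel & 0xF) << 16) | (data & 0xFFFF)  # one 20-bit JR3 frame
        raw |= word << (20 * i)                           # slots at bits 0, 20, 40
    return " ".join(f"{b:02X}" for b in raw.to_bytes(8, "little"))


def unpack_three(payload: str) -> List[Tuple[int, int]]:
    """Inverse of pack_three, mirroring the byte reversal used by the decoder."""
    raw = int("".join(payload.split()[::-1]), 16)
    words = [(raw >> (20 * i)) & 0xFFFFF for i in range(3)]
    return [((w >> 16) & 0xF, w & 0xFFFF) for w in words]


if __name__ == "__main__":
    payload = pack_three([(0, 0x1234), (1, 0xABCD), (7, 0xFFFF)])
    print(payload)
    print(unpack_three(payload))
```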

Another version of the above script, where four JR3 frames are encoded instead (their 16-bit data parts fill the 64-bit CAN payload, while the 4-bit channels are delta-encoded in the 11-bit CAN identifier):

import os
import re
import sys

if len(sys.argv) != 2:
    print("Usage: python decode.py <filename>")
    sys.exit(1)

filename = sys.argv[1]
base = os.path.dirname(__file__)
source_path = os.path.join(base, filename)
dest_path = os.path.join(base, '-decoded'.join(os.path.splitext(filename)))

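def parse_channels(direction):
    # 'direction' is the 11-bit CAN identifier: bits 0-2 hold the first channel,
    # bits 3-4, 5-6 and 7-8 hold 2-bit deltas to the next three channels
    channel1 = direction & 0x0007
    channel2 = channel1 + ((direction & 0x0018) >> 3)
    channel3 = channel2 + ((direction & 0x0060) >> 5)
    channel4 = channel3 + ((direction & 0x0180) >> 7)
    return channel1, channel2 % 7, channel3 % 7, channel4 % 7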

with open(source_path, 'r') as source, open(dest_path, 'w') as destination:
    for line in source:
        m = re.match(r'^can\d {2}([0-9A-F]{3}) {3}\[8\] ((?: [0-9A-F]{2}){8})$', line.strip())
        if m is None:
            continue  # skip lines that are not 8-byte data frames
        direction = int(m.group(1), 16)  # 11-bit CAN identifier
        raw = int(''.join(m.group(2).strip().split(' ')[::-1]), 16) # parse as hex

        for channel, data in zip(parse_channels(direction), (raw & 0xFFFF, (raw >> 16) & 0xFFFF, (raw >> 32) & 0xFFFF, raw >> 48)):
            destination.write(f'[{channel}] 0x{data:04X}\n')

Raw and decoded CAN frames (edit: ignore this, it missed a lot of JR3 frames): jr3-can.zip. The can-xxx.txt CAN data streams were captured using 60-bit payloads (three JR3 frames), where xxx denotes the fraction of JR3 frames that passed the filter (e.g. 0.1 means that 9 of 10 JR3 frames were dismissed). The can-x4.txt files refer to 64-bit data payloads (four JR3 frames, with channel information encoded in the CAN identifier). In both the can-0.75 and can-x4 scenarios, the CAN bus load was in the range of 96-99%.
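A rough capacity check may help explain the missed frames. The sketch below assumes classic CAN 2.0A data frames (11-bit identifier, 8 data bytes) at 1 Mbit/s. It takes the sensor rate from the 128.5 µs eight-channel cycle reported later in this thread.

```python
BITRATE = 1_000_000  # bit/s
DATA_BYTES = 8


def can_frame_bits(data_bytes: int, worst_case_stuffing: bool = False) -> int:
    """Bits on the wire for a classic CAN 2.0A (11-bit ID) data frame,
    including the 3-bit interframe space."""
    bits = 8 * data_bytes + 47  # 44 bits of framing + payload + 3 bits IFS
    if worst_case_stuffing:
        # bit stuffing applies from SOF to the end of the CRC (34 + 8n bits)
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


jr3_frames_per_s = 8 / 128.5e-6                     # 8 channels per 128.5 us cycle
best = BITRATE / can_frame_bits(DATA_BYTES)         # no stuff bits
worst = BITRATE / can_frame_bits(DATA_BYTES, True)  # worst-case stuffing

for per_can_frame in (3, 4):
    needed = jr3_frames_per_s / per_can_frame
    print(f"{per_can_frame} JR3 frames per CAN frame: {needed:.0f} CAN frames/s needed, "
          f"{worst:.0f}-{best:.0f}/s available")
```

Under these assumptions, a fully saturated bus carries roughly 7.4k to 9k CAN frames per second. Streaming every JR3 frame would need about 15.6k (four per CAN frame) or 20.8k (three per CAN frame). This would be consistent with the 96-99% bus load and the dropped frames observed above.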

A bunch of conclusions:

Recap on what actually worked: intensive bit-banging is meant to be performed via a dummy while-loop such as while (pin.read()) {}. Even though it doesn't look harmful at first glance, introducing intermediate instructions for seemingly light stuff (e.g. storing state into temporary variables) draws CPU power from the actually important stuff (not missing a single bit from a 20-bit sensor frame). See https://github.com/roboticslab-uc3m/lacquey-fetch-firmware/commit/17f392d2051eedb274f12f021132987cb16a4968.

PeterBowman commented 1 year ago

I managed to capture raw channel 7, which carries calibration data (the low byte is the value, the high byte is the EEPROM address). I converted this stream into a CSV file with the following script:

import csv

with open('calib-leftArm.txt', 'r') as f_in, open('calib-leftArm-8.csv', 'w', newline='') as f_out:
    lines = f_in.readlines()
    writer = csv.writer(f_out)

    for row in range(0, 32):
        writer.writerow([line[2:4] for line in lines[row * 8 : (row + 1) * 8]])

See calib.zip. The memory layout is detailed in this document; most importantly, it encodes the calibration matrix needed for the decoupling of raw force/torque data. I have also dumped the DSP memory contents (i.e. from the receiver) via https://github.com/roboticslab-uc3m/jr3pci-linux/commit/a3d86de2ce72ba6faff1bec092fc2d0f01176b8e: dsp-pretty.txt.

It turns out the calibration matrix and the operations performed on it combine floating- and fixed-point representations. Regarding the latter, the coefficients adhere to the (signed) Q1.15 format:

Incidentally, I have found a nice JR3 driver for Windows by Norberto Pires. The header file provides some insights into the DSP memory layout as well: jr3pci_soft_2005_V3.zip.

PeterBowman commented 1 year ago

So far I have determined, through inspection of JR3 frames and DSP memory values, that raw data is processed as follows:

  1. acquire raw JR3 frames
  2. decouple raw data using the calibration matrix
  3. remove offsets
  4. apply low-pass filter (if requested)
  5. apply full-scales to obtain physical units (N, Nm)

See calibration.ods (check previous comment for source of data).
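The processing order above could be sketched as follows. This Python sketch rests on several assumptions. First, the calibration matrix multiplies the raw six-channel vector. Second, the optional filter is a simple exponential moving average, which is not necessarily what the JR3 DSP does. Third, full-scale values correspond to 16384 raw counts. `Jr3Pipeline` and `FULL_SCALE_COUNTS` are hypothetical names, and the matrix, offsets and full scales in the example are made up.

```python
from typing import List, Optional, Sequence

FULL_SCALE_COUNTS = 16384  # assumption: raw count corresponding to 100 % of full scale


class Jr3Pipeline:
    """Hypothetical helper implementing steps 2-5 for one 6-axis sample
    (Fx, Fy, Fz, Mx, My, Mz)."""

    def __init__(self, matrix: Sequence[Sequence[float]], offsets: Sequence[float],
                 full_scales: Sequence[float], alpha: Optional[float] = None) -> None:
        self.matrix = matrix
        self.offsets = offsets
        self.full_scales = full_scales
        self.alpha = alpha  # None disables the low-pass filter
        self._state: Optional[List[float]] = None

    def __call__(self, raw: Sequence[int]) -> List[float]:
        # 2. decouple: calibration matrix times raw vector
        values = [sum(m * r for m, r in zip(row, raw)) for row in self.matrix]
        # 3. remove offsets
        values = [v - o for v, o in zip(values, self.offsets)]
        # 4. optional first-order low-pass (exponential smoothing)
        if self.alpha is not None:
            if self._state is None:
                self._state = values
            else:
                self._state = [s + self.alpha * (v - s) for v, s in zip(values, self._state)]
            values = self._state
        # 5. scale to physical units (N, Nm)
        return [v * fs / FULL_SCALE_COUNTS for v, fs in zip(values, self.full_scales)]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]

if __name__ == "__main__":
    pipeline = Jr3Pipeline(IDENTITY, [0.0] * 6, [100.0] * 6)
    print(pipeline([16384, 0, 0, 0, 0, 0]))
```

Steps 4 and 5 are both linear, so swapping them would not change the result. The code keeps the order listed above.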

Calibration data is structured as a 3-byte value: an 8-bit exponent (in two's complement) and a 16-bit mantissa in signed fixed-point Q1.15 representation (the first bit denotes the sign). It seems that the DSP implements the logic needed to perform arithmetic with these representations quickly and easily. The Mbed, however, adheres to the IEEE 754 standard for floats, hence conversions are necessary on our side. Also, 32-bit variables are preferred due to the architecture of the ARM Cortex-M3 chip.
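Plain-arithmetic reference versions make it easy to test the C++ conversions below on a PC. This Python sketch takes the exponent and mantissa as already-extracted integer fields. It does not model how the three bytes are ordered in EEPROM or DSP memory. Function names are hypothetical.

```python
def q15_to_float(word: int) -> float:
    """Signed Q1.15: interpret a 16-bit word as two's complement divided by 2**15."""
    word &= 0xFFFF
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / 32768.0


def float_to_q15(value: float) -> int:
    """Inverse of q15_to_float, truncating toward zero; valid for -1 <= value < 1."""
    if not -1.0 <= value < 1.0:
        raise ValueError("Q1.15 covers [-1, 1)")
    return int(value * 32768) & 0xFFFF


def jr3_float_to_float(exponent: int, mantissa: int) -> float:
    """24-bit JR3 float: 8-bit two's-complement exponent + Q1.15 mantissa."""
    exponent &= 0xFF
    exp = exponent - 0x100 if exponent & 0x80 else exponent
    return q15_to_float(mantissa) * 2.0 ** exp


if __name__ == "__main__":
    print(q15_to_float(0xA000))              # -0.75
    print(jr3_float_to_float(0xFF, 0x4000))  # 0.5 * 2**-1 = 0.25
```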

My first attempt at converting between raw 24-bit JR3 floats and IEEE 754 floats (ref):

#include <math.h> // powf, modff, fabsf
#include <stdint.h>

float jr3FixedToIEEE754(uint16_t fixed)
{
    float value = 0.0f;

    for (int i = 0; i < 16; i++)
    {
        value += ((fixed & (1U << (15 - i))) >> (15 - i)) * powf(2, -i);

        if (i == 0)
        {
            // in signed Q1.15 format, the leftmost bit determines the sign
            value = -value;
        }
    }

    return value;
}

inline float jr3FloatToIEEE754(int8_t exponent, uint16_t mantissa)
{
    return jr3FixedToIEEE754(mantissa) * powf(2, exponent);
}

uint16_t jr3FixedFromIEEE754(float value)
{
    int16_t result = 0;
    float integer = 0;
    float absolute = fabsf(value);

    for (int i = 0; i < 15; i++)
    {
        modff((absolute * powf(2, 15 - i)), &integer);
        result |= (int)integer << i;
    }

    return value < 0.0f ? result * -1 : result;
}

(using C-style math functions to avoid implicit casting to double via C++ templates)

However, working with plain memory is much faster (~130 times and ~60 times, respectively; ref1, ref2):

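#include <stdint.h>
#include <string.h>
#include "mbed.h" // pulls in the CMSIS __CLZ intrinsic used below

float jr3FloatToIEEE754(int8_t exponent, uint16_t mantissa)
{
    if (mantissa == 0)
    {
        return 0.0f; // otherwise the exponent alone would yield a non-zero float
    }

    uint32_t temp = 0;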

    if (mantissa >> 15)
    {
        temp |= (((~mantissa & 0x3FFF) + 1U) << 9) | (1U << 31);
    }
    else
    {
        temp |= (mantissa & 0x3FFF) << 9;
    }

    temp |= (exponent + 126) << 23;

    float f;
    memcpy(&f, &temp, 4);
    return f;
}

inline float jr3FixedToIEEE754(uint16_t fixed)
{
    // beware of integral promotion! https://stackoverflow.com/a/30474166
    int8_t exponent = __CLZ((fixed >> 15 ? ~fixed : fixed) & 0x0000FFFF) - 17;
    return jr3FloatToIEEE754(-exponent, fixed << exponent);
}

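uint16_t jr3FixedFromIEEE754(float f)
{
    uint32_t temp;
    memcpy(&temp, &f, 4);

    int8_t exponent = ((temp & 0x7F800000) >> 23) - 127;

    if (exponent < -15)
    {
        return 0; // below Q1.15 resolution (also covers +/-0.0f)
    }

    uint16_t mantissa = (temp & 0x007FFFFF) >> (8 - exponent);
    uint16_t magnitude = mantissa | (1U << (15 + exponent)); // restore the implicit leading 1

    if (temp >> 31)
    {
        // two's-complement negation of the full magnitude (also correct when mantissa == 0, e.g. -0.5f)
        return (uint16_t)(~magnitude + 1U);
    }
    else
    {
        return magnitude;
    }
}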
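The bit-level conversion can be cross-checked on a PC by porting it almost line by line to Python, with `struct` playing the role of `memcpy`. This sketch follows the same steps as `jr3FixedFromIEEE754` above, except that it negates the full magnitude instead of computing `~((mantissa - 1U) | (1U << (15 + exponent)))`. Both agree unless the shifted mantissa is zero, as happens for exact powers of two such as -0.5. In that case the latter expression yields 0 instead of 0xC000. Function names are hypothetical.

```python
import struct


def fixed_from_ieee754(f: float) -> int:
    """Convert a float in [-1, 1) to a 16-bit signed Q1.15 word (truncating toward zero)."""
    packed = struct.pack("<f", f)  # round to single precision, like a C float
    (f32,) = struct.unpack("<f", packed)
    if not -1.0 <= f32 < 1.0:
        raise ValueError("Q1.15 covers [-1, 1)")
    # Reinterpret the single-precision bit pattern (the memcpy step in C++)
    (bits,) = struct.unpack("<I", packed)
    exponent = ((bits & 0x7F800000) >> 23) - 127
    if exponent < -15:
        return 0  # below Q1.15 resolution (also covers +/-0.0)
    mantissa = (bits & 0x007FFFFF) >> (8 - exponent)
    magnitude = mantissa | (1 << (15 + exponent))  # restore the implicit leading 1
    # Negate the whole magnitude in 16-bit two's complement
    return (-magnitude) & 0xFFFF if bits >> 31 else magnitude


def fixed_reference(f: float) -> int:
    """Plain arithmetic reference on the same single-precision value."""
    (f32,) = struct.unpack("<f", struct.pack("<f", f))
    return int(f32 * 32768) & 0xFFFF


if __name__ == "__main__":
    print(hex(fixed_from_ieee754(-0.5)))  # 0xc000
```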

See also:

PeterBowman commented 1 year ago

A bunch of info on low-pass filters:
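As a generic illustration, a first-order RC low-pass can be discretised as exponential smoothing. This is not necessarily the filter implemented by the JR3 DSP or by the Mbed firmware. The cutoff frequency and sample rate are arbitrary example parameters, and the function names are hypothetical.

```python
import math
from typing import Iterable, List


def smoothing_factor(cutoff_hz: float, sample_hz: float) -> float:
    """Discretised RC low-pass: alpha = dt / (RC + dt), with RC = 1 / (2*pi*fc)."""
    dt = 1.0 / sample_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    return dt / (rc + dt)


def low_pass(samples: Iterable[float], cutoff_hz: float, sample_hz: float) -> List[float]:
    alpha = smoothing_factor(cutoff_hz, sample_hz)
    out: List[float] = []
    y = None
    for x in samples:
        # y[n] = y[n-1] + alpha * (x[n] - y[n-1]); the first sample initialises the state
        y = x if y is None else y + alpha * (x - y)
        out.append(y)
    return out


if __name__ == "__main__":
    print(smoothing_factor(2.0, 100.0))  # 2 Hz cutoff sampled every 10 ms
```

A lower cutoff or a higher sample rate gives a smaller alpha, i.e. heavier smoothing and a slower step response.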

PeterBowman commented 1 year ago

A bunch of info on the CAN controller implementation for LPC176X targets and acceptance filters:

See also (regarding sleep/wait functions):

And the CAN-read-from-ISR-context issue:

PeterBowman commented 1 year ago

The cycle time has been estimated to be around 128.5 microseconds, i.e., how long it takes to generate all eight sensor channels. This is about 7.78 kHz, close to the 8 kHz stated in the manual.

[Photo: oscilloscope capture]

PeterBowman commented 1 year ago

Useful Python tools for CAN inspection and debugging:

To instantiate a CAN interface:

sudo ip link set can0 up txqueuelen 1000 type can bitrate 1000000

To start an ASYNC publisher on ID 1 with a period of 10 ms (10000 us = 0x2710) and a cutoff frequency of 2 Hz (expressed in units of 0.01 Hz: 200 = 0x00C8):

cansend can0 201#C80010270000
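The payload above can also be generated programmatically. The layout is inferred from this single example and should be treated as an assumption: the CAN ID is 0x200 + node ID, followed by a little-endian 16-bit cutoff frequency in units of 0.01 Hz and a little-endian 32-bit period in microseconds. `async_start_command` is a hypothetical helper.

```python
import struct


def async_start_command(node_id: int, period_s: float, cutoff_hz: float) -> str:
    """Build the id#payload argument for cansend (layout assumed, see above)."""
    can_id = 0x200 + node_id
    payload = struct.pack("<HI", round(cutoff_hz * 100), round(period_s * 1_000_000))
    return f"{can_id:03X}#{payload.hex().upper()}"


print(async_start_command(1, 0.010, 2.0))  # 201#C80010270000
```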
PeterBowman commented 11 months ago

Done:

Remarks:

The can-plotter.py app plots live sensor data after conversion to proper physical units. This video shows sensor bootup, initialization, calibration and data plotting (the script needs some polish, e.g. the axis units don't scale up):

https://github.com/roboticslab-uc3m/yarp-devices/assets/9977198/ff309f95-3de7-4c1a-9216-ae9e4119d3b1

There are several .ini launcher files ready for instantiating the new Jr3Mbed raw subdevice. I have tested, but not yet added, the motor+sensor .ini counterparts, which would entail duplicating almost everything we have now. Just for reference, for the right arm + JR3 sensor:

[devCan rightArmBus]
device "CanBusBroker"
description "CAN bus controller board for TEO's right arm"
buses ("socket-rightArm")
socket-rightArm ("id15-ipos" "id16-ipos" "id17-ipos" "id18-ipos" "id19-ipos" "id20-ipos" "id31-jr3")
syncPeriod 0.002

[mapper rightArmMapper]
device "controlboardremapper"
axesNames ("FrontalRightShoulder" "SagittalRightShoulder" "AxialRightShoulder" "FrontalRightElbow" "AxialRightWrist" "FrontalRightWrist")
calibrator "generic"

[wrapper rightArmPort]
device "controlBoard_nws_yarp"
name "/rightArm"
period 0.01

[mapper jr3Mapper]
device "multipleanalogsensorsremapper"
SixAxisForceTorqueSensorsNames ("rightHand")

[wrapper jr3Wrapper]
device "multipleanalogsensorsserver"
name "/rightHand"
period 10

The (now outdated) circuit board required several tweaks:

[Photo: the modified circuit board]

I mostly adhered to the following diagram included in Javier Berrocal's master's thesis:

[Figure: connection diagram from Javier Berrocal's master's thesis]

@smcdiaz @100511161 please consider those tweaks and fixes in the new design. Besides, I also had to prepare a new data cable since the connector pins were inverted. As stated in the user manual, plugging the RJ cable the other way around may result in permanent damage to the sensor.

On a final note, please also check the power stages on the circuit board. I was unable to use the MBED when it was powered through the robot: it kept losing power after bootup (a blinking blue LED is the visual symptom). It did work as soon as I plugged the MBED into my computer through its mini-USB port.

PeterBowman commented 7 months ago

@AlbertoRodriguezSanz helped me to prepare a working demo on the ftCompensation app (https://github.com/roboticslab-uc3m/kinematics-dynamics/issues/191) featuring this new Mbed implementation (YT video link):

[Video: YouTube demo]

Setup:

I probably should have configured the correct TCP, CoM and tool weight for the stump end-effector link, but it worked fine with the Lacquey ones anyway.

[Photo: experimental setup]

This video shows a similar experiment in CSV mode:

[Video: YouTube demo]