[FRIMWARE/SDK] IMU reports sometimes only zeros

jviereck commented 3 years ago

In ~1/4 of the cases lately the IMU returns only zeros. I am not sure what is going on but assume that something with setting up the IMU goes wrong and the master board is not receiving any packages from the IMU.

Has someone else observed this?

@thomasfla any idea how this could be fixed or did you observe this as well?

wxmerkt commented 3 years ago

We've observed issues and I wonder if they are related. I tracked our problems down to (1) the SPI frequency and (2) the temperature of the motor boards which are directly underneath the IMU (the IMU is temperature dependent as well).

For (1), which frequency are you running and for (2) after how long of operation time did you observe this?

jviereck commented 3 years ago

About (1): I am a bit confused: The IMU is not communicating via SPI. The SPI is for the motor driver boards. There, we needed to reduce the frequency from the default 6 Mhz to 4 Mhz. I didn't touch the frequency for communicating with the IMU

(2) I observe this sometimes right away: Starting the robot in the morning and the IMU will report zeros.

wxmerkt commented 3 years ago

About (1): I am a bit confused: The IMU is not communicating via SPI. The SPI is for the motor driver boards. There, we needed to reduce the frequency from the default 8 Mhz to 6 Mhz. I didn't touch the frequency for communicating with the IMU

Sorry I got that confused. We had to reduce the baud rate for the UART(?) communication due to data loss (when using a GX5-45 and a voltage converter). Separately, we had to reduce the SPI to 4 MHz since at 6 MHz we had frequent zero readings of the joint positions.

(2) I observe this sometimes right away: Starting the robot in the morning and the IMU will report zeros.

Okay, sorry for the noise then - our problems appear to be quite related to temperature. Since we have a different IMU our initialisation/communication sequence is different so the problems are likely unrelated (i.e. no baudrate setting via the firmware but fixed with the Microstrain Windows tool).

jviereck commented 3 years ago

The problem persists when starting the imu with the master board. Trying to interface with the IMU via USB using the provided USB <-> UART adapter from LORD, I am able to read the IMU just fine when using our linux communication library.

Therefore, I have the feeling something is wrong with the master board. Hopefully this is a software issue. I will dig into this deeper now.

jviereck commented 3 years ago

Unmounting the IMU from solo12 and mounting it again, the error is gone. The IMU always communicates data to the master board and shows up at the PC side. Therefore I put this into the Heisenberg-bugs section...

Could the problem be cause by having the IMU communication wire close to the power line that provides power to the front stack? Would it make sense to have the power supply cable for the front stack moving at the bottom of the robot?

thomasfla commented 3 years ago

Hi @jviereck, We never had communication errors with the IMU so far, but I guess you might be right and this is maybe caused by the wires being too close to some power wires. It could also be that some crimping on the IMU connector are not reliable.

Trying to have the wires as short as possible and preventing loops is a good practice. I propose to let the issue open until we can have more information about this bug, and if we need to take action in the firmware.

jviereck commented 3 years ago

Thanks for your reply @thomasfla .

The IMU communication wire is as short as it can be, so I doubt it is this. Also, the wire was created by @fgrimminger , which are usually high-quality crimped wires. So I would sort of rule out a bad crimp as well.

As per @fgrimminger suggestion, I will change our power wiring and have the power lines running on the bottom of the robot. I will do this and let you know what I observe.

Note that we had to reduce the SPI communication speed between the master board and the udrivers from 6 Mhz to 4 Mhz for stable communication. We did not have to do this in Tübingen, but people in Oxford also had to do this (correct me if I am wrong @wxmerkt ). Once the power wires are running on the bottom of the robot, I want to check if we can run the SPI at 6 Mhz a well without any SPI errors.

jviereck commented 3 years ago

Changing the power wires for the front stack to run on the bottom of the robot, I observe a stable communication at 6 Mhz. The IMU is working as well.

jviereck commented 3 years ago

Given that the wiring seems to have fixed the problem, should we close the issue?

What do you think @thomasfla ?

wxmerkt commented 3 years ago

@jviereck Electromagnetic inference can be quite a challenge! Do you have pictures of how it looks now, and would you recommend in general to alter the routing correspondingly? Did you seen an improvement/any change in the joint signals e.g. when standing still?

jviereck commented 3 years ago

@jviereck Electromagnetic inference can be quite a challenge! Do you have pictures of how it looks now, and would you recommend in general to alter the routing correspondingly?

Here is a picture how the bottom part of the robot looks like now (basically very straight forward from the bottom stack power supply to the front one). We use Wago connectors for connecting the individual stacks.

As said, before this chance we had to reduce the SPI speed from 6 Mhz to 4 Mhz. We never had to do this in Tübingen. Therefore, this is an indication that something was not right. As stated with the change, this is no longer needed. I need to run more experiments (I did the change yesterday and haven't used the robot since then). However, even with 4 Mhz I noticed sometimes, especially when running longer, that SPI communication errors would show up and I would need to power-down the robot. It happened not very often, but it happend exactly when a visitor was here and I was giving a demo^^

Given the above and given that the change is quite easy to implement, I would recommend doing it. We should also maybe update the documentation for Solo12 and advertise to have the power routed on the bottom.

@fgrimminger , what do you think about updating the documentation?

Did you seen an improvement/any change in the joint signals e.g. when standing still?

This sounds like you are having a problem here? I can collect some data but what are you hinting at?

jviereck commented 3 years ago

Here the promised picture:

PXL_20210617_125002700

wxmerkt commented 3 years ago

Thank you for the additional detail and picture @jviereck - we'll likely re-route the power cable as well then.

Did you seen an improvement/any change in the joint signals e.g. when standing still?

This sounds like you are having a problem here? I can collect some data but what are you hinting at?

When standing perfectly still (zero-order hold), we have been seeing (1) significant outliers in the velocity signal (that are different from the position signal going to zero when the SPI frequency is too high) and (2) non-zero velocity even when the joint isn't moving (the raw uint is 1 and never decrements below).

For (1), we are using filtering that includes outlier rejection and this works well. For (2), this issue impacts state estimation quite a bit and we have a work-around that if the uint is 1, we consider it to be 0. Those two changes made state estimation quite stable.

jviereck commented 3 years ago

Bad news: Doing more longer testing, I am getting SPI error messages with the current setup. I reduced the SPI communication from 6 down to 4 Mhz again and the system behaves well now.

jviereck commented 3 years ago

When standing perfectly still (zero-order hold), we have been seeing (1) significant outliers in the velocity signal (that are different from the position signal going to zero when the SPI frequency is too high) and (2) non-zero velocity even when the joint isn't moving (the raw uint is 1 and never decrements below).

For (1), we are using filtering that includes outlier rejection and this works well. For (2), this issue impacts state estimation quite a bit and we have a work-around that if the uint is 1, we consider it to be 0. Those two changes made state estimation quite stable.

This sounds something we should look into. @wxmerkt can you open a separate bug here in the repo for this such that we can trouble shoot it?

jviereck commented 3 years ago

This problem of zero IMU reading showed up again in our setup. Connecting the flashing device and running make monitor + adding the print_imu in the main run loop, I was able to get this output (maybe 1 out of 50 trials). Note that I modified the print_imu to print the intr_cpt as first number.

From the output below, all values including intr_cpt are zero. From this I conclude that uart_intr_handle is never called.

@thomasfla do you have any idea why this could be the case? I see some commented out vTaskDelay(10);` when sending the initial bytes to the IMU. Do you think we should add these statements and see if it happens again?

$ make -j8 monitor
Toolchain path: /home/jviereck/.espressif/tools/xtensa-esp32-elf/esp-2019r2-8.2.0/xtensa-esp32-elf/bin/xtensa-esp32-elf-gcc
WARNING: Toolchain version is not supported: esp-2019r2
Expected to see version: crosstool-ng-1.22.0-80-g6c4433a
Please check ESP-IDF setup instructions and update the toolchain, or proceed at your own risk.
WARNING: Compiler version is not supported: 8.2.0
Expected to see version(s): 5.2.0
Please check ESP-IDF setup instructions and update the toolchain, or proceed at your own risk.
Python requirements from /home/jviereck/esp/esp-idf/requirements.txt are satisfied.
MONITOR
--- idf_monitor on /dev/ttyUSB0 2000000 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
�J��}�х��� Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
ETH/WIFI init size 4
ETH/WIFI command size 172
ETH/WIFI ack size 3
ETH/WIFI sensor size 200
SPI size 34
I (60) gpio: GPIO[4]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (60) gpio: GPIO[5]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (60) gpio: GPIO[16]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (70) system_api: Base MAC addess is not set, read default base MAC address from BLK0 of EFUSE
I (80) emac: emac reset done
I (80) Direct_Ethernet: Ethernet Started
I (80) wifi: wifi driver task: 3ffcaee4, prio:23, stack:3584, core=1
I (80) wifi: wifi firmware version: 7c00966
I (80) wifi: config NVS flash: enabled
I (80) wifi: config nano formating: disabled
I (80) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (80) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (90) wifi: Init dynamic tx buffer num: 32
I (90) wifi: Init data frame dynamic rx buffer num: 32
I (90) wifi: Init management frame dynamic rx buffer num: 32
I (90) wifi: Init management short buffer num: 32
I (100) wifi: Init static rx buffer size: 1600
I (100) wifi: Init static rx buffer num: 10
I (100) wifi: Init dynamic rx buffer num: 32
I (180) phy: phy_version: 4100, 2a5dd04, Jan 23 2019, 21:00:07, 0, 0
I (180) wifi: mode : sta (ac:67:b2:d7:99:d8)
I (180) wifi: set country: cc=JP schan=1 nchan=14 policy=1

I (180) ESPNOW: espnow [version: 1.0] init
initialise IMU
initialising uart for IMU
initialising CX5-25 IMU
sending new baud rate setting to the IMU 
Done
Setup done

0   0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
0   0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
0   0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000

jviereck commented 3 years ago

I see some commented out vTaskDelay(10);` when sending the initial bytes to the IMU. Do you think we should add these statements and see if it happens again?

I've activated these waits but the problem shows up still :/

thomasfla commented 3 years ago

Hi @jviereck, Ok, if the IMU never send data it is probably because it didn't received the initialisation commands. The initialisation commands are sent at start-up, and will configure the IMU to continuously stream all relevant data at highest data rate. To check that the IMU is NOT streaming, you can have a look at the LED witch should flash rapidly when streaming..

When configuring it, the IMU should respond to each commands with a validation message. We actually ignore them but we could implement the reading of those to better understand what is going on, and possibly also implement a retry procedure in case of failing.

I won't have time to implement this before the end of next week.

Hope this helps, Thomas.

jviereck commented 3 years ago

Hi @thomasfla ,

Ok, if the IMU never send data it is probably because it didn't received the initialization commands.

That makes sense. Your idea to implement a try procedure sounds good. I will try to see if I can get it implemented and keep you posted.

Best, Julian

jviereck commented 3 years ago

I've implemented a very simple retry algorithm. What I do is to setup the uart handler before sending the first commands to the imu. Then I send the first two commands (IDLE + Baudrate 921600) to the IMU. I check if I receive some packages from the imu. If that's not the case, I retry in a while loop. The code for this looks like this:

  // Setup the uart handler.
  uart_flush_input(UART_NUM);
  uart_set_baudrate(UART_NUM, 921600);
  uart_set_rx_timeout(UART_NUM, 5); //timeout in symbols
  // release the pre registered UART handler/subroutine
  uart_isr_free(UART_NUM);
  // register new UART subroutine
  uart_isr_register(UART_NUM, uart_intr_handle, NULL, ESP_INTR_FLAG_IRAM, &handle_console);

  do {
    printf("sending new baud rate setting to the IMU \n");
    vTaskDelay(30);
    uart_write_bytes(UART_NUM, cmd0, sizeof(cmd0));
    vTaskDelay(10);

    uart_write_bytes(UART_NUM, cmd6, sizeof(cmd6));
    parse_IMU_data_other();
  } while (intr_cpt == 0);

where parse_IMU_data_other displays the content of the rxbuf like this:

inline void parse_IMU_data_other()
{
  printf("intr_cpt=%d; rxbuf: header=%X, len=%d\n", intr_cpt, rxbuf[2], rxbuf[3]);
}

Does this work? Unfortunately not. After some power-off-on cycles, I found a state where the IMU is not responding. The IMU LED is pulsing at roughly 2 seconds intervals. I tried to reset the masterboard multiple times by calling make monitor, without success.

The output from the master board monitor looks always like this:

[...]
sending new baud rate setting to the IMU 
intr_cpt=0; rxbuf: header=0, len=0
sending new baud rate setting to the IMU 
intr_cpt=0; rxbuf: header=0, len=0
sending new baud rate setting to the IMU 
intr_cpt=0; rxbuf: header=0, len=0
sending new baud rate setting to the IMU 
intr_cpt=0; rxbuf: header=0, len=0

So definitely the retry option is not helping.

Given that the retry is not helping and no communication is observed + the IMU stays in pulsar LED mode, I wonder if the IMU we have is having some troubles.

As next step, I will flash the imu with the latest firmware and see if that helps. If not, I feel we should consider getting the LORD support involved.

@thomasfla , any thoughts?

thomasfla commented 3 years ago

Note that by default, the IMU baudrate is 115200. You first need to send the "cmd0" before switching the esp32 USART baudrate to 921600.

thomasfla commented 3 years ago

Maybe it would be easier using this function https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/uart.html#_CPPv415uart_read_bytes11uart_port_tPv8uint32_t10TickType_t This way you do not use the Uart handler but simply read the incoming bytes. Just set a timeout of let say 100ms with ticks_to_wait

Also you can call a fluch before sending the command to be sure you read only the response. I hope this helps, I won't have access to the hardware before end of next week.

jviereck commented 3 years ago

Thanks for the comments @thomasfla - I take a look.

jviereck commented 3 years ago

I might have found a problem. Not sure if it's the solution yet. The IMU documentation states on page 61, chapter 4.2.15 UART Baud Rate (0x0C,0x40), that the baud rate gets applied only after a 250 ms delay. Right now our code looks like this:

  uart_write_bytes(UART_NUM, cmd6, sizeof(cmd6));  // Changes baud rate to 921600
  vTaskDelay(1);
  uart_flush_input(UART_NUM);
  uart_set_baudrate(UART_NUM, 921600);
  uart_set_rx_timeout(UART_NUM, 5); //timeout in symbols
  // release the pre registered UART handler/subroutine
  uart_isr_free(UART_NUM);
  // register new UART subroutine
  uart_isr_register(UART_NUM, uart_intr_handle, NULL, ESP_INTR_FLAG_IRAM, &handle_console);
  // enable RX interrupt
  vTaskDelay(10);
  printf("Done\n");
  uart_write_bytes(UART_NUM, cmd1, sizeof(cmd1));

From that it looks like we are waiting 11 ticks (which are 11 ms I guess) before sending the next command.

What I did for now is adding a vTaskDelay(250) after sending the cmd6 and will observe if this fixes the IMU issue.

thomasfla commented 3 years ago

Good catch.

Yes, the task delay is in ms with the current setting of freertos.

open-dynamic-robot-initiative / master-board

[FRIMWARE/SDK] IMU reports sometimes only zeros #89