rosflight / rosflight_ros_pkgs

ROS packages for the ROSflight autopilot
http://rosflight.org/
BSD 3-Clause "New" or "Revised" License
86 stars 56 forks source link

Freeze on parameter writing #144

Closed prothen closed 3 years ago

prothen commented 3 years ago

First of all thank you very much for creating and sharing this amazing repository, it has been and still is a great pleasure to explore all of its features!

I am currently encountering a problem with the writing of parameters and was hoping to figure out why and get a more convenient non-volatile parameter configuration.

My setup is based on the Xrotor nano F4 board and everything is working nicely so far, except the /param_write seems to freeze the FC.

My setup process is

  1. Linux4Tegra using master branch
    • Alternative workstation with Ubuntu 18.04 encounters same FC behaviour with freeze on /param_write (Simulation works seamlessly for both, also with /param_write)
  2. Connecting rosflight_io via /dev/ttyACM0
    • /calibrate_imu
    • MIXER 2
    • RC_TYPE 1
  3. Arming and motors spin /status shows no errors
  4. /param_write returns True but FC becomes unresponsive and does not publish to any topic (e.g. /status /imu/data) and requires power toggle (losing all defined parameter values in the process, so the writing does not seem to be successful)
  5. Hardware: Xrotor Nano F4 (OMNIBUSF4SD)

Am I missing something for using /param_write or is it maybe FC related (No SD card)?

ps: I tried alternatively to fetch parameters using /param_save_to_file and /param_load_from_file but it does not seem to keep all parameters (error code for imu not calibration present after loading the file. Which although it shows to load calibration parameters in the console log, still seems to require the IMU calibration to be redone).

superjax commented 3 years ago

Hey @prothen welcome!

I think that you're probably seeing a board-specific issue. I don't think we've ever tested on that board, and looking at the implementation, i can see where things might be going wrong.

I also don't have a board like that to test locally, but I'll bet we are hitting

https://github.com/rosflight/airbourne_f4/blob/1009867d7e6d9b5730b4d58d238a276782769f8c/src/M25P16.cpp#L95

because it doesn't actually have M25P16 installed.

superjax commented 3 years ago

I'm not quite sure how to proceed. We can attempt to use the F4 EEPROM instead of the SPI flash chip, or the SD card. The parameter struct is so small, I'll bet that it'll fit on the EEPROM, however I'm not sure that we ever got it working. @BillThePlatypus, do you know if the EEPROM stuff works? Looking at the git history, that looks like it was mostly your work.

superjax commented 3 years ago

Using the EEPROM is probably the fastest way to get things working, since it will be the same across boards.

prothen commented 3 years ago

Thanks for getting back so quickly!

I have now dived into the firmware and board implementation and identified some potential files to change. My current understandig of how things are tied together: (Collapsed section) - `ROSflight` Repository with ROS abstraction: Only `rosflight_io.{h,cpp}` depending on ROS1 middleware and forwarding everything to `MAVLink` serial interface -> `firmware` repository with `MAVLink` abstraction class -> implementing `board` specific microcontroller implementation `.boards/airbourne/main.cpp` (in our case) depending on STM32F4 chip (labeled `REVO`) -> `airbourne_f4` repository with abstraction of components for sensors and storage `M25P16` serial flash memory in [`m25p16.{h, cpp}`](https://github.com/rosflight/airbourne_f4/blob/dshot/src/M25P16.cpp) - All parameter depending components in the `firmware` repository implement `ParamsListenerInterface` and are linked to the `ComLinkInterface` implementation (currently only `MAVLink`), initialised in the `rosflight.cpp` constructor and the `state_manager` directs all parameter writes to the `board` chip specific implementation (`Board` subclass `AirbourneBoard`) method [`memory_write`](https://github.com/rosflight/firmware/blob/master/boards/airbourne/airbourne_board.cpp) which uses the attribute `flash_` (currently the M25P16), so potentially just the replacement of this object with our own EEPROM implementation but the same interfacing methods

I didn't have time to look into a few things yet and am a bit unsure about the following:

  1. What is the backup_sram used for?
  2. Should we replace M25P16 and make EEPROM standard or is it potentially limiting future developments if space runs out, or what was the main motivation to choose external storage in the first place?
  3. If I understood correctly, the airbourne_f4 is an abstraction for all STM32F4 chips, but how and where are FC implementation specifics differentiated, such as the missing M25P16 in the Xrotor nano F4 case.
  4. It looks like this is not a rosflight issue, shall I reopen an issue for firmware and we close this with a reference? I then could create a branch for us with a WIP PR on the firmware repository to discuss further and on the weekend I could attempt on chip debugging
  5. Where do I start to verify a new FC implementation in terms of component conformance (e.g. accelerometer, gyroscope, magnetometer component deviation or change in orientation -> I am experiencing rosflight/firmware#360) Maybe we can also document this journey for the developer docs to create some template of a guideline for porting existing implementations? I also have yet to find some STM32F4 pinout to the Xrotor nano F4 exposed pads, are those usually open sourced somewhere?
BillThePlatypus commented 3 years ago

I can answer some of your questions.

  1. backup_sram is used in case of a firmware crash. It is a small portion of the memory that is not cleared on reboot. A small amount of information is stored there, so that in the case of a firmware crash, the vehicle can hopefully reboot and continue flying.
  2. One could define a separate board file, based off AirbourneBoard that replaces functions such as memory_write to work with EEPROM instead of flash.
  3. The OpenPilot Revolution, the currently supported board is open source, and so connection documentation is available. I couldn't tell you about the Xrotor nano, but if it's not labeled as open source hardware, I'd expect that there wouldn't be documentation.
rosflight-admin commented 3 years ago

This issue has been mentioned on ROSflight. There might be relevant details there:

https://discuss.rosflight.org/t/firmware-port-to-xrotor-nano-f4/60/1

prothen commented 3 years ago

Thanks for all your feedback so far!

I am currently stuck since I can't debug my firmware changes on-chip. (The nano board does not explicitly expose the SWD pins and the support prefers to not share any hardware information related to it.)

Do you have suggestions on how I could proceed with this board? I was hoping to contribute a EEPROM based board layout for it, but since it constitues more changes where things could go wrong, I don't want to attempt it blindly without on-chip debugging. Maybe you see an intermediate solution that I can attempt with only small adaptions to the M25P16 to get things working in the current setup?

I am now also looking into a replacement FCU. Is there one that you would prefer for more longer term support? I would also be interested to work on the porting to a F7 creating a more standardised setup process to alleviate entry barriers.

prothen commented 3 years ago

@superjax and @BillThePlatypus Do you know what could be the reason for num_errors in the /status message to be continuously increasing although the error_code is zero and all sensor readings under /attitude look good. (The num_errors also seems to overflow at some point and then becomes negative. It then continues to decrease instead until positive.)

Edit: I tracked it down to be part of the board implementation using ext_i2c here . It looks like it is only for sonar and airspeed sensors, so I can ignore it in my case right?

superjax commented 3 years ago

Yeah, you can ignore it. It should stop counting if you arm.

The errors indicate a failed i2c command. This happens when we are polling for sensors that aren't there, but it could also happen if a sensor were to stop responding. We don't look for sensors if the FCU is armed.

prothen commented 3 years ago
  1. Master vs v1.3.0 causes RC_LOST:

When I build from source under the untouched v1.3.0 tag everything still works as expected and all error messages disappear. If I checkout to master, the RC_LOST error remains. I couldn't find features related to the RC that could have caused this. Do you have an idea where this error could come from?

  1. Regarding the persistent parameter updates (using master branch):

Edit: I have tried to replace the sram with but it does not seem to work. It does not freeze anymore but instead just stops for .5s while /param_write and then continues streaming data. After a power toggle the written parameters are not recovered but it is uninitialised.

// non-volatile memory
void AirbourneBoard::memory_init()
{
  return eeprom_init();
  //return flash_.init(&spi3_);
}

bool AirbourneBoard::memory_read(void *data, size_t len)
{
  return eeprom_read(data, len);
  //return flash_.read_config(reinterpret_cast<uint8_t *>(data), len);
}

bool AirbourneBoard::memory_write(const void *data, size_t len)
{
  return eeprom_write(data, len);
  //return flash_.write_config(reinterpret_cast<const uint8_t *>(data), len);
}

Edit2: I just noticed that due to everything running in detached tmux shells I missed to share the output for most events.

The parameter writing is reported to be successful.

[ INFO] [1622947674.276260624]: Param write succeeded
[ INFO] [1622947674.276386580]: Onboard parameters have been saved

Below the output for params.h/params.cpp, which are initialised based on the successful read using the board.memory_read() and then printing using a mavlink log message statustext which is also available on ROS. So following the output of rosflightIO, this is an active ROS node right after the power toggle reconnection.

Collapsed section of print out for power toggle: ```bash [ INFO] [1622946292.516276932]: Connecting to serial port "/dev/ttyACM0", at 921600 baud [ WARN] [1622946292.581826455]: RC override active [ERROR] [1622946292.581964076]: Autopilot ERROR: Invalid mixer [ERROR] [1622946292.582029863]: Autopilot ERROR: RC lost [ERROR] [1622946292.582077475]: Autopilot ERROR: Uncalibrated IMU [ERROR] [1622946292.582123232]: Autopilot ERROR: Buffer Overrun [ WARN] [1622946292.582175868]: Autopilot now in ANGLE mode [ WARN] [1622946292.588162945]: ROSflight version does not match firmware version. Errors or missing features may result [ WARN] [1622946292.588252026]: ROSflight version: 1.3 [ WARN] [1622946292.588306134]: Firmware version: latest-0-gabf537d3 [ INFO] [1622946292.638452654]: Detected time offset of 1622946290.340 s. [ INFO] [1622946293.344282517]: Got HEARTBEAT, connected. [ WARN] [1622946293.568531381]: [Autopilot]: Unable to load parameters; using default values [ERROR] [1622946293.586874824]: [Autopilot]: Invalid Mixer Choice ``` The parameter size `params_t` is ```bash [ WARN] [1622951496.397681664]: [Autopilot]: Unable to load parameters; using default values [ WARN] [1622951496.417963567]: [Autopilot]: Total parameter size is 2384 bytes. ``` It fails the parameter read at `if (params.size != sizeof(params_t) || params.magic_be != 0xBE || params.magic_ef != 0xEF){` ```bash [ WARN] [1622952354.355928073]: [Autopilot]: size params 2384 [ WARN] [1622952354.376907583]: [Autopilot]: params.magic_be 190 [ WARN] [1622952354.395815937]: [Autopilot]: params.magic_efe 63 [ WARN] [1622952354.415805982]: [Autopilot]: Failed size mismatch or magic be or ef incorrect [ WARN] [1622952354.437745580]: [Autopilot]: Unable to load parameters; using default values [ WARN] [1622952354.455786008]: [Autopilot]: Total parameter size is 2384 bytes. ``` _Note: The decimal magic numbers `be and ef` are in hex _ ```python >>> hex(190) '0xbe' >>> hex(63) '0x3f' >>> ``` Don't really know what the magic_ef is used for though... @BillThePlatypus do you have an idea?

Could it be that the len parameter is just uint8_t, seems a bit small for the printed size of 2384?

For reference :

prothen commented 3 years ago

Changing the length to uint16_t fixed it, hope the documentation of the off-chip debugging will be useful to someone in the future. :D

Thanks for the eeprom suggestion and all the guidance!