mavlink / mavros

MAVLink to ROS gateway with proxy for Ground Control Station
Other
879 stars 990 forks source link

Frequent Hard syncing of clock #832

Closed kevin-george closed 6 years ago

kevin-george commented 6 years ago

Issue details

When running the px4.launch with default parameters, we see repeated and frequent hard syncing of clocks(The logs below show this). Is this expected behavior and/or something we should be concerned about? Can we do something to decrease this hard-syncing?

MAVROS version and platform

Mavros: 0.21.0 ROS: Kinetic Ubuntu: 16.04

Autopilot type and version

[ ] ArduPilot [X] PX4

Version: 3.1

Node logs

Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491750.904468051]: IMU: Attitude quaternion IMU detected!
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491750.927802939]: RC_CHANNELS message detected!
Oct 08 15:42:45 earth bash[4537]: [ WARN] [1507491751.020480951]: TM: Clock skew detected (-1507482836.185645103 s). Hard syncing clocks.
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491751.808186449]: CON: Got HEARTBEAT, connected. FCU: PX4 Autopilot
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491751.810529817]: IMU: High resolution IMU detected!
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491751.828435748]: IMU: Attitude quaternion IMU detected!
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491751.828756489]: RC_CHANNELS message detected!
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491752.826174497]: VER: 3.1: Capabilities         0x00000000000024ef
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491752.826472076]: VER: 3.1: Flight software:     01060500 (000000009168553C)
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491752.826743503]: VER: 3.1: Middleware software: 01060500 (000000009168553C)
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491752.827013252]: VER: 3.1: OS software:         000000c0 (0000000000000000)
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491752.827140082]: VER: 3.1: Board hardware:      00000011
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491752.827226241]: VER: 3.1: VID/PID:             26ac:0011
Oct 08 15:42:45 earth bash[4537]: [ INFO] [1507491752.827292014]: VER: 3.1: UID:                 3236510836363331
Oct 08 15:42:45 earth bash[4537]: [ WARN] [1507491752.827704016]: CMD: Unexpected command 520, result 0
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507491765.728494141]: WP: mission cleared
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507491766.816380443]: WP: mission received
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491768.587531852]: TM: Clock skew detected (0.011146073 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491778.589108280]: TM: Clock skew detected (0.010699734 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491795.889460676]: TM: Clock skew detected (0.011273170 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491813.491556651]: TM: Clock skew detected (0.011918806 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491875.290061955]: TM: Clock skew detected (0.010160634 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491888.109670877]: TM: Clock skew detected (-0.010037809 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491899.293630219]: TM: Clock skew detected (0.010369652 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507491943.217470561]: TM: Clock skew detected (-0.012087764 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492040.778745335]: CON: Lost connection, HEARTBEAT timed out.
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492053.268813868]: RC_CHANNELS message detected!
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492053.269194628]: IMU: High resolution IMU detected!
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492053.309645259]: TM: Clock skew detected (-9212.821457905 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492053.599224992]: CON: Got HEARTBEAT, connected. FCU: PX4 Autopilot
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492053.602580419]: IMU: High resolution IMU detected!
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492053.602896550]: RC_CHANNELS message detected!
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492054.668189941]: VER: 3.1: Capabilities         0x00000000000024ef
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492054.668326066]: VER: 3.1: Flight software:     01060500 (000000009168553C)
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492054.668395093]: VER: 3.1: Middleware software: 01060500 (000000009168553C)
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492054.668482853]: VER: 3.1: OS software:         000000c0 (0000000000000000)
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492054.668567267]: VER: 3.1: Board hardware:      00000011
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492054.668647447]: VER: 3.1: VID/PID:             26ac:0011
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492054.668824645]: VER: 3.1: UID:                 3236510836363331
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492054.669265922]: CMD: Unexpected command 520, result 0
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492058.541912234]: IMU: Attitude quaternion IMU detected!
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492063.599797515]: HP: requesting home position
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492067.863162042]: PR: request param #389 timeout, retries left 2, and 1 params still missing
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492068.318311613]: TM: Clock skew detected (-0.011880676 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492068.631650408]: WP: mission received
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492073.599640389]: HP: requesting home position
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492079.891025529]: TM: Clock skew detected (0.010595255 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492083.600029056]: HP: requesting home position
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492090.312575547]: TM: Clock skew detected (-0.010493361 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492093.599964063]: HP: requesting home position
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492100.493317766]: TM: Clock skew detected (0.010849217 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492103.600477117]: HP: requesting home position
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492111.910311816]: TM: Clock skew detected (-0.010247530 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ INFO] [1507492113.599917119]: HP: requesting home position
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492126.111370051]: TM: Clock skew detected (-0.010106776 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492149.189408240]: TM: Clock skew detected (0.010374995 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492175.591969230]: TM: Clock skew detected (0.011031831 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492186.392637699]: TM: Clock skew detected (0.010132172 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492198.018048246]: TM: Clock skew detected (-0.010459247 s). Hard syncing clocks.
Oct 08 15:50:19 earth bash[4537]: [ WARN] [1507492208.193118939]: TM: Clock skew detected (0.010767844 s). Hard syncing clocks.

Diagnostics

header: 
  seq: 892
  stamp: 
    secs: 1507492903
    nsecs: 897026556
  frame_id: ''
status: 
  - 
    level: 0
    name: mavros: FCU connection
    message: connected
    hardware_id: /dev/pixhawk:921600
    values: 
      - 
        key: Received packets:
        value: 36173
      - 
        key: Dropped packets:
        value: 0
      - 
        key: Buffer overruns:
        value: 0
      - 
        key: Parse errors:
        value: 0
      - 
        key: Rx sequence number:
        value: 112
      - 
        key: Tx sequence number:
        value: 248
      - 
        key: Rx total bytes:
        value: 26934610
      - 
        key: Tx total bytes:
        value: 583166
      - 
        key: Rx speed:
        value: 36208.000000
      - 
        key: Tx speed:
        value: 794.000000
  - 
    level: 0
    name: mavros: GPS
    message: 3D fix
    hardware_id: /dev/pixhawk:921600
    values: 
      - 
        key: Satellites visible
        value: 13
      - 
        key: Fix type
        value: 3
      - 
        key: EPH (m)
        value: 0.77
      - 
        key: EPV (m)
        value: 1.29
  - 
    level: 0
    name: mavros: Heartbeat
    message: Normal
    hardware_id: /dev/pixhawk:921600
    values: 
      - 
        key: Heartbeats since startup
        value: 1127
      - 
        key: Frequency (Hz)
        value: 0.999985
      - 
        key: Vehicle type
        value: Octorotor
      - 
        key: Autopilot type
        value: PX4 Autopilot
      - 
        key: Mode
        value: STABILIZED
      - 
        key: System status
        value: Standby
  - 
    level: 0
    name: mavros: System
    message: Normal
    hardware_id: /dev/pixhawk:921600
    values: 
      - 
        key: Sensor present
        value: 0x00000000
      - 
        key: Sensor enabled
        value: 0x00000000
      - 
        key: Sensor helth
        value: 0x00000000
      - 
        key: CPU Load (%)
        value: 59.7
      - 
        key: Drop rate (%)
        value: 0.0
      - 
        key: Errors comm
        value: 0
      - 
        key: Errors count #1
        value: 0
      - 
        key: Errors count #2
        value: 0
      - 
        key: Errors count #3
        value: 0
      - 
        key: Errors count #4
        value: 0
  - 
    level: 0
    name: mavros: Battery
    message: Normal
    hardware_id: /dev/pixhawk:921600
    values: 
      - 
        key: Voltage
        value: 65.54
      - 
        key: Current
        value: -0.0
      - 
        key: Remaining
        value: -1.0
  - 
    level: 0
    name: mavros: Time Sync
    message: Normal
    hardware_id: /dev/pixhawk:921600
    values: 
      - 
        key: Timesyncs since startup
        value: 1384
      - 
        key: Frequency (Hz)
        value: 9.999843
      - 
        key: Last dt (ms)
        value: 0.425825
      - 
        key: Mean dt (ms)
        value: 0.010965
      - 
        key: Last system time (s)
        value: 855.001054000
      - 
        key: Time offset (s)
        value: 1507492048.889048100
--

Check ID

OK. I got messages from 3:1.

---
Received 8376 messages, from 1 addresses
sys:comp   list of messages
  3:1     0, 1, 2, 140, 141, 147, 24, 30, 31, 32, 33, 36, 65, 70, 74, 76, 83, 230, 231, 105, 111, 241, 242, 245
vooon commented 6 years ago

Do you use ntpd_driver? Do you reset autopilot during this log (suspicious CON msgs)? Do you use 16S battery?

kevin-george commented 6 years ago

Thank you for your response.

Do you use ntpd_driver?

We use ntpdate for syncing with a time server.

Do you reset autopilot during this log (suspicious CON msgs)?

We did not the reset autopilot

Do you use 16S battery?

We get this when the pixhawk is powered using a usb. Does powering with a 16S battery help?

vooon commented 6 years ago
  1. So you have internet connection onboard?
  2. Then problem may be in wiring or hardware.
  3. No, i asked because 65 volt battery is unusual. With USB that should be near zero. Or PX4 send undocumented value, 65535 (-1).
kevin-george commented 6 years ago

So you have internet connection onboard?

No we don't. We just run the sync manually before flight.

Then problem may be in wiring or hardware.

We have a simple FTDI cable that connects the onboard computer USB to PX4. This happens on multiple machines, so wiring seems unlikely to be the cause unless we got 3 wires from 2 different companies messed up together? Is there any way to check this?

No, i asked because 65 volt battery is unusual. With USB that should be near zero. Or PX4 send undocumented value, 65535 (-1).

Our batteries are not 16S. Where on pixhawk or mavros is the setting that sets battery voltage to 65V? Perhaps we can change it and try. I am guessing this is not the issue though?

vooon commented 6 years ago

Try to use https://github.com/vooon/ntpd_driver to feed GPS time to ntpd. Hard sync only occurs if dT > 10 ms (look at diagnostics "Time Sync"). Also try to measure latency https://gist.github.com/vooon/e50fd7d38c137aa932b8caf18326d12e (require two uart modules connected).

Also try to reconfigure power module params.

TSC21 commented 6 years ago

@kevin-george is this solved to you?

kevin-george commented 6 years ago

@TSC21 I apologize for the delay in response, got pulled away to fix other issues. I wish to actively solve this so I don't waste more of your time :)

@vooon

Try to use https://github.com/vooon/ntpd_driver to feed GPS time to ntpd.

Wouldn't this require the GPS module to be connected to the offboard computer(in our case an intel NUC) instead of directly to pixhawk?

Hard sync only occurs if dT > 10 ms (look at diagnostics "Time Sync").

I saw this in system_time.cpp, the diagnostics key "Timesyncs since startup" also gets reset whenever the hard sync occurred. The CON messages that were showing up before, don't show up anymore.

Also try to reconfigure power module params.

Can the misbehavior of the clock be because of incorrect power configuration?

vooon commented 6 years ago

No, GPS module should be connected to FCU. Then it sends SYSTEM_TIME message, that handled by sys_time and converted to ROS time message, then ntpd_driver send that time to ntpd via shared memory. Of course that chain has significantly worse precision than regular NTP or gpsd, but do not require additional gps.

kevin-george commented 6 years ago

Thank you for responding @vooon I tried using the ntpd configuration for the ntpd_driver. I still see Hard syncing of clocks every few seconds. Could the clock on the pixhawk be malfunctioning? I don't understand why it would go out of sync so fast?

Also, when I run ttyping.cc to measure latency, I get a segfault

./ttyping -b 921600 /dev/ttyUSB0  
--baud: "921600"  
--help: false  
--interval: "1000"  
--pong: false  
<tty>: "/dev/ttyUSB0"  
Opening: /dev/ttyUSB0 Mode: PING  
io_service running  
Segmentation fault (core dumped)  

Here's the backtrace

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000400e97 in __gthread_equal (__t1=140737354110720, __t2=0)
    at /usr/include/x86_64-linux-gnu/c++/5/bits/gthr-default.h:680
#2  0x0000000000405c97 in std::operator== (__x=..., __y=...) at /usr/include/c++/5/thread:84
#3  0x0000000000405cfc in std::thread::joinable (this=0x7fffffffe310)
    at /usr/include/c++/5/thread:170
#4  0x00000000004021cd in main (argc=4, argv=0x7fffffffe568) at ttyping.cc:243

The pixhawk is connected to ttyUSB0

 stty < /dev/ttyUSB0
speed 921600 baud; line = 0;
min = 1; time = 0;
-brkint -icrnl -imaxbel
-opost
-isig -icanon -iexten -echo
vooon commented 6 years ago

No idea, but better keep ntpd.

ttyping quick and dirty, and require one instance on both ends (so it needs two UART's). This tool mostly needed to check latency of external UART like USB ones. One instance should be started on responder (better to use internal SoC tty), other on target tty. Result is total latency, which is slightly lower on SoC side.

Actually that "ping" may be done on MAVLink, but for that need to implement responder on fcu side. And anyway TIMESTAMP are working similar.

kevin-george commented 6 years ago

I think I will be keeping ntpd_driver running.

Right now there are 3 clock times :) - offboard computer, gps time & pixhawk system clock With your ntpd_driver I can get the first 2 to sync, is there any way to get the 3rd synced to the first 2 as well? Maybe by changing time_ref_source in px4_config.yaml from fcu to mavros?

vooon commented 6 years ago

sys_time should sync OBC and FCU time. Actually it is not synchronise clocks, but adjust message timestamps on both sides. time_ref_source only change TimeReference.source filed.