septentrio-gnss / septentrio_gnss_driver

ROS 1 & 2 driver for Septentrio GNSS & INS receivers
BSD 3-Clause "New" or "Revised" License

Some msgs are dropped #103

Open javijurado opened 9 months ago

javijurado commented 9 months ago

Hello, I bought an AsteRx SBi3 Pro and use the ROS 2 node to read its data. The topic twist_ins has a rate of 100 Hz, but when checking some plots I saw that some frames are dropped. As you can see in the attached picture, some messages have disappeared.

missing_data twist_ins/linear/x

Any advice?

thomasemter commented 9 months ago

Hi there,

I am not sure what that plot is showing exactly, since on the x-axis there seems to be a 1e10 scaling? Can you please explain what is shown and if this is live data or from a bag file? Which tool did you use to visualize or record the data? Some ROS 2 tools are known to have problems with high rate data, e.g. here. Also, unreliable connections may have side effects.

Also, which interface of the AsteRx is used? Serial baud rates have to be set appropriately for high rate data.

javijurado commented 9 months ago

Sorry for my bad explanation. A few days ago, we were using the AsteRx SBi3 Pro to get the position and INS velocity on the platform. After analyzing the data, we saw that some messages are dropped. You can see this in the picture attached below.

image

We thought that the problem was the software we were using to plot the data. We have tried PlotJuggler, Foxglove and our own script to plot the data (in real time and using rosbags). The result is the same in each plotter.

We are using ethernet interface to get the data.

thomasemter commented 9 months ago

Thank you for the clarification. I assume you are running the driver with ROS timestamps, i.e. use_gnss_time set to false, correct?

Counting the data points in your Foxglove screenshot, there are 20 within 0.2 s, which corresponds to a 100 Hz data rate. So it seems you are not losing data, but the rate is fluctuating. This could be caused by a highly loaded network. Are there other devices in your network like GigE cameras or LiDARs producing a lot of UDP traffic? Running ros2 topic hz /twist_ins gives you some statistics on the delta times. Can you try to run this with just the AsteRx connected and no other active ROS nodes?
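
The rate estimate above can be reproduced from recorded timestamps. The following is a minimal Python sketch, not part of the driver: it assumes the message stamps have already been extracted into a list of seconds (how they are obtained is left open), and the function name interval_stats is made up for illustration. It reports statistics similar in spirit to those printed by ros2 topic hz.

```python
import statistics


def interval_stats(stamps):
    """Summarize the spacing between consecutive timestamps (in seconds)."""
    intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return {
        "rate_hz": len(intervals) / (stamps[-1] - stamps[0]),
        "min_s": min(intervals),
        "max_s": max(intervals),
        "std_dev_s": statistics.pstdev(intervals),
    }


# Hypothetical data: 21 stamps spanning 0.2 s, with one message arriving 5 ms late.
stamps = [0.010 * i for i in range(21)]
stamps[10] += 0.005
stats = interval_stats(stamps)
print(stats)
```

A late message shows up as one long interval followed by a short one while the average rate stays at 100 Hz; a dropped message would lower the average rate instead.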

javijurado commented 9 months ago

Exactly, we are running using ROS timestamps.

Are there other devices in your network like GigE cameras or LiDARs producing a lot of UDP traffic?

There are no other devices connected to my network, just the Septentrio and my computer to read the data.

thomasemter commented 9 months ago

I am sorry but in that case I am a little clueless right now. Some jitter of a few % is normal but I haven't experienced it to that extent. Are you using TCP and could you try with UDP for comparison (parameter stream_device.udp)?

javijurado commented 9 months ago

My current configuration is this one:


# Configuration Settings for the Rover Rx

# GNSS/INS Parameters

device: tcp://my_ip:28784 

serial:
  baudrate: 921600
  hw_flow_control: "off"

# stream_device:
#   tcp:
#     ip_server: ""
#     port: 0
#   udp:
#     ip_server: ""
#     port: 0
#     unicast_ip: ""

configure_rx: true

# login:
#   user: ""
#   password: ""

# osnma:
#   mode: "off"
#   ntp_server: ""
#   keep_open: true

frame_id: gnss 

imu_frame_id: imu

poi_frame_id: base_link

vsm_frame_id: vsm

aux1_frame_id: aux1

vehicle_frame_id: base_link

local_frame_id: odom

insert_local_frame: false

get_spatial_config_from_tf: false 

lock_utm_zone: true

use_ros_axis_orientation: true

receiver_type: ins

multi_antenna: true

datum: Default

# poi_to_arp: # offsets of the main GNSS antenna reference point (ARP) with respect to the point of interest (POI = marker). Use for static receivers only.
#   delta_e: 0.0
#   delta_n: 0.0
#   delta_u: 0.0

att_offset: ## for board setup
  heading: 90.0
  pitch: 0.0

ant_type: "SEPPOLANT_MC.V2"
ant_serial_nr: "Unknown"
ant_aux1_type: "SEPPOLANT_MC.V2"
ant_aux1_serial_nr: "Unknown"

polling_period: ## tbd what it can do/ we need
  pvt: 0
  rest: 500

use_gnss_time: false

latency_compensation: false

#Data hidden
#rtk_settings:
  # ip_server_1:
  #   id: ""
  #   port: 0
  #   rtk_standard: "auto"
  #   send_gga: "auto"
  #   keep_open: true
  # serial_1:
  #   port: ""
  #   baud_rate: 115200
  #   rtk_standard: "auto"
  #   send_gga: "auto"
  #   keep_open: true

publish:
  # For both GNSS and INS Rxs
  navsatfix: true
  gpsfix: true
  gpgga: true
  gprmc: true
  gpst: true
  measepoch: false
  pvtcartesian: true
  pvtgeodetic: true
  basevectorcart: false
  basevectorgeod: false
  poscovcartesian: true
  poscovgeodetic: true
  velcovcartesian: false
  velcovgeodetic: true
  atteuler: true
  attcoveuler: true
  pose: true
  twist: true
  diagnostics: true
  aimplusstatus: true
  galauthstatus: false
  # For GNSS Rx only
  gpgsa: false
  gpgsv: false
  # For INS Rx only
  insnavcart: true
  insnavgeod: true
  extsensormeas: false
  imusetup: false
  velsensorsetup: false
  exteventinsnavcart: false
  exteventinsnavgeod: false
  imu: true
  localization: true
  tf: false
  localization_ecef: false
  tf_ecef: false

# INS-Specific Parameters

ins_spatial_config:
  imu_orientation:
    theta_x: 180
    theta_y: 0.0
    theta_z: 0.0
  poi_lever_arm:
    delta_x: 0.0
    delta_y: 0.0
    delta_z: 0.0
  ant_lever_arm: 
    x: 0.0
    y: -0.305
    z: -0.0102
  vsm_lever_arm:
    vsm_x: 0.0
    vsm_y: 0.0
    vsm_z: 0.0

ins_initial_heading: auto

ins_std_dev_mask:
  att_std_dev: 5.0
  pos_std_dev: 10.0

ins_use_poi: false

ins_vsm:
  ros:
    source: "twist"
    config: [true, true, false]
    variances_by_parameter: false
    # variances: [0.0, 0.0, 0.0]
  # ip_server:
  #   id: ""
  #   port: 0
  #   keep_open: true
  # serial:
  #   port: ""
  #   baud_rate: 115200
  #   keep_open: true

# logger

activate_debug_log: false

How can I change stream_device?

thomasemter commented 9 months ago

You have a lot of publishers active, which places a high demand on the AsteRx and on the transmission bandwidth. Maybe you can deactivate some publishers if you do not need them. E.g., gpsfix needs raw measurements, which take up a lot of bandwidth.

You can set up the stream device for UDP like this:

stream_device:
  udp:
    ip_server: "IPS1"
    port: 28785
    unicast_ip: "[ip_of_pc]"

You should be aware that UDP is meant for low latency transmission of a few topics. From the manual: "Note that the UDP implementation is meant to be used with small data volumes and low update rates. It is the user’s responsibility to only enable short messages at low rate when using UDP, in order to prevent throughput degradation of the network." Nevertheless, I have used it with high update rates without any problem, but it is advisable to watch the CPU load.
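
To check independently of ROS whether UDP datagrams actually reach the PC, a small listener can help. The following Python sketch is only an illustration: the helper name collect_arrivals is hypothetical, and it assumes the receiver is already streaming to the port from the stream_device.udp example (28785) and that the driver is stopped so the port is free.

```python
import socket
import time


def collect_arrivals(sock, count, timeout_s=1.0):
    """Receive up to `count` datagrams and return (arrival_time, size) pairs."""
    sock.settimeout(timeout_s)
    arrivals = []
    try:
        while len(arrivals) < count:
            data, _sender = sock.recvfrom(65535)
            arrivals.append((time.monotonic(), len(data)))
    except socket.timeout:
        pass  # stop early if the stream goes quiet
    return arrivals


def largest_gap_ms(arrivals):
    """Largest spacing between consecutive arrivals, in milliseconds."""
    gaps = [b[0] - a[0] for a, b in zip(arrivals, arrivals[1:])]
    return max(gaps, default=0.0) * 1000.0


def listen_on_port(port, count=500):
    """Bind to `port` on all interfaces and summarize the incoming datagrams."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("", port))
        arrivals = collect_arrivals(listener, count)
    return len(arrivals), largest_gap_ms(arrivals)
```

Calling listen_on_port(28785) returns the number of datagrams received and the largest gap between them. If datagrams arrive here at a steady pace but the node publishes nothing, the network path is less likely to be the cause.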

abaeyens-imod commented 8 months ago

Thank you @thomasemter for looking into this.

Unfortunately, we're still experiencing issues. Sometimes it works very well, with ROS messages on topic twist_ins being timestamped and arriving at a steady 100 Hz (< 5 ms in TCP and < 2 ms in UDP), while at other times we see an interval on the order of 100 ms between subsequent messages and UDP mode not even working.

Today we tested with several devices. All the tried configurations reproduced the same undesired result of having a gap of at least 20 ms somewhere in every subsequent 100 messages.
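
The criterion described above (a gap of at least 20 ms somewhere in every block of 100 messages) can be checked mechanically. This is a minimal Python sketch under the assumption that message timestamps are available as a list of seconds; the function name windows_with_gap and the sample data are hypothetical.

```python
def windows_with_gap(stamps, window=100, threshold_s=0.020):
    """For each block of `window` intervals, report whether any gap reaches the threshold."""
    intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    flags = []
    for start in range(0, len(intervals) - window + 1, window):
        block = intervals[start:start + window]
        flags.append(max(block) >= threshold_s)
    return flags


# Hypothetical data: 200 intervals of 10 ms, one of them stretched to 30 ms.
intervals = [0.010] * 200
intervals[50] = 0.030
stamps = [0.0]
for step in intervals:
    stamps.append(stamps[-1] + step)

print(windows_with_gap(stamps))
```

If every entry in the result is True for a long recording, the behaviour matches what was observed in the tests described here.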

We tried two routers, two network switches, two different laptops with a different OS version (Ubuntu 20.04 and 22.04) and ROS distributions (Foxy and Humble), four ways of connecting to the router (native gigabit port, two different USB-to-ethernet adapters and wifi). Internet access was disabled on the router side, the CPU load on the computer was negligible (< 10 %) and only two devices were connected to the router's switch (the Septentrio and our computer running ROS).

Capturing the TCP traffic with Wireshark showed that now and then there's a long delay - e.g. 80 ms - between subsequent TCP packets.

Enabling UDP mode caused UDP packets to show up in Wireshark; however, the Septentrio node didn't publish any messages on the twist_ins topic. It did max out one CPU core though. This did work in the past.

I assume we're making an error in a configuration somewhere, but it is unclear to me where. We would greatly appreciate any suggestions on how to resolve this. In case there is a need for more information from our side, please let us know.


Further information: The receiver firmware is 1.4.1 and for the ROS node we run the latest version from GitHub compiled from source. We launched the Septentrio node using ros2 launch septentrio_gnss_driver rover.launch.py. This is the modified rover.yaml config file we used for TCP:

# Configuration Settings for the Rover Rx

# GNSS/INS Parameters

device: tcp://192.168.140.80:28784

configure_rx: true

login:
  user: ""
  password: ""  

osnma:
  mode: "off"
  ntp_server: ""
  keep_open: true

frame_id: gnss

imu_frame_id: imu

poi_frame_id: base_link

vsm_frame_id: vsm

aux1_frame_id: aux1

vehicle_frame_id: base_link

local_frame_id: odom

insert_local_frame: false

get_spatial_config_from_tf: false

lock_utm_zone: true

use_ros_axis_orientation: true

receiver_type: ins

multi_antenna: true

datum: Default

poi_to_arp:
  delta_e: 0.0
  delta_n: 0.0
  delta_u: 0.0

att_offset:
  heading: 0.0
  pitch: 0.0

ant_type: "SEPPOLANT_MC.V2"
ant_serial_nr: "Unknown"
ant_aux1_type: "SEPPOLANT_MC.V2"
ant_aux1_serial_nr: "Unknown"

polling_period:
  pvt: 0
  rest: 500

use_gnss_time: false

latency_compensation: false

rtk_settings:
  ntrip_1:
    id: ""
    caster: ""
    caster_port: 2101
    username: ""
    password: ""
    mountpoint: ""
    version: "v2"
    tls: false
    fingerprint: ""
    rtk_standard: "auto"
    send_gga: "auto"
    keep_open: true
  ip_server_1:
    id: ""
    port: 0
    rtk_standard: "auto"
    send_gga: "auto"
    keep_open: true
  serial_1:
    port: ""
    baud_rate: 115200
    rtk_standard: "auto"
    send_gga: "auto"
    keep_open: true

publish:
  # For both GNSS and INS Rxs
  navsatfix: false
  gpsfix: true
  gpgga: false
  gprmc: false
  gpst: false
  measepoch: false
  pvtcartesian: false
  pvtgeodetic: true
  basevectorcart: false
  basevectorgeod: false
  poscovcartesian: false
  poscovgeodetic: true
  velcovcartesian: false
  velcovgeodetic: true
  atteuler: true
  attcoveuler: true
  pose: false
  twist: true
  diagnostics: false
  aimplusstatus: true
  galauthstatus: false
  # For GNSS Rx only
  gpgsa: false
  gpgsv: false
  # For INS Rx only
  insnavcart: false
  insnavgeod: false
  extsensormeas: false
  imusetup: false
  velsensorsetup: false
  exteventinsnavcart: false
  exteventinsnavgeod: false
  imu: false
  localization: false
  tf: false
  localization_ecef: false
  tf_ecef: false

# INS-Specific Parameters

ins_spatial_config:
  imu_orientation:
    theta_x: 0.0
    theta_y: 0.0
    theta_z: 0.0
  poi_lever_arm:
    delta_x: 1.0
    delta_y: 0.0
    delta_z: 0.0
  ant_lever_arm:
    x: 1.0
    y: 0.0
    z: 0.0
  vsm_lever_arm:
    vsm_x: 1.0
    vsm_y: 0.0
    vsm_z: 0.0

ins_initial_heading: auto

ins_std_dev_mask:
  att_std_dev: 5.0
  pos_std_dev: 10.0

ins_use_poi: false

ins_vsm:
  ros:
    source: ""
    config: [false, false, false]
    variances_by_parameter: false
    variances: [0.0, 0.0, 0.0]
  ip_server:
    id: ""
    port: 0
    keep_open: true
  serial:
    port: ""
    baud_rate: 115200
    keep_open: true

# logger

activate_debug_log: false

For using UDP, we added the following below device:

stream_device:
  udp:
    ip_server: "IPS1"
    port: 28785
    unicast_ip: "192.168.140.100"

thomasemter commented 8 months ago

Thank you for your thorough testing and sorry you are still experiencing this issue. I am still a little clueless what could be the culprit.

Just to make sure: By latest version of the driver, do you mean 1.3.1?

abaeyens-imod commented 8 months ago

Just to make sure: By latest version of the driver, do you mean 1.3.1?

Probably, the commit log shows v1.3.1, HEAD commit is 207c37bc.

  • Can you access the web interface and check the CPU load of the AsteRx?

The AsteRx receiver CPU load varies between 46 and 65 %.

  • Did you also try without the router, i.e., direct ethernet connection?

Good idea, just tried this (static IPs on both and wifi off). Now UDP does work most of the time, and when it works, messages are published every 10 ± 1 ms. However, TCP performs worse than before: there are still large gaps, and only 90 to 99 messages are received per second instead of 100 (see ros2 topic hz output below).

  • Is it possible for you to test with USB via RNDIS, i.e., connect the AsteRx with USB and set device: tcp://192.168.3.1:28784?

I'll look whether we have a cable for that.

  • Could you test with just one topic, e.g., insnavgeod: true and the rest false (especially gpsfix set to false) and log the output of ros2 topic hz /insnavgeod?

In TCP mode:

arne@laptop:~/ros2$ ros2 topic list
/insnavgeod
/parameter_events
/rosout
/tf
/tf_static
arne@laptop:~/ros2$ ros2 topic hz /insnavgeod
average rate: 95.847
    min: 0.002s max: 0.037s std dev: 0.00406s window: 97
average rate: 95.947
    min: 0.001s max: 0.043s std dev: 0.00426s window: 194
average rate: 95.656
    min: 0.001s max: 0.051s std dev: 0.00451s window: 290
average rate: 95.752
    min: 0.001s max: 0.051s std dev: 0.00437s window: 387
average rate: 95.800
    min: 0.001s max: 0.051s std dev: 0.00429s window: 483
average rate: 95.671
    min: 0.001s max: 0.051s std dev: 0.00428s window: 579
average rate: 95.299
    min: 0.001s max: 0.076s std dev: 0.00484s window: 673
average rate: 95.386
    min: 0.001s max: 0.076s std dev: 0.00483s window: 769
average rate: 95.359
    min: 0.001s max: 0.076s std dev: 0.00475s window: 865

So intervals range from 1 to 76 ms. The header timestamps show similar intervals.
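
The range quoted above can be read off the ros2 topic hz output automatically. A small Python sketch, assuming the output has been captured as text in the format shown in this thread; the function name interval_extremes is made up for illustration.

```python
import re

# Matches the "min: ...s max: ...s" part of each statistics line.
HZ_LINE = re.compile(r"min: ([\d.]+)s max: ([\d.]+)s")


def interval_extremes(hz_output):
    """Return (smallest min, largest max) in seconds over all reported windows."""
    pairs = [(float(low), float(high)) for low, high in HZ_LINE.findall(hz_output)]
    return min(low for low, _ in pairs), max(high for _, high in pairs)


# Two of the windows from the log above.
sample = """\
average rate: 95.847
    min: 0.002s max: 0.037s std dev: 0.00406s window: 97
average rate: 95.299
    min: 0.001s max: 0.076s std dev: 0.00484s window: 673
"""
print(interval_extremes(sample))  # prints (0.001, 0.076)
```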

Here's an excerpt from the TCP packet log from Wireshark, with large intervals between packets 496-497 and 521-522 (packets 497 and 522 are also considerably longer):

No. Time    Source      Destination Protocol Length  Info
492 3.330   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=38629 Ack=5 Win=1810 Len=116 TSval=706902 TSecr=3783558735
493 3.330   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=38745 Win=501 Len=0 TSval=3783558755 TSecr=706901
494 3.340   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=38745 Ack=5 Win=1810 Len=116 TSval=706903 TSecr=3783558755
495 3.351   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=38861 Ack=5 Win=1810 Len=116 TSval=706904 TSecr=3783558755
496 3.351   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=38977 Win=501 Len=0 TSval=3783558776 TSecr=706903
497 3.398   192.168.140.80  192.168.140.81  TCP 530 28784 → 53198 [PSH, ACK] Seq=38977 Ack=5 Win=1810 Len=464 TSval=706909 TSecr=3783558755
498 3.402   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=39441 Ack=5 Win=1810 Len=116 TSval=706909 TSecr=3783558776
499 3.402   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=39557 Win=501 Len=0 TSval=3783558827 TSecr=706909
500 3.410   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=39557 Ack=5 Win=1810 Len=116 TSval=706910 TSecr=3783558827
501 3.420   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=39673 Ack=5 Win=1810 Len=116 TSval=706911 TSecr=3783558827
502 3.420   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=39789 Win=501 Len=0 TSval=3783558845 TSecr=706910
503 3.430   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=39789 Ack=5 Win=1810 Len=116 TSval=706912 TSecr=3783558845
504 3.440   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=39905 Ack=5 Win=1810 Len=116 TSval=706913 TSecr=3783558845
505 3.440   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=40021 Win=501 Len=0 TSval=3783558865 TSecr=706912
506 3.450   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40021 Ack=5 Win=1810 Len=116 TSval=706914 TSecr=3783558865
507 3.460   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40137 Ack=5 Win=1810 Len=116 TSval=706915 TSecr=3783558865
508 3.461   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=40253 Win=501 Len=0 TSval=3783558886 TSecr=706914
509 3.470   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40253 Ack=5 Win=1810 Len=116 TSval=706916 TSecr=3783558886
510 3.480   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40369 Ack=5 Win=1810 Len=116 TSval=706917 TSecr=3783558886
511 3.480   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=40485 Win=501 Len=0 TSval=3783558905 TSecr=706916
512 3.490   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40485 Ack=5 Win=1810 Len=116 TSval=706918 TSecr=3783558905
513 3.500   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40601 Ack=5 Win=1810 Len=116 TSval=706919 TSecr=3783558905
514 3.500   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=40717 Win=501 Len=0 TSval=3783558925 TSecr=706918
515 3.510   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40717 Ack=5 Win=1810 Len=116 TSval=706920 TSecr=3783558925
516 3.520   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40833 Ack=5 Win=1810 Len=116 TSval=706921 TSecr=3783558925
517 3.520   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=40949 Win=501 Len=0 TSval=3783558945 TSecr=706920
518 3.530   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=40949 Ack=5 Win=1810 Len=116 TSval=706922 TSecr=3783558945
519 3.540   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=41065 Ack=5 Win=1810 Len=116 TSval=706923 TSecr=3783558945
520 3.540   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=41181 Win=501 Len=0 TSval=3783558965 TSecr=706922
521 3.550   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=41181 Ack=5 Win=1810 Len=116 TSval=706924 TSecr=3783558965
522 3.583   192.168.140.80  192.168.140.81  TCP 414 28784 → 53198 [PSH, ACK] Seq=41297 Ack=5 Win=1810 Len=348 TSval=706927 TSecr=3783558965
523 3.583   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=41645 Win=501 Len=0 TSval=3783559008 TSecr=706924
524 3.590   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=41645 Ack=5 Win=1810 Len=116 TSval=706928 TSecr=3783559008
525 3.600   192.168.140.80  192.168.140.81  TCP 182 28784 → 53198 [PSH, ACK] Seq=41761 Ack=5 Win=1810 Len=116 TSval=706929 TSecr=3783559008
526 3.600   192.168.140.81  192.168.140.80  TCP 66  53198 → 28784 [ACK] Seq=5 Ack=41877 Win=501 Len=0 TSval=3783559025 TSecr=706928
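
The long intervals in such an excerpt can be located with a short script. The Python sketch below assumes the log was exported as text in the column layout shown above, and the function name receiver_gaps is hypothetical. Note that the payloads of the delayed packets (464 and 348 bytes) are multiples of the usual 116 bytes, which would be consistent with several messages being sent together; that interpretation is an assumption, not something the capture proves.

```python
import re


def receiver_gaps(lines, receiver_ip, period_s=0.010, factor=2.0):
    """Return (packet_no, gap_s, payload_len) for late data packets from the receiver."""
    late = []
    previous_time = None
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].isdigit():
            continue  # skip the header and malformed rows
        number, time_s, source = int(fields[0]), float(fields[1]), fields[2]
        match = re.search(r"Len=(\d+)", line)
        payload = int(match.group(1)) if match else 0
        if source != receiver_ip or payload == 0:
            continue  # ignore pure ACKs and packets sent by the PC
        if previous_time is not None and time_s - previous_time > factor * period_s:
            late.append((number, round(time_s - previous_time, 3), payload))
        previous_time = time_s
    return late


# Rows 495 to 498 of the excerpt above.
excerpt = [
    "495 3.351 192.168.140.80 192.168.140.81 TCP 182 28784 → 53198 [PSH, ACK] Seq=38861 Ack=5 Win=1810 Len=116",
    "496 3.351 192.168.140.81 192.168.140.80 TCP 66 53198 → 28784 [ACK] Seq=5 Ack=38977 Win=501 Len=0",
    "497 3.398 192.168.140.80 192.168.140.81 TCP 530 28784 → 53198 [PSH, ACK] Seq=38977 Ack=5 Win=1810 Len=464",
    "498 3.402 192.168.140.80 192.168.140.81 TCP 182 28784 → 53198 [PSH, ACK] Seq=39441 Ack=5 Win=1810 Len=116",
]
print(receiver_gaps(excerpt, "192.168.140.80"))
```

For these four rows, only packet 497 is flagged, with a gap of 47 ms and a 464-byte payload.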

thomasemter commented 8 months ago

Thank you for the details.

I just tested it with my SBi with TCP via RNDIS and ros2 topic hz /insnavgeod gives

average rate: 100.000
    min: 0.008s max: 0.012s std dev: 0.00041s window: 10000
average rate: 100.000
    min: 0.008s max: 0.012s std dev: 0.00041s window: 10000
average rate: 100.000
    min: 0.008s max: 0.012s std dev: 0.00041s window: 10000
average rate: 99.999
    min: 0.008s max: 0.012s std dev: 0.00041s window: 10000
average rate: 100.000
    min: 0.008s max: 0.012s std dev: 0.00041s window: 10000
average rate: 100.000
    min: 0.008s max: 0.012s std dev: 0.00041s window: 10000
average rate: 100.000
    min: 0.008s max: 0.012s std dev: 0.00041s window: 10000
average rate: 99.999
    min: 0.008s max: 0.012s std dev: 0.00040s window: 10000
average rate: 100.000
    min: 0.008s max: 0.012s std dev: 0.00040s window: 10000
average rate: 100.001
    min: 0.008s max: 0.012s std dev: 0.00040s window: 10000
average rate: 99.999
    min: 0.008s max: 0.012s std dev: 0.00040s window: 10000

with nearly all publishers activated. With a real Ethernet connection, max is 0.039 s, which is much worse but still not as high as what you are experiencing. With UDP, both are within 10 +/- 2 ms. Since the latency jitter for TCP is much lower via RNDIS, I do not think the problem lies in the driver or ROS.

thomasemter commented 8 months ago

I have consulted with Septentrio's support and there is indeed a higher latency when using TCP via real ethernet. It is caused by the implementation in the lower level of the firmware. An improvement is being worked on, which will likely be incorporated in the next AsteRx SBi3 firmware release. For lowest latency operation it is still recommended to go with UDP.

abaeyens-imod commented 8 months ago

Great that you got in touch with their support, looking forward to the next firmware release!

We experimented with UDP several times. Sometimes it works, but overall it is not reliable enough, so we prefer TCP.

thomasemter commented 2 months ago

A new firmware 1.4.3 has been released, which improves the TCP/IP latency.