mrpt-ros-pkg / mrpt_navigation

ROS 2 nodes wrapping core MRPT functionality: localization, autonomous navigation, rawlogs, etc. SLAM is in other packages.
http://wiki.ros.org/mrpt_navigation
BSD 3-Clause "New" or "Revised" License

mrpt_localization repeatability on arm64 #133

Closed 67bug closed 1 year ago

67bug commented 1 year ago

I am evaluating the mrpt_localization package and have run into a strange situation with repeatability.

Every so often, mrpt_localization produces a localization result that differs drastically from the other runs. Of course, the "different" result is the wrong one, and it first showed up when we were running live in our test environment (Murphy still seems to have his way), hence this issue. So we built an offline setup where we can see the impact of parameter changes, repeated runs and such. The problem appears to happen only on arm64 and not on amd64.

Environment:

I'll be happy to share the map pgm, mrpt_localization parameters and such with anyone interested.

Some screenshots from various runs are included below. Legend:

Correct runs: image

Every now and then (roughly 10% of the time), incorrect runs:

image image image

How would one go about finding where to look for a root cause? The only (arguably unrelated) issue I can find that talks about arm64 and amd64 differences is this one.

Here is the mrpt_config.ini file

#------------------------------------------------------
# Config file for the application PF Localization
# See: https://www.mrpt.org/list-of-mrpt-apps/application-pf-localization/
#------------------------------------------------------

#---------------------------------------------------------------------------
# Section: [KLD_options]
# Use: Options for the adaptive sample size KLD-algorithm
# Refer to paper:
# D. Fox, W. Burgard, F. Dellaert, and S. Thrun, "Monte Carlo localization:
# Efficient position estimation for mobile robots," Proc. of the
# National Conference on Artificial Intelligence (AAAI),v.113, p.114,1999.
#---------------------------------------------------------------------------
[KLD_options]
KLD_binSize_PHI_deg=10
KLD_binSize_XY=0.10
KLD_delta=0.01
KLD_epsilon=0.01
KLD_maxSampleSize=40000
KLD_minSampleSize=150
KLD_minSamplesPerBin=0   

#---------------------------------------------------------------------------
# Section: [PF_options]
# Use: The parameters for the PF algorithms
#---------------------------------------------------------------------------
[PF_options]
# The Particle Filter algorithm:
#   0: pfStandardProposal     ***
#   1: pfAuxiliaryPFStandard
#   2: pfOptimalProposal    
#   3: pfAuxiliaryPFOptimal   ***
#
PF_algorithm=0

# The Particle Filter Resampling method:
#   0: prMultinomial
#   1: prResidual
#   2: prStratified
#   3: prSystematic
resamplingMethod=0

# Set to 1 to enable KLD adaptive sample size:
adaptiveSampleSize=1

# Only for algorithm=3 (pfAuxiliaryPFOptimal)
pfAuxFilterOptimal_MaximumSearchSamples=10

# Resampling threshold
BETA=0.5

# Number of particles (IGNORED IN THIS APPLICATION, SUPERSEDED BY "particles_count" below)
sampleSize=1

#---------------------------------------------------------------------------
# Default "noise" parameters for odometry in observations-only rawlog formats
#---------------------------------------------------------------------------
[DummyOdometryParams]
minStdXY     = 0.10    // (meters)
minStdPHI    = 2.0     // (degrees)

#---------------------------------------------------------------------------
# Section: [LocalizationExperiment]
# Use: Here come global parameters for the app.
#---------------------------------------------------------------------------
[LocalizationExperiment]

use_3D_poses = false

# The map in the ".simplemap" format or just a ".gridmap" (the program detects the file extension)
# This map is used to localize the robot within it:
map_file=

# The source file (RAW-LOG) with action/observation pairs
rawlog_file=

# The directory where the log files will be saved (left in blank if no log is desired)
logOutput_dir=LOG_LOCALIZATION

# Freq. of 3D scene log
3DSceneFrequency=1

# The repetitions of the experiments (each one will go to a different 
# directory with the index suffix)
experimentRepetitions=1

# Initial number of particles (if dynamic sample size is enabled, the population may change afterwards).
#  You can put an array, e.g. "100 200 300", to run the experiment with different number of initial samples:
particles_count=40000

# 1: Uniform distribution over the range, 0: Uniform distribution over the free cells of the gridmap in the range:
init_PDF_mode=0
init_PDF_min_x=-1
init_PDF_max_x=1
init_PDF_min_y=-1
init_PDF_max_y=1

SHOW_PROGRESS_3D_REAL_TIME  = true

# ====================================================
#
#            MULTIMETRIC MAP CONFIGURATION
#
# ====================================================
[MetricMap]
# Creation of maps:
occupancyGrid_count=1
gasGrid_count=0
landmarksMap_count=0
pointsMap_count=0
beaconMap_count=0

# Selection of map for likelihood: (fuseAll=-1,occGrid=0, points=1,landmarks=2,gasGrid=3)
likelihoodMapSelection=-1

# Enables (1) / Disables (0) insertion into specific maps:
enableInsertion_pointsMap=1
enableInsertion_landmarksMap=1
enableInsertion_gridMaps=1
enableInsertion_gasGridMaps=1
enableInsertion_beaconMap=1

# ====================================================
#   MULTIMETRIC MAP: OccGrid #00
# ====================================================
# Creation Options for OccupancyGridMap 00:
[MetricMap_occupancyGrid_00_creationOpts]
resolution=0.06

# Insertion Options for OccupancyGridMap 00:
[MetricMap_occupancyGrid_00_insertOpts]
mapAltitude=0
useMapAltitude=0
maxDistanceInsertion=15
maxOccupancyUpdateCertainty=0.55
considerInvalidRangesAsFreeSpace=1
minLaserScanNoiseStd=0.001

# Likelihood Options for OccupancyGridMap 00:
[MetricMap_occupancyGrid_00_likelihoodOpts]
likelihoodMethod=4      // 0=MI, 1=Beam Model, 2=RSLC, 3=Cells Difs, 4=LF_Trun, 5=LF_II

LF_decimation=20
LF_stdHit=0.20
LF_maxCorrsDistance=5.0
LF_zHit=0.95
LF_zRandom=0.05
LF_maxRange=80
LF_alternateAverageMethod=0

MI_exponent=10
MI_skip_rays=10
MI_ratio_max_distance=2

rayTracing_useDistanceFilter=0
rayTracing_decimation=10
rayTracing_stdHit=0.30

consensus_takeEachRange=30
consensus_pow=1

and the launch file calls out these parameter values:

<param name="default_noise_xy" value="0.1"/>
<param name="default_noise_phi" value="0.5"/>
<param name="gaussian_alpha_xy" value="0.005"/>
<param name="gaussian_alpha_phi" value="0.1"/>

If it helps any: with a default_noise_phi value of 2.0 (which is the default), the localization is always incorrect; this is how we had captured the original bag file. With a value of 0.5, localization is clearly better, but it runs into this repeatability issue.

@maxbader, we could use some of your guidance here

jlblancoc commented 1 year ago

Interesting... First, I would investigate whether it's actually related to the architecture at all (arm64 vs amd64). Have you tried it on a regular desktop computer/laptop, and does it work nearly 100% of the time with those same parameters?

It would be helpful if you could share (dropbox/google drive/...) a ZIP with everything needed to reproduce: launch and config files + rosbag + launch instructions.

My feeling is that it's all related to tuning the uncertainty parameters of odometry. If failures are always near a curve, odometry normally is bad at those points, and we need either a larger uncertainty for rotations in the motion model, or a larger number of particles.

Other direct experiments you can try are:

Most likely you will fix it by increasing the number of particles, if it runs OK 90% of the time. If it always fails on curves, then the uncertainty parameters will need updating.

67bug commented 1 year ago

Hi Jose,

Thanks so much for your note. This is super helpful.

First things first, yes, this was happening only on the arm64 platform (a Jetson Xavier NX). For a tally of my own runs:

Clearly, the dataset itself was a bit biased, and I had assumed that the law of large numbers would suffice to draw reasonable statistics. So I decided to increase the x86 run count and lo and behold, I got two fails in the first six runs -- enough to eliminate my incorrect claim regarding differences between the arm64/amd64 platforms. [Please let me know if you would like me to change the title of the issue for future observers of this repo]
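To put a number on how little a handful of runs says about a failure rate, here is a minimal sketch of a Wilson score interval for a binomial proportion. It uses only the Python standard library. The function name `wilson_interval` is made up for this illustration; the only figures taken from this thread are the two failures in six x86 runs.

```python
import math


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a failure rate (z=1.96 is roughly 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p_hat = failures / runs
    denom = 1.0 + z * z / runs
    center = (p_hat + z * z / (2 * runs)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / runs + z * z / (4 * runs * runs)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Two failures in six runs, as reported for the x86 machine above.
low, high = wilson_interval(failures=2, runs=6)
print(f"failure rate plausibly between {low:.1%} and {high:.1%}")
```

With so few runs the interval is very wide, which is consistent with the conclusion above that the early tallies could not support a claim about arm64 vs amd64.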

So I ran a tiny DOE to look for sensitivities:

  1. DOE 1 (10 replicates each): Change KLD_minSampleSize

    • Current value of 150
    • 300
    • 400
  2. DOE 2 (10 replicates each): Change LF_decimation

    • Current value of 20
    • 15
    • 10
  3. Trial 3 (30 replicates each): take the "best" of DOEs 1 and 2.
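The experiments above vary one parameter at a time. For comparison, a full factorial over the same levels can be enumerated as in the sketch below. The helper `factorial_runs` is hypothetical and only builds the list of runs; actually launching mrpt_localization for each entry (editing the .ini file, replaying the bag) is left out because it depends on the local setup.

```python
from itertools import product


def factorial_runs(factors: dict, replicates: int) -> list:
    """List every combination of factor levels, repeated `replicates` times."""
    names = list(factors)
    runs = []
    for levels in product(*(factors[name] for name in names)):
        for rep in range(replicates):
            run = dict(zip(names, levels))
            run["replicate"] = rep
            runs.append(run)
    return runs


# Levels taken from DOE 1 and DOE 2 above.
factors = {
    "KLD_minSampleSize": [150, 300, 400],
    "LF_decimation": [20, 15, 10],
}
runs = factorial_runs(factors, replicates=10)
print(len(runs), "runs; first:", runs[0])
```

Three levels of each parameter with 10 replicates gives 90 runs, which is one reason a factorial approach gets expensive quickly when every run has to be judged by eye.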

On the x86, where there was some variation from run to run, a combination of KLD_minSampleSize of 400 and LF_decimation of 15 appears to be quite repeatable and accurate. However, the sensitivity to KLD_minSampleSize was quite low (between 150, 300 and 400, there was not much variation). I currently have no quantitative way of comparing the replicates: the judgment is entirely visual. We need to come up with some means of quantifying the performance.
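One possible way to replace the visual judgment with a number is the translational RMSE of each replicate against a reference trajectory, with a threshold that marks a run as failed. The sketch below is an assumption, not something used in this thread: it presumes each run has been exported as a list of (x, y) positions sampled at the same timestamps as the reference, and the reference itself (for example a run judged correct by eye), the threshold and the sample data are all hypothetical.

```python
import math


def rmse_xy(estimated, reference):
    """Root-mean-square translational error between two (x, y) sequences."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [
        (ex - rx) ** 2 + (ey - ry) ** 2
        for (ex, ey), (rx, ry) in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


def failure_rate(replicates, reference, threshold_m):
    """Fraction of replicates whose RMSE exceeds threshold_m, plus the errors."""
    errors = [rmse_xy(run, reference) for run in replicates]
    failed = sum(1 for e in errors if e > threshold_m)
    return failed / len(errors), errors


# Hypothetical data: a straight reference path, one good and one bad replicate.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
good_run = [(0.0, 0.03), (1.0, -0.04), (2.0, 0.0)]
bad_run = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]

rate, errors = failure_rate([good_run, bad_run], reference, threshold_m=0.25)
print(f"failure rate: {rate:.0%}, per-run RMSE: {[round(e, 3) for e in errors]}")
```

A metric like this also makes it possible to compare parameter settings by their error distribution rather than by a pass/fail impression.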

I repeated these 30 replicates on the Jetson Xavier with KLD_minSampleSize 150, 300 and 400 and LF_decimation 15 and had zero errors.
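One possible reading of the low sensitivity to KLD_minSampleSize is that the floor is rarely the binding limit. The sketch below evaluates the textbook KLD-sampling bound (the Wilson-Hilferty approximation from Fox's KLD-sampling work) with the KLD_epsilon and KLD_delta values from the config above, then clamps it to the minimum and maximum sample sizes. Whether MRPT computes the bound with exactly this expression is an assumption, and the function names are invented for this example.

```python
import math
from statistics import NormalDist


def kld_sample_bound(k_bins: int, epsilon: float, delta: float) -> int:
    """Textbook KLD-sampling bound on the particle count for k occupied bins."""
    if k_bins < 2:
        return 0  # the bound is undefined for fewer than two bins
    z = NormalDist().inv_cdf(1.0 - delta)  # upper (1 - delta) normal quantile
    a = 2.0 / (9.0 * (k_bins - 1))
    n = (k_bins - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return math.ceil(n)


def clamp(n: int, lowest: int, highest: int) -> int:
    """Apply the min/max sample size limits to the bound."""
    return max(lowest, min(highest, n))


# KLD_epsilon=0.01, KLD_delta=0.01, KLD_maxSampleSize=40000 as in the config.
for k in (2, 3, 10, 100):
    n = kld_sample_bound(k, epsilon=0.01, delta=0.01)
    clamped = {floor: clamp(n, floor, 40000) for floor in (150, 300, 400)}
    print(f"bins={k:4d} bound={n:6d} clamped={clamped}")
```

Under these assumptions, once the particles occupy three or more bins the bound already exceeds 400, so all three floors tested above would give the same particle count.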

One observation: the initial position (controlled by the init_PDF parameters) seems to be perhaps the biggest source of localization error when there is no motion at the beginning. Here are some screenshots:

Good: image

Not so good: image

This error gets corrected within a few seconds of motion.

That said, taking a step back, the errors do indeed occur primarily when turns are made and are strongly exacerbated by sudden turns (to avoid dynamic obstacles). A factorial approach like the one I used above is painful at best, and clearly I am running somewhat blind. I'll take a look at your latest set of links in #125. Thank you!

67bug commented 1 year ago

Closing this as this is not an issue any more. Thanks for your help, @jlblancoc !