mrpt_localization repeatability on arm64

67bug commented 1 year ago

I am evaluating the mrpt_localization package and have run into a strange situation with repeatability.

Every so often, mrpt_localization produces a disturbingly different localization result from its usual one. Of course, the "different" result is wrong, and it shows up when we are running live in our test environment (Murphy still seems to have his way), hence this issue. So we built an offline setup where we can see the impact of parameter changes, repeated runs and such. This appears to happen only on arm64 and not on amd64.

Environment:

ROS Melodic Morenia
Ubuntu 18.04
mrpt_localization: 1.0.4 from source and also 1.0.3 from ros-distro
load static map generated outside of mrpt (a simple PGM)

I'll be happy to share the map PGM, mrpt_localization parameters and such with anyone interested.

Some screenshots from various runs are included below. Legend:

The green outline is the footprint of the vehicle when the original bag file was generated (it was incorrectly localized then).
The trailing red arrows are the odometry lines as localized in an offline repeatability test run. You can also see some of the various sensor tfs if you look closely.
The red dots are the lidar readings in the localized tf tree (this is how I tell whether the localization is good or not) and should roughly match the map.
The yellow dots are the particle cloud from mrpt for the current localization.

Correct runs:

Every now and then (roughly 10% of the time), incorrect runs:

How would one go about finding where to look for a root cause? The only (arguably unrelated) issue I can find that talks about arm64 and amd64 differences is this one.

Here is the mrpt_config.ini file

#------------------------------------------------------
# Config file for the application PF Localization
# See: https://www.mrpt.org/list-of-mrpt-apps/application-pf-localization/
#------------------------------------------------------

#---------------------------------------------------------------------------
# Section: [KLD_options]
# Use: Options for the adaptive sample size KLD-algorithm
# Refer to paper:
# D. Fox, W. Burgard, F. Dellaert, and S. Thrun, "Monte Carlo localization:
# Efficient position estimation for mobile robots," Proc. of the
# National Conference on Artificial Intelligence (AAAI),v.113, p.114,1999.
#---------------------------------------------------------------------------
[KLD_options]
KLD_binSize_PHI_deg=10
KLD_binSize_XY=0.10
KLD_delta=0.01
KLD_epsilon=0.01
KLD_maxSampleSize=40000
KLD_minSampleSize=150
KLD_minSamplesPerBin=0   

#---------------------------------------------------------------------------
# Section: [PF_options]
# Use: The parameters for the PF algorithms
#---------------------------------------------------------------------------
[PF_options]
# The Particle Filter algorithm:
#   0: pfStandardProposal     ***
#   1: pfAuxiliaryPFStandard
#   2: pfOptimalProposal    
#   3: pfAuxiliaryPFOptimal   ***
#
PF_algorithm=0

# The Particle Filter Resampling method:
#   0: prMultinomial
#   1: prResidual
#   2: prStratified
#   3: prSystematic
resamplingMethod=0

# Set to 1 to enable KLD adaptive sample size:
adaptiveSampleSize=1

# Only for algorithm=3 (pfAuxiliaryPFOptimal)
pfAuxFilterOptimal_MaximumSearchSamples=10

# Resampling threshold
BETA=0.5

# Number of particles (IGNORED IN THIS APPLICATION, SUPERSEDED BY "particles_count" below)
sampleSize=1

#---------------------------------------------------------------------------
# Default "noise" parameters for odometry in observations-only rawlog formats
#---------------------------------------------------------------------------
[DummyOdometryParams]
minStdXY     = 0.10    // (meters)
minStdPHI    = 2.0     // (degrees)

#---------------------------------------------------------------------------
# Section: [LocalizationExperiment]
# Use: Here come global parameters for the app.
#---------------------------------------------------------------------------
[LocalizationExperiment]

use_3D_poses = false

# The map in the ".simplemap" format or just a ".gridmap" (the program detects the file extension)
# This map is used to localize the robot within it:
map_file=

# The source file (RAW-LOG) with action/observation pairs
rawlog_file=

# The directory where the log files will be saved (left in blank if no log is desired)
logOutput_dir=LOG_LOCALIZATION

# Freq. of 3D scene log
3DSceneFrequency=1

# The repetitions of the experiments (each one will go to a different 
# directory with the index suffix)
experimentRepetitions=1

# Initial number of particles (if dynamic sample size is enabled, the population may change afterwards).
#  You can put an array, e.g. "100 200 300", to run the experiment with different number of initial samples:
particles_count=40000

# 1: Uniform distribution over the range, 0: Uniform distribution over the free cells of the gridmap in the range:
init_PDF_mode=0
init_PDF_min_x=-1
init_PDF_max_x=1
init_PDF_min_y=-1
init_PDF_max_y=1

SHOW_PROGRESS_3D_REAL_TIME  = true

# ====================================================
#
#            MULTIMETRIC MAP CONFIGURATION
#
# ====================================================
[MetricMap]
# Creation of maps:
occupancyGrid_count=1
gasGrid_count=0
landmarksMap_count=0
pointsMap_count=0
beaconMap_count=0

# Selection of map for likelihood: (fuseAll=-1,occGrid=0, points=1,landmarks=2,gasGrid=3)
likelihoodMapSelection=-1

# Enables (1) / Disables (0) insertion into specific maps:
enableInsertion_pointsMap=1
enableInsertion_landmarksMap=1
enableInsertion_gridMaps=1
enableInsertion_gasGridMaps=1
enableInsertion_beaconMap=1

# ====================================================
#   MULTIMETRIC MAP: OccGrid #00
# ====================================================
# Creation Options for OccupancyGridMap 00:
[MetricMap_occupancyGrid_00_creationOpts]
resolution=0.06

# Insertion Options for OccupancyGridMap 00:
[MetricMap_occupancyGrid_00_insertOpts]
mapAltitude=0
useMapAltitude=0
maxDistanceInsertion=15
maxOccupancyUpdateCertainty=0.55
considerInvalidRangesAsFreeSpace=1
minLaserScanNoiseStd=0.001

# Likelihood Options for OccupancyGridMap 00:
[MetricMap_occupancyGrid_00_likelihoodOpts]
likelihoodMethod=4      // 0=MI, 1=Beam Model, 2=RSLC, 3=Cells Difs, 4=LF_Trun, 5=LF_II

LF_decimation=20
LF_stdHit=0.20
LF_maxCorrsDistance=5.0
LF_zHit=0.95
LF_zRandom=0.05
LF_maxRange=80
LF_alternateAverageMethod=0

MI_exponent=10
MI_skip_rays=10
MI_ratio_max_distance=2

rayTracing_useDistanceFilter=0
rayTracing_decimation=10
rayTracing_stdHit=0.30

consensus_takeEachRange=30
consensus_pow=1

and the launch file calls out these parameter values:

<param name="default_noise_xy" value="0.1"/>
<param name="default_noise_phi" value="0.5"/>
<param name="gaussian_alpha_xy" value="0.005"/>
<param name="gaussian_alpha_phi" value="0.1"/>

If it helps any: with the default default_noise_phi value of 2.0, the localization is always incorrect (this is how we had captured the original bag file). If we set it to 0.5, localization is clearly better, but it runs into this repeatability issue.

@maxbader, we could use some of your guidance here

jlblancoc commented 1 year ago

Interesting... First, I would investigate whether it's actually related to the architecture at all (arm64 vs amd64). Have you tried it on a regular desktop computer/laptop, and does it work nearly 100% of the time with those same parameters?

It would be helpful if you could share (dropbox/google drive/...) a ZIP with everything needed to reproduce: launch and config files + rosbag + launch instructions.

My feeling is that it's all related to tuning the uncertainty parameters of odometry. If failures are always near a curve, odometry normally is bad at those points, and we need either a larger uncertainty for rotations in the motion model, or a larger number of particles.

Other direct experiments you can try:

KLD_minSampleSize=150 ==> try larger values for the minimum number of particles, e.g. 300, 400, 500.
LF_decimation=20 ==> try smaller values, e.g. 15, 10.

Most likely you will fix it with the number of particles, if it runs OK 90% of the time. If it always fails on curves, then the uncertainty parameters will need updating.
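
As background on why the minimum particle count can matter, here is a minimal sketch of the standard KLD-sampling bound from Fox's adaptive particle filter work. It assumes that KLD_epsilon and KLD_delta in the config play the roles of the bound's epsilon and delta, and that KLD_minSampleSize and KLD_maxSampleSize simply clamp the result; neither assumption is confirmed in this thread. The function name `kld_sample_bound` is hypothetical and this is not MRPT's code.

```python
import math
from statistics import NormalDist


def kld_sample_bound(k, epsilon=0.01, delta=0.01,
                     min_samples=150, max_samples=40000):
    """Particles suggested by the KLD-sampling bound for k occupied bins.

    k is the number of histogram bins (x, y, phi) that currently contain
    at least one particle. The clamping to [min_samples, max_samples] is
    an assumption about how the min/max settings are applied.
    """
    if k < 2:
        # The bound is undefined for a single bin, so only the floor applies.
        return min_samples
    z = NormalDist().inv_cdf(1.0 - delta)  # upper (1 - delta) normal quantile
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return max(min_samples, min(max_samples, math.ceil(n)))


for k in (1, 2, 10, 100):
    print(k, kld_sample_bound(k))
```

Under these assumptions the floor only takes effect once the particle cloud has collapsed into very few bins, which is the situation where a sudden turn can leave the filter with too few hypotheses to recover.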

67bug commented 1 year ago

Hi Jose,

Thanks so much for your note. This is super helpful.

First things first, yes, this was happening only on the arm64 platform (a Jetson Xavier NX). For a tally of my own runs:

Jetson NX: 38 runs, 5 fails (with the settings shown above)
x86 Laptop: ~20 runs, 0 fails (with the settings shown above)

Clearly, the dataset itself was a bit biased, and I had assumed that the law of large numbers would suffice to draw reasonable statistics. So I decided to increase the x86 run count and lo and behold, I got two fails in the first six runs -- enough to eliminate my incorrect claim regarding differences between the arm64/amd64 platforms. [Please let me know if you would like me to change the title of the issue for future observers of this repo]
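
A quick way to check whether a tally like the one above supports a platform difference is to ask how likely zero failures in the x86 group would be if failures were independent of platform. The sketch below uses the hypergeometric probability (the core of a one-sided Fisher exact test) and treats the "~20" x86 runs as exactly 20, which is an assumption. The function name `prob_zero_failures` is hypothetical.

```python
from math import comb


def prob_zero_failures(total_runs, total_fails, group_runs):
    """Probability that a group of `group_runs` runs contains none of the
    `total_fails` failures, if failures are spread over all `total_runs`
    runs independently of which platform executed them."""
    return comb(total_runs - total_fails, group_runs) / comb(total_runs, group_runs)


# Jetson NX: 38 runs, 5 fails. x86 laptop: 20 runs (assumed exact), 0 fails.
p = prob_zero_failures(total_runs=38 + 20, total_fails=5, group_runs=20)
print(f"P(0 failures in the x86 group | no platform effect) = {p:.3f}")
```

The probability comes out at roughly 0.11, so a clean x86 tally of this size is not strong evidence of an architecture effect, which matches what the additional x86 runs showed.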

So I ran a tiny DOE to look for sensitivities:

DOE 1 (10 replicates each): Change KLD_minSampleSize
- Current value of 150
- 300
- 400
DOE 2 (10 replicates each): Change LF_decimation
- Current value of 20
- 15
- 10
Trial 3 (30 replicates each): take the "best" of DOEs 1 and 2.
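
The bookkeeping for a sweep like this can be scripted. The sketch below enumerates a full factorial plan over the two parameters, which is a variant of the one-factor-at-a-time DOEs listed above rather than the exact procedure used here, and tallies a failure rate per setting. The helper names `build_run_plan` and `failure_rates` are hypothetical, and how each run is launched and judged is left out.

```python
from collections import Counter
from itertools import product


def build_run_plan(factors, replicates):
    """Return one dict per run covering every combination of factor levels."""
    names = list(factors)
    plan = []
    for values in product(*(factors[name] for name in names)):
        for rep in range(replicates):
            run = dict(zip(names, values))
            run["replicate"] = rep
            plan.append(run)
    return plan


def failure_rates(results):
    """results: iterable of (settings_tuple, failed) pairs, one per run."""
    runs, fails = Counter(), Counter()
    for settings, failed in results:
        runs[settings] += 1
        fails[settings] += int(failed)
    return {settings: fails[settings] / runs[settings] for settings in runs}


factors = {
    "KLD_minSampleSize": [150, 300, 400],
    "LF_decimation": [20, 15, 10],
}
plan = build_run_plan(factors, replicates=10)
print(len(plan), "runs planned")
```

A full factorial costs more runs than the two separate DOEs, but it also exposes interactions between the two parameters.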

On the x86, where there was some variation from run to run, a combination of KLD_minSampleSize of 400 and LF_decimation of 15 appears to be quite repeatable and accurate. However, the sensitivity to KLD_minSampleSize was quite low (between 150, 300 and 400, there was not much variation). I currently don't have a measurable means of comparing the replicates: the judgment is entirely visual. We need to come up with some means of quantifying the performance.
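
One possible way to turn the visual check (do the lidar points line up with the map?) into a number is a scan-to-map error: the mean distance from each lidar point, already transformed into the map frame with the estimated pose, to the nearest occupied map cell. The sketch below is only an illustration under that assumption. The function names, the 0.5 m threshold and the 20% bad-scan fraction are all hypothetical, and the brute-force nearest-cell search is only suitable for small inputs.

```python
import math


def scan_to_map_error(scan_points, occupied_cells):
    """Mean distance (m) from each scan point to its nearest occupied cell.

    Both arguments are lists of (x, y) tuples in the map frame.
    """
    if not scan_points or not occupied_cells:
        raise ValueError("need at least one scan point and one occupied cell")
    total = 0.0
    for px, py in scan_points:
        total += min(math.hypot(px - ox, py - oy) for ox, oy in occupied_cells)
    return total / len(scan_points)


def run_failed(per_scan_errors, threshold_m=0.5, max_bad_fraction=0.2):
    """Flag a run as failed if too many scans exceed the error threshold."""
    bad = sum(err > threshold_m for err in per_scan_errors)
    return bad / len(per_scan_errors) > max_bad_fraction


# Toy example: a straight wall at y = 1.0 sampled at the 0.06 m map resolution.
wall = [(i * 0.06, 1.0) for i in range(50)]
aligned = [(10 * 0.06, 1.0), (20 * 0.06, 1.0)]
shifted = [(x, y + 0.5) for x, y in aligned]
print(scan_to_map_error(aligned, wall), scan_to_map_error(shifted, wall))
```

Logging this value per scan for each replicate would let the runs be compared without screenshots; the thresholds would need tuning against runs already judged good or bad by eye.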

I repeated these 30 replicates on the Jetson Xavier with KLD_minSampleSize 150, 300 and 400 and LF_decimation 15 and had zero errors.

One observation: the initial position (controlled by the init_PDF parameters) seems to be perhaps the biggest variable in terms of localization errors when there is no motion at the beginning. Here are some screenshots:

Good:

Not so good:

This error gets corrected within a few seconds of motion.

That said, taking a step back, the errors indeed are primarily when turns are made and highly exacerbated when sudden turns are made (to avoid dynamic obstacles). A factorial approach as I used above is painful at best and clearly, I am running somewhat blind. I'll take a look at your latest set of links in #125. Thank you!

67bug commented 1 year ago

Closing this as it is no longer an issue. Thanks for your help, @jlblancoc!

mrpt-ros-pkg / mrpt_navigation

mrpt_localization repeatability on arm64 #133