wpilibsuite / allwpilib

Official Repository of WPILibJ and WPILibC
https://wpilib.org/
Other
1.04k stars 605 forks source link

SwerveDrivePoseEstimator#addVisionMeasurement throws a ConcurrentModificationException #4952

Open varun7654 opened 1 year ago

varun7654 commented 1 year ago

Describe the bug Calling SwerveDrivePoseEstimator#addVisionMeasurement with a time > 1.5s earlier than the last time used in the #update() method will cause a ConcurentModificationExcpetion

To Reproduce Running this test will reproduce the crash: https://github.com/varun7654/PoseEstimatorTest/blob/main/src/test/java/frc/robot/PoseEstimatorTest.java

Steps to reproduce the behavior:

  1. Add enough measurements to fill up the Pose Buffer in the SwerveDrivePoseEstimator
  2. Add Pose estimate using the #addVisionMeasurement with a time > 1.5s earlier than the one last used in #update()

Expected behavior The SwerveDrivePoseEstimator will not produce a concurrent modification exception

Screenshots Stack trace:

java.util.ConcurrentModificationException
    at java.base/java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(TreeMap.java:2016)
    at java.base/java.util.TreeMap$NavigableSubMap$SubMapEntryIterator.next(TreeMap.java:2064)
    at java.base/java.util.TreeMap$NavigableSubMap$SubMapEntryIterator.next(TreeMap.java:2058)
    at edu.wpi.first.math.estimator.SwerveDrivePoseEstimator.addVisionMeasurement(SwerveDrivePoseEstimator.java:213)
    at edu.wpi.first.math.estimator.SwerveDrivePoseEstimator.addVisionMeasurement(SwerveDrivePoseEstimator.java:250)
    at frc.robot.PoseEstimatorTest.testPoseEstimator(PoseEstimatorTest.java:55)

...

Desktop (please complete the following information):

Additional context In the addVisionMeasurement method, the updateWithTime method is called inside an iterator to replay all the updates that occurred after the vision measurement.

    // Step 7: Replay odometry inputs between sample time and latest recorded sample to update the
    // pose buffer and correct odometry.
    for (Map.Entry<Double, InterpolationRecord> entry :
        m_poseBuffer.getInternalBuffer().tailMap(timestampSeconds).entrySet()) {
      updateWithTime(entry.getKey(), entry.getValue().gyroAngle, entry.getValue().modulePositions);
    }

This would typically be fine, except that the cleanUp() method is called during each call to m_poseBuffer.addSample()

    m_poseBuffer.addSample(
        currentTimeSeconds,
        new InterpolationRecord(getEstimatedPosition(), gyroAngle, internalModulePositions));

In TimeInterpolationBuffer:

  public void addSample(double timeSeconds, T sample) {
    cleanUp(timeSeconds);
    m_pastSnapshots.put(timeSeconds, sample);
  }

... which can delete an entry


  /**
   * Removes samples older than our current history size.
   *
   * @param time The current timestamp.
   */
  private void cleanUp(double time) {
    while (!m_pastSnapshots.isEmpty()) {
      var entry = m_pastSnapshots.firstEntry();
      if (time - entry.getKey() >= m_historySize) {
        m_pastSnapshots.remove(entry.getKey());
      } else {
        return;
      }
    }
  }

... and since #4917 now adds a sample for the pose after it's been updated by vision, this pose could potentially cause CME if the vision measurement is old enough.

bobbbay commented 1 year ago

+1, have this too.

calcmogul commented 1 year ago

@jlmcmchl 1.5 s is the length of the pose buffer, so it looks like the pose estimator doesn't properly handle samples from before that (namely, it should drop them on the floor instead of crashing).

bobbbay commented 1 year ago

Would this be solved by #4959? It seems the PR removes the offending block of code.

varun7654 commented 1 year ago

The concurrent modification exception appears to be fixed, but it still looks like we'll get incorrect behavior with a timestamp older than 1.5s.

It appears that TimeInterpolatableBuffer#getSample(...) will simply return the oldest sample when the time requested is older than the oldest sample it has. I'd suggest returning Optional.empty() in this case in TimeInterpolatableBuffer#getSample(...) b/c it doesn't make sense to return a sample for a position that could be nowhere near the time requested.

Or alternatively, we could add a check for the time in #addVisonMeasurment(...) to fix this

bobbbay commented 1 year ago

Using the Optional type makes the most sense to me, because it defers the checking to the user, and I can handle this situation any way I want. I'm not sure what the API guidelines are here, but maybe a DS warning could be emitted too.

calcmogul commented 1 year ago

Yea, just have TimeInterpolatableBuffer.getSample() return an Optional. A warning for vision data being too old can go in the pose estimator class.

varun7654 commented 1 year ago

TimeInterpolatableBuffer.getSample() already returns an Optional btw. It's only empty, though, when there are no samples in the buffer!

bobbbay commented 1 year ago

In other words, TimeInterpolatableBuffer#getSample should only return Optional.empty() when the requested sample is impossible to calculate, e.g. when there are no buffers or when the requested sample is out of range.

Now that I think of it though, isn't returning the closest point when out of range part of the definition of interpolation? Then (once 4959 is merged) by definition, this issue does not exist.

calcmogul commented 1 year ago

TimeInterpolatableBuffer#getSample should only return Optional.empty() when the requested sample is impossible to calculate, e.g. when there are no buffers or when the requested sample is out of range.

I agree. We shouldn't just return the oldest sample if it isn't the expected timestamp.

Rubyboat1207 commented 5 months ago

Fix did not work for me. I get an immediate crash due to concurrent modification exception still. Should be reopened.

jlmcmchl commented 5 months ago

Can you share example code of how you're still getting a concurrent modification exception?

Rubyboat1207 commented 5 months ago

I can share tuesday (2/20). Though its not a very complex case, right now I am iterating through a list of pose estimators and then adding their pose from the PhotonPoseEstimator#update() function using SwerveDrivePoseEstimator#addVisionMeasurement(). I have the logs for now though:

 Unhandled exception: java.util.ConcurrentModificationException
 Error at java.base/java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(Unknown Source): Unhandled exception: java.util.ConcurrentModificationException
 at java.base/java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(Unknown Source)
 at java.base/java.util.TreeMap$NavigableSubMap$SubMapEntryIterator.next(Unknown Source)
 at java.base/java.util.TreeMap$NavigableSubMap$SubMapEntryIterator.next(Unknown Source)
 at edu.wpi.first.math.estimator.PoseEstimator.addVisionMeasurement(PoseEstimator.java:191)
 at frc.robot.subsystems.Vision.updateEstimatedPose(Vision.java:47)
 at frc.robot.subsystems.PheonixDrivebase.periodic(PheonixDrivebase.java:55)
 at edu.wpi.first.wpilibj2.command.CommandScheduler.run(CommandScheduler.java:260)
 at frc.robot.Robot.robotPeriodic(Robot.java:61)
 at edu.wpi.first.wpilibj.IterativeRobotBase.loopFunc(IterativeRobotBase.java:381)
 at edu.wpi.first.wpilibj.TimedRobot.startCompetition(TimedRobot.java:131)
 at edu.wpi.first.wpilibj.RobotBase.runRobot(RobotBase.java:365)
 at edu.wpi.first.wpilibj.RobotBase.startRobot(RobotBase.java:453)
 at frc.robot.Main.main(Main.java:27)```

Ok here is the code:

public void updateEstimatedPose(SwerveDrivePoseEstimator estimator) { Pose2d prevEstimatedRobotPose = estimator.getEstimatedPosition();

for(int i = 0; i < cameras.size(); i++) {
  var cam = cameras.get(i);
  cam.poseEstimator.setReferencePose(prevEstimatedRobotPose);
  Optional<EstimatedRobotPose> est = cam.poseEstimator.update();
  if(est.isPresent()) {
    var estimate = est.get();
    System.out.println("Present");

    estimator.addVisionMeasurement(estimate.estimatedPose.toPose2d(), estimate.timestampSeconds);
  }
}

}``` https://github.com/frc5431/RobotCode2024/blob/main/src/main/java/frc/robot/subsystems/Vision.java

if you look here, i tried to edit some of the classes so that i could attempt a fix without getting a new version. There were too many updates and I didnt feel like making them, so i stopped, ill clean it up eventually.

Rubyboat1207 commented 5 months ago

I have 2 proposed fixes - the first, although its a bit crude, is very simple. It unprivates the cleanUp method and runs it before the loop. I have another solution that involves deferring the cleanup if it fails, but i haven't had the chance to test either on bot, which i will tomorrow.

crude fix: https://github.com/Rubyboat1207/allwpilib/commit/4b38ec9b96f11b1375f8aa680e09337a011659f0 defer fix: https://github.com/Rubyboat1207/allwpilib/commit/b1842242251fa9567a002300d358cfda71a094f7