wpilibsuite / allwpilib

Official Repository of WPILibJ and WPILibC
https://wpilib.org/
Other
1.04k stars 606 forks source link

`SimHooks.stepTiming()` hangs when called from a callback registered with `SimDeviceSim.registerValueChangedCallback()` if another `SimDeviceSim` has been created #6641

Open brettle opened 1 month ago

brettle commented 1 month ago

Describe the bug SimHooks.stepTiming() hangs when called from a callback registered with SimDeviceSim.registerValueChangedCallback() if another SimDeviceSim has been created.

To Reproduce Run the following test (which uses an ADXRS450_GyroSim as an example of a SimDeviceSim):

package frc.robot;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

import edu.wpi.first.hal.HAL;
import edu.wpi.first.hal.SimDevice;
import edu.wpi.first.hal.SimDevice.Direction;
import edu.wpi.first.wpilibj.ADXRS450_Gyro;
import edu.wpi.first.wpilibj.TimedRobot;
import edu.wpi.first.wpilibj.Timer;
import edu.wpi.first.wpilibj.simulation.ADXRS450_GyroSim;
import edu.wpi.first.wpilibj.simulation.SimDeviceSim;
import edu.wpi.first.wpilibj.simulation.SimHooks;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;
import static org.junit.jupiter.api.Timeout.ThreadMode.SEPARATE_THREAD;

public class RobotTest {
    @BeforeEach
    void setup() {
        assert HAL.initialize(500, 0);
    }

    @AfterEach
    void shutdown() throws Exception {
        HAL.shutdown();
    }

    @Test
    @Timeout(value = 15, threadMode = SEPARATE_THREAD) // Kill the test after 15 seconds
    void doesNotHang() throws InterruptedException, TimeoutException, ExecutionException {
        // Create a simulated device that can be used to step the robot's timing.
        SimDevice device = SimDevice.create("TimeDevice");
        device.createDouble("simTime", Direction.kInput, 0.0);
        SimDeviceSim deviceSim = new SimDeviceSim("TimeDevice");
        deviceSim.registerValueChangedCallback(deviceSim.getDouble("simTime"),
                (name, handle, direction, value) -> {
                    double secsBehind = value.getDouble() - Timer.getFPGATimestamp();
                    if (secsBehind > 0) {
                        System.out.println("Calling stepTiming() should not hang");
                        SimHooks.stepTiming(secsBehind);
                        System.out.println("stepTiming() returned"); // Never printed
                    }
                }, false);

        try (TimedRobot robot = new TimedRobot() {
            @SuppressWarnings("resource")
            ADXRS450_Gyro gyro = new ADXRS450_Gyro();

            @Override
            public void robotInit() {
            }

            @Override
            public void robotPeriodic() {
                if (Timer.getFPGATimestamp() > 5.0) {
                    endCompetition();
                }
            }

            ADXRS450_GyroSim gyroSim = null;

            @Override
            public void simulationInit() {
                // Adding the following line causes SimHook.stepTiming() to block in the simTime
                // callback.
                gyroSim = new ADXRS450_GyroSim(gyro);
            }

            @Override
            public void simulationPeriodic() {
                if (gyroSim != null) {
                    // This isn't necessary to produce the problem, but it should also work.
                    // Using the current timestamp here just to ensure we always get a new number.
                    gyroSim.setAngle(Timer.getFPGATimestamp());
                }
            }

            @Override
            public void close() {
                gyro.close();
                super.close();
            }
        }) {
            // Start the robot on separate thread
            CompletableFuture<Void> competition = CompletableFuture.runAsync(() -> {
                robot.startCompetition();
            });

            // Wait a couple secs to be sure it is running and then pause it
            Thread.sleep(2000);
            SimHooks.pauseTiming();

            // Emulate an external simulator telling it to step the robot time
            // forward to 10 seconds. This doesn't need to be async to produce
            // the problem, but more accurately represents our use case.
            CompletableFuture.runAsync(() -> {
                deviceSim.getDouble("simTime").set(10.0);
            }).join();

            // Wait for the robot to finish (which should occur as soon as the robot time is
            // > 5.0 seconds).
            competition.join();
        }
    }
}

Expected behavior The test should pass (after only 2-3 seconds). Instead, it hangs in the call to SimHooks.stepTiming() (until the test times out after 15 secs).

Desktop (please complete the following information):

Additional Context The test above uses a ADXRS450_GyroSim but the problem also occurs with a new SimDeviceSim associated with a new SimDevice.

PeterJohnson commented 1 month ago

This is not surprising. I'm not sure why you are trying to map this through a SimDevice, but it's not really designed to interact with the low-level timing code like this.

What's important to realize about the simulation callbacks (including the SimDevice ones) is that they are called inline to whatever code made the change (they are not called from a separate thread).

StepTiming() does a lot of work that may be susceptible to deadlocks; in particular it will trigger notifiers as appropriate for the amount of stepped time. We've had a number of hard-to-isolate bugs in this code, so this may be a similar symptom that's simply more reproducible.

brettle commented 1 month ago

Thanks for the quick response! Regarding why we were trying to use a SimDevice like this, we are working with an external simulator via the ws_server extension and are trying to keep the external simulator's clocks and the robot's clock approximately in sync. Since the wpilib sim framework and the websockets extension already provides a way to communicate state between the robot and the simulator via a SimDevice, this seemed like a natural way to implement that functionality. Fwiw, we have worked around the issue by replacing the call to stepTiming() with something like:

                        Notifier pauser = new Notifier(() -> {
                            SimHooks.pauseTiming();
                        });
                        pauser.startSingle(secsBehind);
                        SimHooks.resumeTiming();

(We actually reuse a single Notifier instead of creating a new one during each callback.)

That doesn't hang, but it doesn't allow the robot code to be run faster than real time and running faster than real time can be useful for some automated testing.

Anyway, perhaps the reproducible test case will be useful in isolating the bugs you mentioned.