stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

Integrator step failed at time 0.23 (required condition 't1 > t0' was not met) #225

Closed A-Artemis closed 11 months ago

A-Artemis commented 3 years ago

I am trying to train my model using osim-rl but I keep running into an error shortly after starting the simulation.

The full errors I run into are: std::exception in 'SimTK::State const & OpenSim::Manager::integrate(double)': SimTK Exception thrown at AbstractIntegratorRep.cpp:428: Integrator step failed at time 0.20000000000000001 apparently because: SimTK Exception thrown at AbstractIntegratorRep.cpp:547: Error detected by Simbody method AbstractIntegrator::takeOneStep(): Unable to advance time past 0.2. (Required condition 't1 > t0' was not met.)

RuntimeError: std::exception in 'void OpenSim::Manager::initialize(SimTK::State const &)': SimTK Exception thrown at Integrator.cpp:431: Integrator initialization failed apparently because: SimTK Exception thrown at SimbodyMatterSubsystemRep.cpp:4890: Error detected by Simbody method SimbodyMatterSubsystem::projectU(): Failed to achieve required accuracy 0.001. Norm on entry was inf; norm on exit 4.6953. You might need a better starting configuration, or if there are prescribed or locked u's you might have to free some of them. (Required condition 'pverrNormAchieved <= consAccuracy' was not met.)

I have noticed that this error occurs when I switch to a different training data (TD) set. The original TD set, which came from a previous NIPS challenge, runs without a hitch and completes the training process. With any other training data, the simulation always produces the errors above. I have even tried the example motion file "normal.mot" provided with Gait2392_Simbody, which I converted to a .csv for osim.py to read. I have also increased the precision of the values in the .csv and raised the time-step sampling rate from 50 Hz to 100 Hz to 250 Hz; neither change fixed the error.

I have also experimented with integrator_accuracy, using values from 1e-0 to 1e-4, all of which still give the error. I have not gone beyond 1e-4 because a single iteration takes too long on my PC.

More info about the TD: My reward function only makes use of the following headers in the .csv time | pelvis_list | pelvis_rotation | pelvis_tilt | hip_flexion_r | hip_adduction_r | hip_rotation_r | knee_angle_r | ankle_angle_r | hip_flexion_l | hip_adduction_l | hip_rotation_l | knee_angle_l | ankle_angle_l | pelvis_ty | lumbar_bending | lumbar_rotation | lumbar_extension

kidzik commented 3 years ago

That is indeed related to integration_step, and a shorter step (like 1e-5) might help, but that will make simulations slower. I'm not sure whether there is a workaround. @carmichaelong, do you know of a robust solution?

carmichaelong commented 3 years ago

I don't know of any robust solution, since it depends on the specifics of what's going on during the simulation. Usually this could happen if there are some large forces (e.g., the model while training happens to step into the floor in a weird way) or a component of the model enters a region where it's numerically stiff (e.g., a muscle gets too long because a joint is bent in a weird position). @kidzik is right that a smaller integration step might help. I think others in the past have just aborted that simulation, caught the Exception, and just kept training.
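The abort-and-continue approach described above can be sketched roughly as follows. This is a minimal illustration, not code from osim-rl: `safe_step` and `failure_reward` are hypothetical names, and the sketch assumes a Gym-style `env.step(action)` that returns `(observation, reward, done, info)` and surfaces the integrator failure as a Python `RuntimeError`, as in the traceback at the top of this issue.

```python
def safe_step(env, action, last_obs, failure_reward=0.0):
    """Step the environment, ending the episode if the integrator fails.

    Hypothetical helper: assumes a Gym-style env whose step() returns
    (obs, reward, done, info) and raises RuntimeError when the
    integrator cannot advance.
    """
    try:
        return env.step(action)
    except RuntimeError as err:
        # Treat the failed step as a terminal transition so the training
        # loop resets the environment instead of crashing.
        info = {"integrator_failed": True, "error": str(err)}
        return last_obs, failure_reward, True, info
```

The training loop would call `safe_step` in place of `env.step` and reset the environment whenever `done` is true. Whether a failed step should receive zero reward or a penalty is a design choice that depends on the task.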

It's good that you tried a few data sources, but it seems like the simulations just sometimes get caught in a weird spot regardless of which data source is used (and perhaps the TD data was lucky to work so well).

shakibaRafiee commented 3 years ago

Hi. I have a similar problem. Unfortunately, I am not sure if it would be resolved by aborting the simulation and starting over.

So, here is what's going on. I have modified the sim_L2M2019_controller1 example, and the model can produce decent gait patterns when run with a low integrator accuracy. However, for this project, after finalizing the controller, I end up perturbing the model at several instances throughout the gait cycle, and I need to have high numerical accuracy. When I increase the integrator accuracy (to 5e-14), I get this error: Integrator step failed at time [...] (required condition 't1 > t0' was not met).

I am afraid that I would not be able to just start over every time since I do not train the model using perturbations.

A-Artemis commented 3 years ago

So I believe I have found a cause for this error. (At least how I can reproduce it)

My error occurs when I call the reset() manager of the musculoskeletal model. When the model resets, there is a small (<15%) chance that it resets to a predefined position. This position is taken from the training data at a time chosen uniformly at random between 0.2 and 0.3 seconds, which is why my errors kept reading: Unable to advance time past X. (Required condition 't1 > t0' was not met.) where X was between 0.2 and 0.3. The reset manager used this predefined position to set ALL the state variables for that time step, including the muscle activations and the fiber lengths. As it turns out, my data was missing the activation for one muscle on the left leg.

I fixed this by setting all missing muscle activations to 0. Since then I have never encountered the problem again with an integrator accuracy of 1e-2 to 5e-5.
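A rough sketch of that fix, for anyone who wants to check their own data: the helper below fills any missing activation in one row of reference data with 0 and reports which entries were missing. The function name, the dictionary layout, and the column names in the usage lines are all hypothetical; the actual state-variable names depend on the model and on how the .csv was exported.

```python
def fill_missing_activations(row, activation_names, default=0.0):
    """Return (filled_row, missing_names) for one time step of reference data.

    `row` maps state-variable names to values. A value counts as missing
    if the key is absent, None, an empty string, or NaN.
    """
    filled = dict(row)  # copy so the caller's data is left untouched
    missing = []
    for name in activation_names:
        value = filled.get(name)
        is_nan = isinstance(value, float) and value != value
        if value is None or value == "" or is_nan:
            filled[name] = default
            missing.append(name)
    return filled, missing


# Hypothetical column names, for illustration only.
row = {"time": 0.2, "soleus_r_activation": 0.05}
filled, missing = fill_missing_activations(
    row, ["soleus_r_activation", "soleus_l_activation"]
)
print(missing)  # lists the activations that had to be filled in
```

Logging the `missing` list before training makes it easier to notice an incomplete data set early, rather than discovering it through an integrator failure.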

My original goal was to use this reset manager to speed up training by skipping ahead to between 0.2 and 0.3 seconds to get the training going. However, I am no longer using this method to reset my model due to my training data set having incomplete muscle activation data.

Here is what I suggest you look into (It is how I debugged my issue 😄):

So in summary, my model was trying to reset to a value it could not find! I hope this helps!

shakibaRafiee commented 3 years ago

Hi. Thank you for the reply.

I think I found the source of my problem. In my controller, I have been using several discontinuous functions, and I believe that when I used a higher accuracy for the integrator, it got too close to those points of discontinuity. I think that to solve this problem I will have to replace them with continuous functions.
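As an illustration of the idea only (the thread does not say which functions the controller actually uses), one common way to remove a discontinuity is to replace a hard on/off switch with a tanh-based smooth step. The names `hard_switch`, `smooth_switch`, `threshold`, and `width` below are assumptions made for this sketch.

```python
import math


def hard_switch(x, threshold=0.0):
    """Discontinuous switch: jumps from 0 to 1 at the threshold."""
    return 1.0 if x > threshold else 0.0


def smooth_switch(x, threshold=0.0, width=0.01):
    """Continuous replacement: rises from 0 to 1 over a region of scale `width`."""
    return 0.5 * (1.0 + math.tanh((x - threshold) / width))
```

A smaller `width` tracks the hard switch more closely but makes the transition steeper, which an adaptive integrator may again respond to with very small steps, so the value usually needs some tuning.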