stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

possible to change the forward speed in sim_L2M2019_controller1.py? #227

Closed q138ben closed 3 years ago

q138ben commented 3 years ago

In the examples, sim_L2M2019_controller1.py gives a really good simulation of the model. I tried to change the forward speed in the initial pose, but the simulation failed after a few steps. I wonder if it is possible to change the forward speed. Thanks.

smsong commented 3 years ago

You can in theory, but you need new control parameters for your new target speed. For instance, the current simulation result (and thus the speed) comes from running the controller with the control parameters in ./osim/control/params_2D.txt (https://github.com/stanfordnmbl/osim-rl/blob/610b95cf0c4484f1acecd31187736b0113dcfb73/examples/sim_L2M2019_controller1.py#L29). You can refer to the original paper (https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/JP270228) for how to find new control parameters, which involves parameter optimization. Sorry that I'm not giving a direct solution, but I hope this helps.

q138ben commented 3 years ago

> You can in theory but need new control parameters for your new target speed. For instance, the current simulation result (and thus the speed) is the result of running the controller with the control parameters ./osim/control/params_2D.txt (https://github.com/stanfordnmbl/osim-rl/blob/610b95cf0c4484f1acecd31187736b0113dcfb73/examples/sim_L2M2019_controller1.py#L29). You can refer to the original paper (https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/JP270228) about how to find new control parameters, which involves parameter optimization. Sorry that I'm not giving the direct solution but hope this helps.

Hi, thanks for the information. I have read the paper and found that you use a cost function J to optimize the control parameters, but the cost function does not seem to include the 82 control parameters listed in the paper (7 for the reactive foot placement, 40 for stance reflexes, 31 for swing reflexes, and 4 for the modulation in the control transition). Can you please elaborate more on this optimization process? Many thanks.

q138ben commented 3 years ago

Hi again. I would like to compare the muscle forces under different walking speeds in the reflex-based musculoskeletal model, but I found it very difficult to change the walking speed in the model. Can you elaborate a bit more on how to get the control parameters for different walking speeds? Or, if you happen to have the control parameters for other walking speeds, would you mind sharing them? Many thanks.

smsong commented 3 years ago

Hi @q138ben. Sorry for the delay. I do not have the parameter sets for different speeds for this model. You can find new parameter sets using parameter optimization (CMA-ES: https://github.com/CMA-ES/pycma). You can set the cost function as f = (v - v_tgt)^2 + integration(act^2).
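The suggested cost can be written down directly. Below is a minimal NumPy sketch, where the function name, the episode traces (`pelvis_vel`, `activations`), and the discretization step are illustrative assumptions, not code from the repository:

```python
import numpy as np

def walking_cost(pelvis_vel, activations, v_tgt, dt=0.01):
    """f = (v - v_tgt)^2 + integration(act^2), as suggested above.

    pelvis_vel  : forward pelvis velocity at each simulation step
    activations : (n_steps, n_muscles) array of muscle activations
    v_tgt       : target walking speed in m/s
    """
    v = np.mean(pelvis_vel)                       # define v as the average forward speed
    effort = np.sum(np.square(activations)) * dt  # discrete integral of act^2 over time
    return (v - v_tgt) ** 2 + effort

# perfect speed tracking with zero muscle effort gives zero cost:
# walking_cost(np.full(100, 1.5), np.zeros((100, 22)), v_tgt=1.5) -> 0.0
```

CMA-ES would then minimize this cost over the vector of control parameters, with one simulation rollout per candidate.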

q138ben commented 3 years ago

Hi @smsong, thanks for the reply. I have tried CMA-ES but still have some doubts:

  1. How can I set the target forward velocity in order to optimize the cost function f = (v - v_tgt)^2 + integration(act^2)?

  2. I printed out the current forward velocity, state_desc['body_vel']['pelvis'][0], by running the controller in sim_L2M2019_controller1.py, as shown below. Why isn't it constant? Shouldn't it always be close to the target velocity under the same control parameters?

     pelvis_vel_0 = 1.7509731410931813
     pelvis_vel_0 = 1.7424195369839097
     pelvis_vel_0 = 1.6673182753131177
     pelvis_vel_0 = 1.5969578108378377
     pelvis_vel_0 = 1.5223251638958755
     pelvis_vel_0 = 1.4391747108420285
     pelvis_vel_0 = 1.356229169471096
     pelvis_vel_0 = 1.2825135982282048
     pelvis_vel_0 = 1.224548516264549
     pelvis_vel_0 = 1.1894554492900906
     pelvis_vel_0 = 1.1642403650695012
     pelvis_vel_0 = 1.163538658607405
     pelvis_vel_0 = 1.1851502985553861
     pelvis_vel_0 = 1.2223575020637276
     pelvis_vel_0 = 1.2734911564459244
     pelvis_vel_0 = 1.321781894640012
     pelvis_vel_0 = 1.3525218024187569
     pelvis_vel_0 = 1.3662382456478055
     pelvis_vel_0 = 1.366698989574385

  3. How can I tell whether the optimization is heading in the right direction in CMA-ES? I ran the optimization for an hour and still did not get a solution. Also, the cost did not seem to become more stable over time.

smsong commented 3 years ago
  1. In f = (v - v_tgt)^2 + integration(act^2), v_tgt is the target velocity you want, so you assign a constant number (e.g., 1.5 m/s). v is whatever you define as your walking velocity; it can be, for instance, the average of pelvis_vel_0 over the last three steps of the simulation.

  2. The human model is thrown into the simulation with the initial velocity of 1.699999999999999956e+00 and then reaches steady walking over time. Even in steady walking, pelvis_vel_0 fluctuates within the gait cycle as in real humans.

  3. You may need to play around with CMA-ES to get a sense of what population size and initial sigma value give you the desired result. With a large sigma value (>0.1), the model falls a lot in the early generations, so it may need about 400 generations with a population size of 16 to converge, which can take more than a day on a modern desktop machine. I would recommend running a CMA-ES trial with sigma=0.01 and population size (lambda)=16 for 100 generations with v_tgt=1.8, and seeing if the model gets to walk faster.
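In pycma, these settings would correspond to something like `cma.CMAEvolutionStrategy(x0, 0.01, {'popsize': 16})` (an assumption about the setup, not tested here). The loop structure can be sketched with a deliberately simplified (mu, lambda) evolution strategy, plain Gaussian sampling without covariance adaptation, on a toy quadratic that stands in for the simulation cost:

```python
import numpy as np

def toy_cost(x):
    # stand-in for a full simulation rollout returning the walking cost
    return float(np.sum(x ** 2))

def simple_es(x0, sigma=0.01, popsize=16, generations=100, seed=0):
    """Simplified (mu, lambda) evolution strategy: sample popsize candidates
    around the mean, keep the best half, recombine their average.
    Full CMA-ES additionally adapts sigma and the covariance matrix."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(generations):
        pop = mean + sigma * rng.standard_normal((popsize, mean.size))
        costs = [toy_cost(x) for x in pop]
        elite = pop[np.argsort(costs)[: popsize // 2]]  # best half
        mean = elite.mean(axis=0)
    return mean

best = simple_es(np.full(5, 0.2), sigma=0.01, popsize=16, generations=100)
```

With the real walking cost, each `toy_cost` call is a full episode simulation, which is why hundreds of generations take hours to days.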

jegyeong-r commented 3 years ago

Hi @smsong, I was actually wondering the same thing for the 3D model. I just wanted the model to walk at a normal constant speed (1.5 m/s), but the provided 'params_3D_init.txt' did not work well when I used 'sim_L2M2019_controller1.py' in mode '3D' with difficulty '0'. Do you have any distributed parameter sets for this case, or should I also use CMA-ES to optimize the parameters? Thank you.

q138ben commented 3 years ago
>   1. In f = (v - v_tgt)^2 + integration(act^2), v_tgt is the target velocity you want, so you assign a constant number (e.g., 1.5 m/s). v is whatever you define as your walking velocity; it can be, for instance, the average of pelvis_vel_0 over the last three steps of the simulation.
>   2. The human model is thrown into the simulation with the initial velocity of 1.699999999999999956e+00 and then reaches steady walking over time. Even in steady walking, pelvis_vel_0 fluctuates within the gait cycle as in real humans.
>   3. You may need to play around with CMA-ES to get a sense of what population size and initial sigma value give you the desired result. With a large sigma value (>0.1), the model falls a lot in the early generations, so it may need about 400 generations with a population size of 16 to converge, which can take more than a day on a modern desktop machine. I would recommend running a CMA-ES trial with sigma=0.01 and population size (lambda)=16 for 100 generations with v_tgt=1.8, and seeing if the model gets to walk faster.

Hi @smsong. I implemented the cost function with the following code and ran a CMA-ES trial with sigma=0.01 and population size (lambda)=16 for 400 generations with v_tgt=1.8. I took the average cost over the simulation time t of the episode. But I had no luck (the optimization did not converge over an 18-hour run). Please correct me if I did something wrong.

    import numpy as np

    sim_dt = 0.01
    total_cost = 0.0
    t = 0.0
    locoCtrl.set_control_params(params)  # set the candidate parameters once, before the rollout
    while True:  # fixed: 'true' is not defined in Python
        t += sim_dt
        action = locoCtrl.update(obs_dict)
        obs_dict, reward, done, info = env.step(action, project=True, obs_as_dict=True)

        # instantaneous cost: speed error plus squared-activation effort
        cost = (obs_dict['pelvis']['vel'][0] - v_tgt)**2 + np.sum(np.square(action)) * sim_dt

        total_cost += cost
        if done:
            break
    return total_cost / t  # fixed: moved outside the loop so the whole episode is averaged
smsong commented 3 years ago

@jegyeong-r Yes, unfortunately, I do not have any parameter set ready for the 3D model for the Learn to move environment.

If you use Matlab, checking out the original 3D model may be useful to you: http://seungmoon.com/nmsModel/nmsModel.html. (FYI, this model uses the First Generation SimMechanics, so it does not work in recent releases of Matlab. I think it works in R2018a, but I'm not sure about more recent versions.)

smsong commented 3 years ago

@q138ben What do the final solutions look like? Does the human model fall down? To prevent that, you should formulate the cost function so that it gives high penalties to undesired behaviors (e.g., falling), as shown in equation 1 of this paper: https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/JP270228.

I think the easiest way to set up such a cost is by running the Learn to Move environment with difficulty=0 and model='2D', then setting the cost to something like cost = -total_reward. You would also need to set ver['ver00']['n_new_target'] = 1.8 in https://github.com/stanfordnmbl/osim-rl/blob/610b95cf0c4484f1acecd31187736b0113dcfb73/envs/target/v_tgt_field.py#L24 I would recommend first checking out get_reward_1() to see if it makes sense to you before running CMA-ES: https://github.com/stanfordnmbl/osim-rl/blob/610b95cf0c4484f1acecd31187736b0113dcfb73/osim/env/osim.py#L766-L823
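The cost = -total_reward setup can be sketched as below. `FakeEnv` and `FakeController` are minimal stand-ins (assumptions for illustration) for the L2M2019 environment and the reflex controller, just to show the shape of the objective; the real `env.step` also takes `project=True, obs_as_dict=True`:

```python
def episode_cost(env, controller, params):
    """CMA-ES objective: negate the episode's total reward, so that minimizing
    the cost maximizes the reward (which already penalizes falling)."""
    obs = env.reset()
    controller.set_control_params(params)
    total_reward = 0.0
    while True:
        action = controller.update(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return -total_reward

# --- minimal stand-ins to demonstrate the call pattern ---
class FakeController:
    def set_control_params(self, params):
        self.params = params
    def update(self, obs):
        return [0.0]

class FakeEnv:
    def reset(self):
        self.t = 0
        return {}
    def step(self, action):
        self.t += 1
        return {}, 1.0, self.t >= 3, {}  # 3 steps, reward 1.0 each

cost = episode_cost(FakeEnv(), FakeController(), params=[])
# cost == -3.0
```

An episode that walks longer without falling accumulates more reward and therefore gets a lower cost, which is exactly the fall penalty the cost was missing.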

jegyeong-r commented 3 years ago

@smsong Thanks for your kind explanation! I'll check the Matlab model!!

(Update, October 19th) I have been working on the Matlab model, but I found that some parameters are missing and some parameters have been added. I wonder if there is any reason for this parameter change.

Also, at first I thought that simply porting the parameters would make the osim-rl model work in 3D walking. However, it turned out I have to do more work to find the missing parameters. In this case, do you recommend the Matlab model over the model in osim-rl?

q138ben commented 3 years ago

Hi @smsong. Thanks for the suggestion. The optimization process now goes more smoothly, but I still have several questions.

  1. I saved the control parameters with the highest reward as I ran the optimization; say I got a best total_reward = 200 before the model fell down. Then I used the saved control parameters as the initial parameters to start another optimization. Yet the optimization fluctuated a lot, with many simulations ending with total_reward around 50 to 100 and the model falling within about 3 s. So does the CMA-ES algorithm update the parameters stochastically, rather than building on the previous best parameters?

  2. I tested CMA-ES on the Rosenbrock function, which it solved in a few seconds. How much harder should I expect the problem of finding control parameters for a target velocity to be in my case?

  3. Running the optimization for a day now gets me a total reward of 229 and a simulation time of 11 s before the model falls. But the improvement is tiny now. I assume I might also need to adjust the init_pose. If so, do you have any suggestions for changing the init_pose? If not, can you point out what I should investigate further?

  A: I can now see progress after solving my 4th question below, but I am still wondering whether I should look into changing the init_pose.

  4. UnboundLocalError: local variable 'reward_footstep_0' referenced before assignment. This error always happened after hundreds of simulations, so I had to manually start a new optimization process each time.

  A: The problem was solved when I dug into the code a bit more.

  5. What exactly does ver['ver00']['n_new_target'] mean? My assumption is that the control parameters would differ under different walking velocities. But I just realized that I can successfully run the model with the default params_2D even when I set ver['ver00']['n_new_target'] = 100. This result makes no sense to me.
smsong commented 3 years ago
  1. This depends on how you set up your CMA-ES. If you set it up correctly, it should use the parameters you provide at init. Regardless, I would recommend initiating with the "next mean" of the previous CMA-ES trial instead of the parameter set that had the best reward (let's call it "bestever"), because "bestever" may be a local minimum.

  2. I do not understand your question. This article might help you to understand how cma-es works (if that is your question): https://en.wikipedia.org/wiki/CMA-ES

  3. If you are using the same init_pose as the one used for successful walking, and the cost is set up as I suggested, then it should not be an init_pose problem: the model clearly can walk without falling from that init_pose, and walking without falling should give lower costs than falling after 11 s. Make sure you set up the cost as I suggested. It should be something like the cost presented in this paper https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/JP270228 where falling clearly has a higher cost than walking without falling (no matter what the average walking speed is).

4~5. I recommend you first make sure you understand the cost and the simulation, so that you can track down an issue when it occurs. The reflex-based controller does not take the target velocity as an input, so it will behave exactly the same for different target velocities. It will just receive a lower reward for ver['ver00']['n_new_target'] = 100, because the walking velocity is then far from the target velocity.

q138ben commented 3 years ago

Hi @smsong. I have read through the paper https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/JP270228 and I understand the cost function. The paper mentions that "With different target speeds in eqn (1)c, the control network further generates walking at speeds ranging from 0.8 m s−1 to 1.8 m s−1".

If changing the target velocity does not make the model walk differently, how can I make the model walk faster?

smsong commented 3 years ago

@q138ben The reflex-based controller does not include the capability of switching its behavior, so you need a "higher-layer controller" that switches the control parameters based on the behaviors you want. For example, the speed transition from 0.8 to 1.8 m/s in the paper is achieved with three sets of control parameters: one for walking at 0.8 m/s, another for transitioning from 0.8 to 1.8 m/s, and a last one for walking at 1.8 m/s. Another way to achieve this sort of behavior control is by training a DNN that regulates the control parameters of the reflex-based controller, which I have not tried but am really interested in pursuing. I hope this clarifies the reflex-based controller.
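The "higher-layer controller" described here can be as simple as a time-based schedule over parameter sets. A hypothetical sketch (the parameter-set names and switch times are placeholders, not values from the paper):

```python
def make_param_scheduler(param_sets, switch_times):
    """param_sets  : e.g. [params_0p8, params_transition, params_1p8]
       switch_times: simulation times (s) at which to advance to the next set;
                     must have exactly one fewer entry than param_sets."""
    def params_at(t):
        n_passed = sum(t >= s for s in switch_times)  # switches already passed
        return param_sets[n_passed]
    return params_at

# hypothetical schedule: walk at 0.8 m/s, transition at t=5 s, cruise at 1.8 m/s from t=8 s
schedule = make_param_scheduler(['p_0.8', 'p_trans', 'p_1.8'], switch_times=[5.0, 8.0])
# schedule(2.0) -> 'p_0.8'; schedule(6.0) -> 'p_trans'; schedule(9.0) -> 'p_1.8'
```

In a rollout, the returned parameter vector would be passed to `set_control_params` at each switch, while the reflex controller itself remains unchanged.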

q138ben commented 3 years ago

@smsong Thanks for the reply. I understand that I will need a supraspinal layer to modulate the speed transition. But there should be something in the reflex model or the control parameters that can alter the walking speed, right? For example, muscle activation? Isometric force? Or something else?

For the 2D model, there are 37 control parameters, which are not explained explicitly. Can you elaborate a bit?

    # control parameters
    cp_keys = [
        'theta_tgt', 'c0', 'cv', 'alpha_delta', 'knee_sw_tgt', 'knee_tgt',
        'knee_off_st', 'ankle_tgt',
        'HFL_3_PG', 'HFL_3_DG', 'HFL_6_PG', 'HFL_6_DG', 'HFL_10_PG',
        'GLU_3_PG', 'GLU_3_DG', 'GLU_6_PG', 'GLU_6_DG', 'GLU_10_PG',
        'HAM_3_GLU', 'HAM_9_PG',
        'RF_1_FG', 'RF_8_DG_knee',
        'VAS_1_FG', 'VAS_2_PG', 'VAS_10_PG',
        'BFSH_2_PG', 'BFSH_7_DG_alpha', 'BFSH_7_PG', 'BFSH_8_DG', 'BFSH_8_PG',
        'BFSH_9_G_HAM', 'BFSH_9_HAM0', 'BFSH_10_PG',
        'GAS_2_FG', 'SOL_1_FG', 'TA_5_PG', 'TA_5_G_SOL',
        'theta_tgt_f', 'c0_f', 'cv_f',
        'HAB_3_PG', 'HAB_3_DG', 'HAB_6_PG',
        'HAD_3_PG', 'HAD_3_DG', 'HAD_6_PG',
    ]

smsong commented 3 years ago

@q138ben It is not simple to explain all the parameters, but most of them correspond to the parameters explained in the original paper. As a quick note: PG: P gain; DG: D gain; FG: (positive) force feedback gain; and HFL, GLU, HAM, ... indicate the muscles.

Regarding which parameters you should change to change speed... I would change all the parameters. It would be possible to change fewer parameters to change speed, but the resulting gait may not be human-like. Identifying the minimum number of parameters to change speed while maintaining human-like gait would be an interesting study, which I have done before with a different control model: https://ieeexplore.ieee.org/abstract/document/6225307.

q138ben commented 3 years ago

Hi @smsong. Thanks again for the recommended paper. It is really interesting to see how you managed to perform speed transitions. But I realize I might not have made myself clear in the previous question. I am not looking into speed transitions, e.g., from 0.8 m/s to 1.0 m/s during walking. Instead, I just want to find another set of control parameters through CMA-ES optimization that makes the model walk at another target speed in steady walking.

  1. Your paper states that 'The optimization repeatedly finds steady walking for all six target speeds from 0.8 m/s to 1.8 m/s'. So can I say that the optimization finds different sets of control parameters for different target speeds?

  2. Using the cost function you suggested, CMA-ES should minimize the cost to find a set of control parameters that makes the model walk close to the target velocity while minimizing muscle activation, etc. But why did I always get the same walking speed and the same muscle activations, no matter what target velocity I set, when the optimization finished?

smsong commented 3 years ago

@q138ben Please find my answers below:

  1. Yes, I would recommend running one optimization trial per target speed to find a set of control parameters for each.
  2. I don't think I understand your situation. I assume you see the cost changing during optimization, which means that the parameters and the corresponding gait (i.e., activation, speed, etc.) are changing. Once the optimization is done, you should run a simulation with the optimized parameters to see the optimized result. And the target velocity does not change the simulation, as it is not an input to the controller; it is merely used to evaluate the cost of the resulting gait.
q138ben commented 3 years ago

@smsong Let me try to elaborate on my situation. Running the optimized parameters, I get, say, an average speed of 1.4 m/s over a 100 s simulation, and it is always 1.4 m/s whenever the model achieves steady walking after optimization. Why do I always get 1.4 m/s? If I would like an average walking velocity of 1.8 m/s, what can I do?

smsong commented 3 years ago

@q138ben You need a new set of parameters that is optimized for 1.8 m/s. I thought that was what you were asking in Q1.

q138ben commented 3 years ago

@smsong Yes, exactly. But I have already got a new set of parameters by running the optimization. My initial control parameter values were all ones or random values, and I got updated parameters after the optimization. So why is my optimized result almost the same as the result of running the controller with the control parameters in ./osim/control/params_2D.txt? Also, what do you mean by "a new set of parameters that is optimized for 1.8 m/s"? From my point of view, the control parameters are a result of the CMA-ES optimization, and I have no influence on them.

smsong commented 3 years ago

@q138ben I see. It's not the same but almost the same. Do you think your CMA-ES trial converged? If not, you can run the optimization for more generations (e.g., 800 instead of 400). Also, I would include INIT_POSE as part of the parameters you optimize. To do that, you would want to carefully constrain the INIT_POSE values so that the human model does not start in a weird pose (e.g., a foot penetrating the ground, etc.). Let me know how it goes!
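One way to "carefully constrain the INIT_POSE values" is to append the pose to the CMA-ES search vector and clip it to plausible bounds before each rollout. A sketch where the bound values and the assumed pose layout (forward speed, pelvis height, pelvis tilt) are illustrative assumptions, not the repository's actual INIT_POSE format:

```python
import numpy as np

N_CTRL = 37  # number of 2D control parameters (as discussed in this thread)

# assumed bounds for a 3-element pose:
# [forward speed (m/s), pelvis height (m), pelvis tilt (rad)]
POSE_LO = np.array([0.5, 0.85, -0.4])
POSE_HI = np.array([2.5, 1.00,  0.4])

def unpack(x):
    """Split one CMA-ES candidate into control params and a clipped INIT_POSE."""
    ctrl = np.asarray(x[:N_CTRL])
    pose = np.clip(np.asarray(x[N_CTRL:]), POSE_LO, POSE_HI)
    return ctrl, pose

# a candidate whose initial speed (3.0 m/s) is out of bounds gets clipped:
x = np.concatenate([np.ones(N_CTRL), [3.0, 0.9, 0.0]])
ctrl, pose = unpack(x)
# pose[0] == 2.5 (clipped to the upper bound)
```

Alternatively, pycma accepts a `'bounds'` option so that the sampler itself respects the limits; clipping inside the objective is just the simplest variant to sketch.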

q138ben commented 3 years ago

@smsong I included INIT_POSE among the optimized parameters, with the same values as the default controller except for changing the forward speed to 1.2 m/s. Running the CMA-ES optimization again, I ultimately got the model into steady walking. But the joint kinematics and muscle activations are still almost the same as with the default controller.

The solution shows that the model developed a strategy of first stabilizing the body under the 1.2 m/s forward speed, then starting to walk from an almost standstill position. Note that I tried several times, and every time it adopted this strategy.

I wonder if such gait kinematics are somehow encoded in the reflex-based model, so that it eventually walks the same way once it reaches steady walking, even under different initial positions.

smsong commented 3 years ago

@q138ben The kinematics are not directly encoded into the controller, but it is not surprising that the optimized gait does not look much different given that human-like gait is dynamically stable/attractive(?) (e.g., exploits the passive dynamics, etc.).

Just to make sure my message got through: I suggested optimizing both the control parameters and `INIT_POSE` simultaneously. That way, when optimizing for faster walking, for example, CMA-ES would likely set the initial forward speed to be faster and help the model walk steadily at a faster speed.

q138ben commented 3 years ago

@smsong I agree that it is amazing to see the model always find a similar approach to produce a human-like gait. But if you look closely into the kinematics, you will find that, to be critical, the ankle plantarflexion is quite small and the hip and knee flexion are larger than in experimental data. That is why I tried to change the control parameters, to make the gait kinematics more human-like.

Just to make myself clear: I have already optimized both the control parameters and `INIT_POSE` simultaneously. And I have tried initial forward speeds of 1.2 and 1.8 m/s, which did not vary much after the optimization. But I am quite frustrated to see that the gait kinematics remained the same at both speeds in optimized steady walking.

smsong commented 3 years ago

@q138ben I have not played much with CMA-ES and the reflex-based controller in the OpenSim-RL environment. But I think it will be able to walk at different speeds, based on my experience of doing so in Matlab, and as @carmichaelong did in a similar environment with a similar controller: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006993.

FYI, in Matlab I was able to make the model walk in various weird gaits by tweaking the cost function to reward gaits with bent knees, with minimal use of ankle muscles, with maximal asymmetry between the legs, etc. So the controller is capable of producing a wide range of gaits. If you want the gait to better match your experimental data, you can try optimizing for that directly (e.g., penalize deviation from your target kinematics).
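The suggested tracking penalty can be added as one more term in the walking cost. A minimal sketch, where the function name, the reference trajectory, and the weight are assumptions for illustration:

```python
import numpy as np

def kinematics_tracking_cost(sim_angles, ref_angles, weight=1.0):
    """Mean squared deviation of simulated joint angles from reference data.
    Added to the walking cost, this steers the optimization toward gaits
    that match the experimental kinematics."""
    sim = np.asarray(sim_angles, dtype=float)
    ref = np.asarray(ref_angles, dtype=float)
    return weight * float(np.mean(np.square(sim - ref)))

# identical trajectories incur no penalty:
# kinematics_tracking_cost([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]) -> 0.0
```

The weight trades off speed tracking and effort against kinematic similarity; too large a weight can make the optimizer sacrifice stable walking for pose matching, so it typically needs tuning.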

q138ben commented 3 years ago

@smsong I have now managed to get the model walking at slower and faster speeds by putting more weight on the speed term. Thanks. But the computational cost was so tremendous that the optimization took days to solve. Then I realized that the optimization used here actually belongs to the family of shooting methods. There is also a nonlinear optimization technique called direct collocation, which should significantly reduce the computation time. What do you think about this method? Is it possible to put the reflex-based model into a direct collocation framework?

smsong commented 3 years ago

@q138ben Glad to hear that you got the model to walk at different speeds! Yes, shooting + CMA is not the most time-efficient optimization approach, and direct collocation could tremendously reduce the computation time. Direct collocation usually optimizes over a single footstep, so one would need to be creative to encode "robustness" into the solution, which single-shooting + CMA, by optimizing over multiple steps, has naturally. Also, to use direct collocation, you would probably need some computational tricks to make the reflex-based controller differentiable, etc. We have been exploring direct collocation with the reflex-based controller, though it is not our current focus. I would be happy to discuss more over email if you are interested (and you can close this thread if your original issue is solved).

q138ben commented 3 years ago

Thanks. I will get in touch via email if I have further questions.