stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

Target velocity vs actual velocity #205

Closed (drozzy closed 4 years ago)

drozzy commented 4 years ago

I have a question about "Target velocity". I don't really understand the description on the official page (the diagram is too tiny to tell where the agent is vs. where it should be).

How are we to interpret the vector field if we never know at which point we are on the map? What I mean is that the state does not contain the (x,y) coordinate - so how would the agent know which target vector to pursue?

Another question - is there any way to know what our actual velocity is?

drozzy commented 4 years ago

If I print out the v_tgt_field field, I see the following, which is a 2x11x11 array. Can I assume that the first element is the target velocity and the second is our velocity? Thanks for the clarification.

[[[ 1.79980698  1.77526115  1.71416152  1.62697895  1.52559726
    1.41996121  1.3168239   1.22000883  1.13123345  1.05091296
    0.97874911]
  [ 0.90696288  0.85487934  0.75909659  1.57138124  1.44748986
    1.32491347  1.21074462  1.10784729  1.01666216  0.93646022
    0.86606033]
  [ 1.43594171  1.32354109  1.06099424  0.76383202  0.59407914
    1.19754533  1.07533123  0.97026163  0.88058309  0.80401843
    0.73838201]
  [ 1.79604617  1.69694097  1.42144595  1.0108485   0.62651121
    0.45844384  0.9041464   0.80374598  0.72133168  0.65301595
    0.59576234]
  [ 1.54127773  1.51365147  1.38250767  1.07519722  0.67164563
    0.38294793  0.69257801  0.60685961  0.53904808  0.48431868
    0.43935119]
  [ 0.98651824  0.95339094  0.93124304  0.80211686  0.53784228
    0.28912765  0.19885539  0.3820219   0.33659593  0.30068469
    0.27161604]
  [ 0.44167544  0.38324281  0.3324046   0.30448317  0.2154072
    0.11615766  0.07073725  0.13685916  0.12003465  0.10688733
    0.09633201]
  [-0.37460918 -0.33066518 -0.28203221 -0.2589257  -0.18361114
   -0.09906421 -0.06005872 -0.11617204 -0.10187156 -0.0907019
   -0.08173732]
  [-0.95005135 -0.90550497 -0.88433552 -0.76725093 -0.51741639
   -0.27794972 -0.18830322 -0.36250328 -0.31923062 -0.28506722
   -0.25743961]
  [-1.49779024 -1.47534429 -1.35783012 -1.06518541 -0.66874812
   -0.37745311 -0.67336357 -0.58937298 -0.5231136  -0.46974231
   -0.42595373]
  [-1.79922937 -1.70483511 -1.43687861 -1.02733841 -0.63331783
   -0.44982272 -0.88832328 -0.78871207 -0.70720463 -0.63979865
   -0.58340692]]

 [[-0.02635947 -0.29740181 -0.54922698 -0.77002565 -0.95527639
   -1.10621434 -1.22718165 -1.32347212 -1.40011102 -1.46136305
   -1.51064562]
  [-0.01568033 -0.16906014 -0.28711257 -0.87792994 -1.06994071
   -1.21844339 -1.3319525  -1.41868756 -1.48539491 -1.537219
   -1.57795422]
  [-0.03029269 -0.31938101 -0.48966995 -0.52072859 -0.53582596
   -1.34383227 -1.44348978 -1.51611094 -1.56989599 -1.6104516
   -1.64158216]
  [-0.04858945 -0.52512393 -0.84128701 -0.88373701 -0.72465569
   -0.65972457 -1.55644444 -1.61058759 -1.64914542 -1.67737001
   -1.69854857]
  [-0.05810624 -0.65273718 -1.14024656 -1.30991313 -1.0825811
   -0.76795121 -1.6614258  -1.69461542 -1.71738964 -1.73361917
   -1.74555737]
  [-0.06132536 -0.67791715 -1.2664481  -1.6113324  -1.42944785
   -0.95604133 -0.78657975 -1.75899382 -1.76824862 -1.77470807
   -1.77938886]
  [-0.07819912 -0.77614561 -1.28752295 -1.74210526 -1.63056152
   -1.09395261 -0.79692448 -1.79478956 -1.79599323 -1.79682361
   -1.79742041]
  [-0.07819912 -0.7895552  -1.28798706 -1.74667094 -1.63870528
   -1.09999886 -0.79775566 -1.79624721 -1.79711496 -1.79771332
   -1.79814321]
  [-0.06238438 -0.68012737 -1.27038467 -1.62809123 -1.45260449
   -0.97083872 -0.78678659 -1.76311978 -1.77146601 -1.77728351
   -1.78149512]
  [-0.05835349 -0.65747592 -1.15731257 -1.34107657 -1.11392725
   -0.78222362 -1.6693057  -1.70077614 -1.72231012 -1.73762544
   -1.7488749 ]
  [-0.04983177 -0.54009819 -0.87062109 -0.91948737 -0.74992846
   -0.66269421 -1.56552922 -1.61800287 -1.65525273 -1.68245585
   -1.70283187]]]

drozzy commented 4 years ago

The body only has 6 vel entries, all coming from the pelvis. I'm really confused about what this 2x11x11 tensor represents and how it interacts with the rest of the observation.

Could you please explain to me like I'm 5? Thanks.

smsong commented 4 years ago

Sorry that the explanation on the official page is not clear. The top-left plot shows the target velocity map in the world frame, where the black line traces the agent's position; it started at (0, 0) and is now at (5.4, 0). The bottom-left plot shows the target velocity in the agent's body frame; it covers ±5 m around the agent, so in the world frame it covers ±5 m around (5.4, 0). The local velocity map rotates as the agent faces a different direction (right now it is facing along the x-axis of the world frame).

I hope the diagram on the official page makes more sense now. http://osim-rl.stanford.edu/docs/nips2019/environment/
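To make the world-frame vs. body-frame distinction concrete, here is a minimal sketch of how a local map could be sampled around the agent. It is not the osim-rl implementation: `local_target_field`, `uniform_field`, the 1 m grid spacing, the index ordering, and the yaw convention are all assumptions made for illustration.

```python
import numpy as np

def local_target_field(world_field, agent_xy, yaw, half_width=5.0, n=11):
    """Sample a world-frame velocity field on an n x n grid centred on the agent.

    Hypothetical helper: grid spacing and axis ordering are assumptions.
    Returns an array of shape (2, n, n) with velocities in the body frame.
    """
    offsets = np.linspace(-half_width, half_width, n)
    # Grid points expressed in the agent's body frame.
    bx, by = np.meshgrid(offsets, offsets, indexing="ij")
    c, s = np.cos(yaw), np.sin(yaw)
    # Body-frame grid points mapped into the world frame.
    wx = agent_xy[0] + c * bx - s * by
    wy = agent_xy[1] + s * bx + c * by
    # Look up the world-frame target velocity at each grid point.
    vx_w, vy_w = world_field(wx, wy)
    # Rotate the velocity vectors back into the body frame.
    vx_b = c * vx_w + s * vy_w
    vy_b = -s * vx_w + c * vy_w
    return np.stack([vx_b, vy_b])

def uniform_field(x, y):
    """Toy world-frame field: 1 m/s along the world x-axis everywhere."""
    return np.ones_like(x), np.zeros_like(y)

# Agent at (5.4, 0) facing along the world x-axis.
field = local_target_field(uniform_field, agent_xy=(5.4, 0.0), yaw=0.0)
# Same position, but the agent has turned 90 degrees to face the world y-axis.
rotated = local_target_field(uniform_field, agent_xy=(5.4, 0.0), yaw=np.pi / 2)
```

In this toy setup the world-frame field never changes, yet the body-frame map does: after the turn, the same target velocity points along the agent's negative y-axis instead of its x-axis.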

drozzy commented 4 years ago

So when you say "every grid point has a 2D target velocity (for x and y axes in the agent's body frame)", what do you mean by "agent's body frame"? Do you mean that when the agent is on a given square, it should have velocity[0] in the x direction and velocity[1] in the y direction?

In your code example, are the dimensions of v_body and v_tgt the same?

Also, when you write v_tgt = self.vtgt.get_vtgt(p_body).T - you use p_body. This p_body is not available to the agent, right?

If it is not available - how is the agent supposed to know what the target velocity is?

drozzy commented 4 years ago

Wait, can I assume that the target velocity is always at [v_tgt_field[0][5][5], v_tgt_field[1][5][5]]?

Thanks.

smsong commented 4 years ago

@drozzy Yes, conceptually, the target velocity at the moment is always [v_tgt_field[0][5][5], v_tgt_field[1][5][5]]. It is not exactly that because the target velocity field in the observation dict lags by one time-step, which we plan to fix in Round 2. v_tgt_field is in the agent's body frame, meaning that it changes as the agent moves even though it is static in the world frame.
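As a rough illustration of the indexing above, the centre cell of the field can be pulled out with NumPy. This is only a sketch: `target_velocity_at_agent` is a hypothetical helper, the demo array is made up, and the one-time-step lag mentioned above is not compensated for.

```python
import numpy as np

def target_velocity_at_agent(v_tgt_field):
    """Return the (x, y) target velocity at the centre of the local map.

    Hypothetical helper: assumes the field has shape (2, n, n) with the
    agent at the centre cell, e.g. index [5][5] for an 11 x 11 grid.
    """
    field = np.asarray(v_tgt_field, dtype=float)
    if field.ndim != 3 or field.shape[0] != 2:
        raise ValueError("expected an array of shape (2, n, n)")
    ci = field.shape[1] // 2
    cj = field.shape[2] // 2
    return field[:, ci, cj]

# Made-up field for demonstration; real values come from the observation dict.
demo = np.zeros((2, 11, 11))
demo[0, 5, 5] = 1.2   # target velocity along the body-frame x-axis
demo[1, 5, 5] = -0.3  # target velocity along the body-frame y-axis

v_tgt = target_velocity_at_agent(demo)
```

Because the whole field is expressed in the body frame, the agent does not need its world (x, y) position to read its current target; the centre cell is always "where the agent is".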