openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[Question] Clarification on Pendulum Coordinate System #2999

Closed shivaniraochepuri21 closed 1 year ago

shivaniraochepuri21 commented 1 year ago

Question

The pendulum coordinate system is not clear, and the diagram mentioned in the comments of pendulum.py is not available in the master branch as far as I can see. Could you please clarify if theta is positive clockwise or anti-clockwise? When I test the Pendulum-v1 environment it seems to be positive anti-clockwise, but the image in assets is clockwise. Do correct me. Thanks.

RedTachyon commented 1 year ago

I beg you, use the "title" field the way it's intended, and put the content in the... content. Put at least a modicum of effort into asking the question if you want someone to answer it.

pseudo-rnd-thoughts commented 1 year ago

In the docstring for Pendulum it notes

    The diagram below specifies the coordinate system used for the implementation of the pendulum's
    dynamic equations.

    ![Pendulum Coordinate System](./diagrams/pendulum.png)

    -  `x-y`: cartesian coordinates of the pendulum's end in meters.
    - `theta` : angle in radians.
    - `tau`: torque in `N m`. Defined as positive _counter-clockwise_.

Therefore, I'm assuming that tau will be influenced by theta, so theta is also "positive counter-clockwise" (anti-clockwise), and you are most likely correct.

vincentzhang commented 1 year ago

theta is indeed positive counter-clockwise. It follows right-hand rule:

  1. x axis points right and y axis points up, so the axis of rotation is z-axis and it points away from screen toward the viewer.
  2. positive rotation is counter-clockwise around the axis of rotation.

See this for an illustration: https://www.evl.uic.edu/ralph/508S98/gif/rightx.gif
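The convention in the two points above can be checked numerically. This is a minimal sketch in plain Python; the helper names `cross_z` and `rotate` are made up for illustration and are not part of gym. With x pointing right and y pointing up, the z-component of x × y comes out positive, and rotating the x unit vector by a positive angle moves it toward +y, that is, counter-clockwise.

```python
import math

def cross_z(a, b):
    """z-component of the cross product of two 2D vectors (hypothetical helper)."""
    return a[0] * b[1] - a[1] * b[0]

def rotate(v, angle):
    """Rotate a 2D vector by `angle` radians with the standard rotation matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

x_axis = (1.0, 0.0)  # points right
y_axis = (0.0, 1.0)  # points up

# x cross y points along +z (out of the screen, toward the viewer).
z_component = cross_z(x_axis, y_axis)

# A positive quarter turn takes the x axis onto the y axis: counter-clockwise.
quarter_turn = rotate(x_axis, math.pi / 2)

print(z_component)
print(quarter_turn)
```

A positive `z_component` together with `quarter_turn` landing on the y axis is what "positive rotation is counter-clockwise" means in this frame.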

shivaniraochepuri21 commented 1 year ago

Thank you @vincentzhang and @pseudo-rnd-thoughts. Counterclockwise seems correct. Here, the upright position is considered as 0 degrees. Assuming no friction at the hinge, how can the dynamic equations used in the pendulum.py environment be obtained/derived? They are as follows:

$state = \left[ \theta, \dot{\theta} \right]$

$d \theta = \dot{\theta}$

$d \dot{\theta} = \ddot{\theta} = (3g/2l) \sin(\theta) + 3u/(ml^{2})$
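To make these equations concrete, here is a minimal sketch of one discrete update step in plain Python. It assumes a semi-implicit Euler update (velocity first, then angle) and the constants g = 10.0, m = 1.0, l = 1.0, dt = 0.05, max_speed = 8, max_torque = 2.0, which I believe match the Pendulum-v1 defaults; please check them against pendulum.py before relying on them. `pendulum_step` is a hypothetical name, not a gym function.

```python
import math

def pendulum_step(theta, theta_dot, u, g=10.0, m=1.0, l=1.0, dt=0.05,
                  max_speed=8.0, max_torque=2.0):
    """One update of the state [theta, theta_dot]; theta = 0 is upright."""
    # Clip the torque to the allowed range (assumed limit).
    u = max(-max_torque, min(max_torque, u))
    # Angular acceleration from the equation quoted above.
    theta_ddot = 3.0 * g / (2.0 * l) * math.sin(theta) + 3.0 * u / (m * l ** 2)
    # Update the velocity first, then the angle using the new velocity.
    new_theta_dot = theta_dot + theta_ddot * dt
    new_theta_dot = max(-max_speed, min(max_speed, new_theta_dot))
    new_theta = theta + new_theta_dot * dt
    return new_theta, new_theta_dot

# Start slightly counter-clockwise of upright, at rest, with no torque.
theta, theta_dot = 0.1, 0.0
theta, theta_dot = pendulum_step(theta, theta_dot, u=0.0)
print(theta, theta_dot)
```

With no torque, a pendulum that starts slightly off upright moves further away from upright, which is what the positive sign on the $\sin(\theta)$ term encodes when $\theta = 0$ is the top.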

We may have to use lagrangian but I am not able to obtain the equation considered in the gym pendulum environment. Here is what I did to derive the equations. Please correct me anywhere.

Let $m$ be the mass and $l$ the length of the pendulum. Let $x, y$ be the coordinates of the free end of the pendulum; $\theta$ is the angle made by the free end in the counter-clockwise direction about the fixed end. Let $M$ be the moment of inertia about the fixed end of the pendulum, which is

$M = (1/3)ml^2$

$state = \left[ \theta, \dot{\theta} \right]$

Here, $\theta$ is some angle made by the pendulum in the anti-clockwise direction; let it be in the 2nd quadrant (with the quadrants also numbered in the anti-clockwise direction).

Then, $x = l \sin(\theta)$, $y = l \cos(\theta)$

Lagrangian $L = (1/2)mv^2 + (1/2)M \dot{\theta}^2 - PE$

As the pendulum goes toward its equilibrium position (180 degrees) from the initial position (0 degrees), the work done decreases as $y$ decreases. So PE is positive (considering the angle made by the pendulum from the equilibrium position, which is $180 + \theta$, and taking the height $l \cos(\theta)$ as negative in the 3rd and 4th quadrants and positive in the 1st and 2nd quadrants, where the 1st quadrant is the top-right one; because of this height, PE is positive).

$PE = mgl \cos(\theta + 180) = -mgl \cos(\theta)$

and $v^2 = \dot{x}^2 + \dot{y}^2$

Using the Euler-Lagrange equation (do correct me if anything is wrong): $\frac{d}{dt} \left[ \frac{\partial L}{\partial \dot{\theta}} \right] - \frac{\partial L}{\partial \theta} = u$ ----- eq (1)

and

$L = (2/3) m l^2 \dot{\theta}^2 + mgl \cos(\theta)$

$\ddot{\theta}$ is obtained as follows:

$\ddot{\theta} = \frac{3u}{4ml^2} - \frac{3g}{4l} \sin(\theta)$

Sorry about the possible inconsistency in notation. I am doubtful about the PE, but either way (that is, $-mgl \cos(\theta)$ or $mgl \cos(\theta)$), I don't get the same equation as the one in gym. Please help me with this.

Thanks IA

RedTachyon commented 1 year ago

I went through the math in Pendulum like... a year ago? Anyways, it checked out, it's possible you'll find my derivation somewhere in old issues on this repo, though I'm not sure if I put it here in its entirety. I don't think I needed to use a Lagrangian, just regular solid body mechanics, but I don't have the capacity right now to go through your derivation. If you triple-check it and are confident that there's an issue with the implementation, please write it up with LaTeX syntax/rendering so that it's easier to parse, and I can go through it again.

One potential problem off the top of my head: L = T - V, but in your post, you add the potential energy. It might be a matter of the sign of the potential energy though.
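On the sign question, here is a sketch of how the Lagrangian route can reproduce the equation in gym. It assumes the pendulum is a uniform rod of mass $m$ and length $l$ pivoting about one end, that $\theta$ is measured from the upright position, and that $u$ is the applied torque; this is one reading of the model, not necessarily the derivation referred to above.

$I = \frac{1}{3} m l^2$ (moment of inertia about the pivot)

$T = \frac{1}{2} I \dot{\theta}^2 = \frac{1}{6} m l^2 \dot{\theta}^2$

$V = m g \frac{l}{2} \cos(\theta)$ (height of the centre of mass above the pivot)

$L = T - V = \frac{1}{6} m l^2 \dot{\theta}^2 - \frac{1}{2} m g l \cos(\theta)$

$\frac{d}{dt} \left[ \frac{\partial L}{\partial \dot{\theta}} \right] - \frac{\partial L}{\partial \theta} = \frac{1}{3} m l^2 \ddot{\theta} - \frac{1}{2} m g l \sin(\theta) = u$

$\ddot{\theta} = \frac{3g}{2l} \sin(\theta) + \frac{3u}{m l^2}$

Two things matter under these assumptions. With $I$ taken about the pivot, $\frac{1}{2} I \dot{\theta}^2$ is already the full kinetic energy, so a separate $\frac{1}{2} m v^2$ term would count it twice. The potential energy also uses the height of the centre of mass, $\frac{l}{2} \cos(\theta)$, rather than the height of the free end.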

shivaniraochepuri21 commented 1 year ago

Thanks @RedTachyon I wasn't able to find relevant information in the old issues.

I added the potential energy in the Lagrangian because, originally, $L = KE - PE$; here $PE = mgl \cos(\theta + 180) = -mgl \cos(\theta)$, so in the end you add the $mgl \cos(\theta)$ term in the Lagrangian. Here, I am measuring the angle theta from the equilibrium position.

I am still not sure how to treat the potential energy and its reference (and whether the equilibrium position has anything to do with it). Could you please provide some insight on this? I will update the equations in LaTeX soon anyway, so you can go through them and correct me if I am wrong.

Thanks IA

shivaniraochepuri21 commented 1 year ago

Updated the derivation in the previous comment. Please help me find the correct dynamics for this system. Thank you.

shivaniraochepuri21 commented 1 year ago

I am implementing a model predictive controller with CasADi for the pendulum system. With the system I mentioned, the state evolves as a decaying oscillation as the system tries to reach the target state. I cannot see such behavior with the dynamics equation implemented in gym. (Maybe that's not necessary, but it confirms the suggested dynamics are correct. How do I verify whether the one provided in gym is correct? It would be great if someone could provide an old doc or method showing how the equation in gym was obtained.)

Keeping all the parameters and changing just the equation, I can find the solution with the suggested dynamics but not with the one implemented in gym. Given how extensively gym is used, I believe the equation or some other settings/assumptions in the Pendulum-v1 env need to be looked at; please correct me if I am wrong. An RL agent may be trained on any non-linear equation of that form and still do well in a test run, but MPC (and some optimal control and online methods) needs the right model. Thanks IA

RedTachyon commented 1 year ago

[image: derivation of the Pendulum dynamics]

I did the math again, and it checks out again. Uploading the derivation in case of future questions. One thing I didn't properly check is how we actually represent the angle, but it's definitely something like theta, pi - theta, etc. But if this were wrong, the problem would be very apparent (the system would have an equilibrium at the top or to either side).
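The sanity check mentioned here can be sketched numerically. The snippet below is a minimal illustration in plain Python, assuming the acceleration formula quoted earlier in the thread with u = 0, g = 10 and l = 1 (the function name `theta_ddot` is made up for illustration). A small displacement from the bottom ($\theta = \pi$) should be pushed back toward $\pi$, while a small displacement from the top ($\theta = 0$) should be pushed further away.

```python
import math

def theta_ddot(theta, u=0.0, g=10.0, m=1.0, l=1.0):
    """Angular acceleration with theta = 0 at the upright position."""
    return 3.0 * g / (2.0 * l) * math.sin(theta) + 3.0 * u / (m * l ** 2)

eps = 0.01  # small displacement in radians

# Near the top (theta = 0): acceleration has the same sign as the displacement,
# so the pendulum falls away from upright.
top_plus = theta_ddot(+eps)
top_minus = theta_ddot(-eps)

# Near the bottom (theta = pi): acceleration opposes the displacement,
# so the pendulum is pulled back toward hanging straight down.
bottom_plus = theta_ddot(math.pi + eps)
bottom_minus = theta_ddot(math.pi - eps)

print("top:", top_plus, top_minus)
print("bottom:", bottom_plus, bottom_minus)
```

If the angle convention or the sign of the gravity term were off, these signs would flip and the pendulum would settle at the top or at the side instead of at the bottom.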

@jkterry1 This can probably be closed. There might be further discussion, but I'm pretty sure there are no critical bugs here.