Open ritwikmishra opened 8 months ago
Hey there 👋
So P(τ;θ) is the probability of a trajectory, but we can't compute it directly, since that would require knowing the environment dynamics (the state distribution).
If you look at the formulas that follow, what we do is:
Don't hesitate to take a piece of paper and write out each part step by step to understand it better. That's how I did it.
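For reference, here is a sketch of the step in question, assuming the standard likelihood-ratio derivation with J(θ) = Σ_τ P(τ;θ) R(τ). The symbols m (number of sampled trajectories) and τ^(i) (the i-th sampled trajectory) are notation assumed here and may differ from the course's.

```latex
\begin{align*}
\nabla_\theta J(\theta)
  &= \sum_{\tau} \nabla_\theta P(\tau;\theta)\, R(\tau) \\
  &= \sum_{\tau} P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau) \\
  &= \mathbb{E}_{\tau \sim P(\cdot;\theta)}\big[\nabla_\theta \log P(\tau;\theta)\, R(\tau)\big] \\
  &\approx \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \log P(\tau^{(i)};\theta)\, R(\tau^{(i)}),
  \qquad \tau^{(i)} \sim P(\cdot;\theta)
\end{align*}
```

Under these assumptions, the factor P(τ;θ) is not dropped. The sum over all trajectories weighted by P(τ;θ) is an expectation, and the last line replaces that expectation with an average over trajectories that were themselves sampled from P(τ;θ), so the weighting is carried by how often each trajectory shows up in the sample.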
@simoninithomas I am sorry, but it is still unclear to me. My doubt is: how did we jump from this
to this
shouldn't it be as follows:
I am referring to the gradient derivation here.
In the paragraph where the instructor claims "we can approximate the likelihood ratio policy gradient with sample-based estimate", the term P(τ;θ) (the probability of trajectory τ given the parameters θ) disappears from the subsequent summation. Why?
I asked the same question in the Discord study group (here) but got no response.
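One way to check the step numerically is with a toy problem. The sketch below is not from the course: it assumes a hypothetical "environment" with only three possible trajectories whose probabilities are a softmax over θ, with made-up returns. It compares the exact gradient (the sum that is explicitly weighted by P(τ;θ)) with the sample-based estimate (a plain average over sampled trajectories, with no explicit P(τ;θ) factor).

```python
import math
import random


def softmax(theta):
    """Turn logits into trajectory probabilities P(tau; theta)."""
    top = max(theta)
    exps = [math.exp(t - top) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(i, probs):
    """Gradient of log P(tau_i; theta) w.r.t. theta for a softmax: onehot(i) - probs."""
    return [(1.0 if k == i else 0.0) - p for k, p in enumerate(probs)]


def exact_gradient(theta, returns):
    """sum_tau P(tau; theta) * grad log P(tau; theta) * R(tau)."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for i, (p, r) in enumerate(zip(probs, returns)):
        g = grad_log_prob(i, probs)
        for k in range(len(theta)):
            grad[k] += p * g[k] * r
    return grad


def sampled_gradient(theta, returns, m, rng):
    """(1/m) * sum_i grad log P(tau_i; theta) * R(tau_i), with tau_i sampled from P."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    # Sampling from P(tau; theta) is what stands in for the explicit weighting.
    samples = rng.choices(range(len(theta)), weights=probs, k=m)
    for i in samples:
        g = grad_log_prob(i, probs)
        for k in range(len(theta)):
            grad[k] += g[k] * returns[i] / m
    return grad


if __name__ == "__main__":
    theta = [0.1, -0.3, 0.5]      # hypothetical policy parameters
    returns = [1.0, 2.0, 5.0]     # hypothetical return R(tau) per trajectory
    rng = random.Random(0)
    print("exact  :", exact_gradient(theta, returns))
    print("sampled:", sampled_gradient(theta, returns, 200_000, rng))
```

With enough samples the two vectors should agree closely, which is consistent with P(τ;θ) being absorbed into the sampling rather than removed.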