[UPDATE] UNIT 1: the two main approaches...

huggingface / deep-rl-class

This repo contains the syllabus of the Hugging Face Deep Reinforcement Learning Course.

Apache License 2.0

3.92k stars 603 forks

[UPDATE] UNIT 1: the two main approaches... #553

Open romuvt opened 4 months ago

romuvt commented 4 months ago

When you define stochastic policies, you write:

\pi (a|s) = P [A|s]

The LHS is a specific real number in [0, 1], while the RHS denotes a whole probability distribution over actions, doesn't it? So I think it should be something like \pi (a|s) = P [A_t = a | S_t = s]. Alternatively, the RHS could be described in words as the probability of choosing action a given state s.
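
To make the distinction concrete, here is a minimal Python sketch, assuming a small tabular policy. The names `POLICY`, `pi`, `pi_dist` and `sample_action`, the states and the probabilities are all made up for illustration and do not come from the course material.

```python
import random

# Hypothetical tabular stochastic policy (illustrative values only):
# each state maps to a probability distribution over actions.
POLICY = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}


def pi_dist(state):
    """pi(. | s): the whole distribution over actions in state s."""
    return POLICY[state]


def pi(action, state):
    """pi(a | s) = P[A_t = a | S_t = s]: a single number in [0, 1]."""
    return POLICY[state][action]


def sample_action(state, rng=random):
    """Draw an action A_t from pi(. | S_t = state)."""
    dist = pi_dist(state)
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


print(pi("right", "s0"))    # one probability
print(pi_dist("s0"))        # the full distribution
print(sample_action("s0"))  # one sampled action
```

Here `pi("right", "s0")` plays the role of the LHS (one real number), while `pi_dist("s0")` is the distribution that the current RHS notation seems to describe.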