nickumia / cap6629

A summary of Reinforcement Learning techniques explored in Dr. Lee's class
GNU General Public License v3.0

Make HW1 more presentable + Fix Probabilistic Transition? #2

Closed nickumia closed 1 year ago

nickumia commented 1 year ago

Related to

Notes:

nickumia commented 1 year ago
flowchart LR
    T[(Trans Prob)]
    R[(Reward)]
    T1[(Trans Prob)]
    R1[(Reward)]
    q0[("Action Value (0)")]
    q1[("Action Value (1)")]
    q2[("Action Value (2)")]
    q3[("Action Value (3)")]
    v0[("State Value (0)")]
    v1[("State Value (1)")]
    v2[("State Value (2)")]
    p0[("Policy (0)")]
    p1[("Policy (1)")]
    p2[("Policy (2)")]
    max[[argmax]]
    max2[[argmax]]
    subgraph "Iteration 1"
    direction LR
    subgraph Policy Evaluation
    direction LR
    T --> q0
    R --> q0
    v0 --> q0
    p0 --> v1
    q0 --> v1
    end
    subgraph Policy Improvement
    T --> q1
    R --> q1
    v1 --> q1
    q1 --> max
    max --> p1
    end
    end

    subgraph "Iteration 2"
    direction LR
    subgraph Policy Evaluation
    T1 --> q2
    R1 --> q2
    v1 --> q2
    p1 --> v2
    q2 --> v2
    end
    subgraph Policy Improvement
    T1 --> q3
    R1 --> q3
    v2 --> q3
    q3 --> max2
    max2 --> p2
    end
    end
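
A minimal Python sketch of the loop in the flowchart, in case it helps when reading the diagram. This is not the HW1 code. The names `action_values`, `evaluate_policy`, `improve_policy` and `policy_iteration`, the nested-list layout of `T` and `R`, the deterministic policy, the fixed number of evaluation sweeps, and `gamma = 0.9` are all assumptions made for illustration.

```python
def action_values(T, R, v, gamma):
    """q[s][a] = R[s][a] + gamma * sum over s2 of T[s][a][s2] * v[s2]."""
    return [
        [R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(T[s][a]))
         for a in range(len(T[s]))]
        for s in range(len(T))
    ]


def evaluate_policy(T, R, policy, v, gamma, sweeps):
    """Policy Evaluation box: (Trans Prob, Reward, State Value) -> Action Value,
    then (Policy, Action Value) -> next State Value, repeated `sweeps` times."""
    for _ in range(sweeps):
        q = action_values(T, R, v, gamma)
        v = [q[s][policy[s]] for s in range(len(T))]
    return v


def improve_policy(T, R, v, gamma):
    """Policy Improvement box: Action Value -> argmax -> new Policy."""
    q = action_values(T, R, v, gamma)
    # Ties go to the lowest action index.
    return [max(range(len(q_s)), key=q_s.__getitem__) for q_s in q]


def policy_iteration(T, R, gamma=0.9, sweeps=50, max_iters=100):
    """Alternate the two boxes until the policy stops changing."""
    policy = [0] * len(T)
    v = [0.0] * len(T)
    for _ in range(max_iters):
        v = evaluate_policy(T, R, policy, v, gamma, sweeps)
        new_policy = improve_policy(T, R, v, gamma)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, v


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP; T[s][a] is a distribution over next states.
    T = [
        [[1.0, 0.0], [0.2, 0.8]],  # state 0: action 0 stays, action 1 usually moves to 1
        [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to 0
    ]
    R = [[0.0, 0.0], [1.0, 0.0]]   # only "stay in state 1" is rewarded
    policy, v = policy_iteration(T, R)
    print(policy, v)
```

Each pass through the `for` loop in `policy_iteration` corresponds to one "Iteration" box in the diagram: the state values from the previous pass feed the next evaluation, and the argmax output becomes the policy for the following pass.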

References:

nickumia commented 1 year ago

You have to explain your program in detail in the report. Explain everything about 1) what you have done, how you implemented it, etc., and 2) if you didn't finish, the things you have tried so far.

The report is much more important than the program code, and program code alone doesn't get credit.

HW 1 was submitted (as "incomplete"), but that's what the professor asked for... sooo... 🤷‍♀️

I think there's good documentation that can come from parts of the report, so I'll update that on the PR.

nickumia commented 1 year ago

Sooo... I took the administrative decision to stop testing policy evaluation directly... policy evaluation is called from policy iteration and policy iteration is passing... soooo.... I give up on policy evaluation alone.

On the bright side, everything else seems good 😅
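
For reference, one way policy evaluation could be checked on its own, without a hand-computed expected answer, is to assert that the returned values satisfy the Bellman equation for the fixed policy. The sketch below is an assumption-laden illustration, not the repo's test: `evaluate_until_converged` and `bellman_residual` are hypothetical names, the policy is assumed deterministic, and `T[s][a]` is assumed to be a list of next-state probabilities.

```python
def backup(T, R, policy, v, gamma, s):
    """One Bellman backup for state s under a fixed deterministic policy."""
    a = policy[s]
    return R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(T[s][a]))


def evaluate_until_converged(T, R, policy, gamma=0.9, tol=1e-10, max_sweeps=10_000):
    """Iterative policy evaluation: sweep until values change by less than tol."""
    v = [0.0] * len(T)
    for _ in range(max_sweeps):
        new_v = [backup(T, R, policy, v, gamma, s) for s in range(len(T))]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def bellman_residual(T, R, policy, v, gamma):
    """Largest gap between v(s) and its own one-step backup; near 0 when v is correct."""
    return max(abs(v[s] - backup(T, R, policy, v, gamma, s)) for s in range(len(T)))


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP used only for this check.
    T = [
        [[1.0, 0.0], [0.2, 0.8]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
    R = [[0.0, 0.0], [1.0, 0.0]]
    policy = [1, 0]
    v = evaluate_until_converged(T, R, policy)
    assert bellman_residual(T, R, policy, v, 0.9) < 1e-8
```

The residual check only depends on the transition and reward tables, so it works the same way for probabilistic transitions as for deterministic ones.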