Closed by nickumia 1 year ago
```mermaid
flowchart LR
T[(Trans Prob)]
R[(Reward)]
T1[(Trans Prob)]
R1[(Reward)]
q0[("Action Value (0)")]
q1[("Action Value (1)")]
q2[("Action Value (2)")]
q3[("Action Value (3)")]
v0[("State Value (0)")]
v1[("State Value (1)")]
v2[("State Value (2)")]
p0[("Policy (0)")]
p1[("Policy (1)")]
p2[("Policy (2)")]
max[[argmax]]
max2[[argmax]]
subgraph "Iteration 1"
direction LR
subgraph Policy Evaluation
direction LR
T --> q0
R --> q0
v0 --> q0
p0 --> v1
q0 --> v1
end
subgraph Policy Improvement
T --> q1
R --> q1
v1 --> q1
q1 --> max
max --> p1
end
end
subgraph "Iteration 2"
direction LR
subgraph Policy Evaluation
T1 --> q2
R1 --> q2
v1 --> q2
p1 --> v2
q2 --> v2
end
subgraph Policy Improvement
T1 --> q3
R1 --> q3
v2 --> q3
q3 --> max2
max2 --> p2
end
end
```
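The data flow in the flowchart can be sketched in plain Python roughly as follows. This is a minimal illustration, not the code from the PR: the function names, the nested-list layout `T[s][a][s']` / `R[s][a][s']`, the discount `gamma`, and the toy two-state MDP are all assumptions made for the example. The policy is assumed deterministic (one action index per state), and evaluation is swept until the values stop changing, whereas the diagram draws a single backup per iteration.

```python
def action_values(T, R, v, gamma):
    """q[s][a] = sum over s' of T[s][a][s'] * (R[s][a][s'] + gamma * v[s'])."""
    n_states, n_actions = len(T), len(T[0])
    return [
        [
            sum(T[s][a][s2] * (R[s][a][s2] + gamma * v[s2]) for s2 in range(n_states))
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]


def policy_evaluation(T, R, policy, v, gamma, tol=1e-8, max_sweeps=10_000):
    """Trans Prob + Reward + State Value -> Action Value; Policy picks the new State Value."""
    for _ in range(max_sweeps):
        q = action_values(T, R, v, gamma)
        new_v = [q[s][policy[s]] for s in range(len(T))]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


def policy_improvement(T, R, v, gamma):
    """Action Value -> argmax -> new Policy (ties go to the lowest action index)."""
    q = action_values(T, R, v, gamma)
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def policy_iteration(T, R, gamma=0.9, max_iters=100):
    policy = [0] * len(T)   # Policy (0): hypothetical starting policy
    v = [0.0] * len(T)      # State Value (0)
    for _ in range(max_iters):
        v = policy_evaluation(T, R, policy, v, gamma)
        new_policy = policy_improvement(T, R, v, gamma)
        if new_policy == policy:  # policy is stable, stop iterating
            break
        policy = new_policy
    return policy, v


# Toy MDP (made up for illustration): action 0 stays, action 1 switches state;
# landing in state 1 pays a reward of 1.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
policy, v = policy_iteration(T, R, gamma=0.9)
print(policy, [round(x, 3) for x in v])
```

On this toy problem the loop settles on "switch" in state 0 and "stay" in state 1, which matches the intuition that state 1 is the only place a reward is collected.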
References:
You have to explain your program in detail in the report. Explain everything about (1) what you have done, how you implemented it, etc., and (2) if you didn't finish, the things you have tried so far.
The report is much more important than the program code, and program code alone doesn't get credit.
HW 1 was submitted (as "incomplete"), but that's what the professor asked for... sooo... 🤷‍♀️
I think there's good documentation that can come from parts of the report, so I'll update that on the PR.
Sooo... I made the administrative decision to stop testing policy evaluation directly. Policy evaluation is called from policy iteration, and policy iteration is passing, so I'm giving up on testing policy evaluation on its own.
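If direct tests for policy evaluation are ever revisited, one low-effort option is to check it against MDPs small enough to solve by hand. The sketch below is only an illustration of that idea: `evaluate_policy`, its signature, and the nested-list layout `T[s][a][s']` / `R[s][a][s']` are hypothetical and do not refer to the functions in the PR.

```python
def evaluate_policy(T, R, policy, gamma, tol=1e-10, max_sweeps=100_000):
    """Iterative evaluation of a deterministic policy (hypothetical stand-in)."""
    n = len(T)
    v = [0.0] * n
    for _ in range(max_sweeps):
        new_v = [
            sum(
                T[s][policy[s]][s2] * (R[s][policy[s]][s2] + gamma * v[s2])
                for s2 in range(n)
            )
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def test_single_state_self_loop():
    # One state, one action, reward 1 on every step: v = 1 / (1 - gamma).
    gamma = 0.5
    v = evaluate_policy(T=[[[1.0]]], R=[[[1.0]]], policy=[0], gamma=gamma)
    assert abs(v[0] - 1.0 / (1.0 - gamma)) < 1e-6


def test_two_state_chain():
    # State 0 moves to state 1 with reward 0; state 1 loops on itself with reward 1.
    gamma = 0.5
    T = [[[0.0, 1.0]], [[0.0, 1.0]]]
    R = [[[0.0, 0.0]], [[0.0, 1.0]]]
    v = evaluate_policy(T, R, policy=[0, 0], gamma=gamma)
    # By hand: v(1) = 1 / (1 - 0.5) = 2 and v(0) = 0 + 0.5 * v(1) = 1.
    assert abs(v[1] - 2.0) < 1e-6
    assert abs(v[0] - 1.0) < 1e-6
```

Both tests are written as plain functions with bare `assert` statements, so they can be collected by pytest or called directly.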
On the bright side, everything else seems good 😅
Related to
Notes: