Make HW1 more presentable + Fix Probabilistic Transition?

nickumia commented 1 year ago

Related to

https://github.com/nickumia/cap6629/pull/1

Notes:

[x] Parameterize start position
[x] Parameterize algorithm choice
[x] Parameterize transition probability type
The probabilistic transition doesn't seem to be implemented well... Is there something wrong with it, or does it actually work as intended? This needs investigation.
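A minimal sketch of what a parameterized transition function could look like, for reference during the investigation. It assumes a rectangular gridworld with four actions where, in probabilistic mode, the intended move happens with probability `p_intended` (0.8 here, an assumed value) and the agent slips to each perpendicular direction with the remaining probability split evenly; moves off the grid leave the agent in place. The names `transition`, `move`, `mode`, and `p_intended` are hypothetical, not the repository's actual code. One thing worth checking in such an investigation is that every distribution sums to 1, which requires summing (not overwriting) outcomes that land on the same cell.

```python
from collections import defaultdict

# Hypothetical action encoding; the repository's actual encoding may differ.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERPENDICULAR = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def move(state, action, rows, cols):
    """Apply an action; moving off the grid leaves the agent where it is."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < rows and 0 <= nc < cols:
        return (nr, nc)
    return state


def transition(state, action, rows, cols, mode="deterministic", p_intended=0.8):
    """Return {next_state: probability} for taking `action` in `state`."""
    if mode == "deterministic":
        return {move(state, action, rows, cols): 1.0}
    if mode != "probabilistic":
        raise ValueError(f"unknown transition mode: {mode!r}")
    probs = defaultdict(float)
    probs[move(state, action, rows, cols)] += p_intended
    slip = (1.0 - p_intended) / 2
    for side in PERPENDICULAR[action]:
        # Use += so outcomes that land on the same cell are merged, not overwritten
        probs[move(state, side, rows, cols)] += slip
    return dict(probs)


# Sanity check: every next-state distribution must sum to 1.
for r in range(3):
    for c in range(4):
        for a in ACTIONS:
            dist = transition((r, c), a, 3, 4, mode="probabilistic")
            assert abs(sum(dist.values()) - 1.0) < 1e-9
```

For example, `transition((0, 0), "up", 3, 4, mode="probabilistic")` keeps the agent in the corner with roughly 0.9 probability (the blocked "up" move plus the blocked "left" slip) and moves it right with roughly 0.1.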

nickumia commented 1 year ago

```mermaid
flowchart LR
    T[(Trans Prob)]
    R[(Reward)]
    T1[(Trans Prob)]
    R1[(Reward)]
    q0[("Action Value (0)")]
    q1[("Action Value (1)")]
    q2[("Action Value (2)")]
    q3[("Action Value (3)")]
    v0[("State Value (0)")]
    v1[("State Value (1)")]
    v2[("State Value (2)")]
    p0[("Policy (0)")]
    p1[("Policy (1)")]
    p2[("Policy (2)")]
    max[[argmax]]
    max2[[argmax]]

    subgraph it1 [Iteration 1]
    direction LR
        subgraph pe1 [Policy Evaluation]
        direction LR
        T --> q0
        R --> q0
        v0 --> q0
        p0 --> v1
        q0 --> v1
        end
        subgraph pi1 [Policy Improvement]
        direction LR
        T --> q1
        R --> q1
        v1 --> q1
        q1 --> max
        max --> p1
        end
    end

    subgraph it2 [Iteration 2]
    direction LR
        subgraph pe2 [Policy Evaluation]
        direction LR
        T1 --> q2
        R1 --> q2
        v1 --> q2
        p1 --> v2
        q2 --> v2
        end
        subgraph pi2 [Policy Improvement]
        direction LR
        T1 --> q3
        R1 --> q3
        v2 --> q3
        q3 --> max2
        max2 --> p2
        end
    end
```
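The loop in the diagram can be sketched in Python as follows. This is a minimal illustration on a hypothetical two-state, two-action MDP (the `T`, `R`, and `GAMMA` values are made up), not the homework's implementation. Each evaluation sweep mirrors the diagram (action values from the transition probabilities, rewards, and previous state values, then new state values from the policy); repeating that sweep until the values stop changing gives full policy evaluation, and improvement takes the argmax over action values. The policy is assumed deterministic, so v(s) is just q(s, π(s)).

```python
# Hypothetical MDP: T[s][a] is a list of (prob, next_state),
# R[s][a] is the expected immediate reward for taking a in s.
T = {
    0: {0: [(1.0, 0)], 1: [(0.9, 1), (0.1, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}
GAMMA = 0.9


def action_value(s, a, v):
    """q(s, a) = R(s, a) + gamma * sum over s' of T(s' | s, a) * v(s')."""
    return R[s][a] + GAMMA * sum(p * v[s2] for p, s2 in T[s][a])


def policy_evaluation(policy, v, tol=1e-10):
    """Repeat the synchronous q -> v sweep until v stops changing."""
    while True:
        new_v = {s: action_value(s, policy[s], v) for s in T}
        if max(abs(new_v[s] - v[s]) for s in T) < tol:
            return new_v
        v = new_v


def policy_improvement(v):
    """Greedy policy: argmax over actions of q(s, a)."""
    return {s: max(T[s], key=lambda a: action_value(s, a, v)) for s in T}


def policy_iteration():
    policy = {s: 0 for s in T}  # arbitrary initial policy
    v = {s: 0.0 for s in T}
    while True:
        v = policy_evaluation(policy, v)
        new_policy = policy_improvement(v)
        if new_policy == policy:  # stable policy means we are done
            return policy, v
        policy = new_policy


policy, v = policy_iteration()
print(policy, v)
```

Stopping when the policy no longer changes (rather than when the values converge) is the usual termination test for policy iteration, since improvement only changes the policy when some action is strictly better.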


nickumia commented 1 year ago

You have to explain your program in detail in the Report. Explain everything about 1) what you have done, how you implemented, etc 2) If you didn't finish, explain the things you have tried so far

Report is much more important than program code, and program code alone doesn't get credit.

HW 1 was submitted (as "incomplete"), but that's what the professor asked for... sooo... 🤷‍♀️

I think parts of the report would make good documentation, so I'll add them to the PR.

nickumia commented 1 year ago

Sooo... I made the administrative decision to stop testing policy evaluation directly. Policy evaluation is called from policy iteration, and the policy iteration tests are passing... soooo... I'm giving up on testing policy evaluation on its own.

On the bright side, everything else seems good 😅
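If a direct check of policy evaluation is ever wanted again, one option is to compare the iterative result with the exact solution of the Bellman equation for a fixed policy, v = (I - γP)⁻¹ r, on a small example. The sketch below assumes the policy has already been folded into a transition matrix `P` and a reward vector `r` (hypothetical inputs, not the repository's data structures) and solves the linear system with plain Gaussian elimination so it needs no third-party packages.

```python
def evaluate_policy(P, r, gamma, tol=1e-12):
    """Iterative policy evaluation for a fixed policy.

    P[s][s2] is the probability of moving s -> s2 under the policy,
    r[s] is the expected one-step reward under the policy.
    """
    n = len(r)
    v = [0.0] * n
    while True:
        new_v = [r[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(n)) for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def solve_exact(P, r, gamma):
    """Solve (I - gamma * P) v = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    # Augmented matrix [I - gamma * P | r]
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [r[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(col + 1, n):
            factor = A[i][col] / A[col][col]
            for j in range(col, n + 1):
                A[i][j] -= factor * A[col][j]
    # Back substitution on the upper-triangular system
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (A[i][n] - sum(A[i][j] * v[j] for j in range(i + 1, n))) / A[i][i]
    return v


# Hypothetical 3-state chain under some fixed policy; state 2 is absorbing.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
r = [0.0, 0.0, 1.0]
gamma = 0.9

iterative = evaluate_policy(P, r, gamma)
exact = solve_exact(P, r, gamma)
assert all(abs(a - b) < 1e-8 for a, b in zip(iterative, exact))
```

Because I - γP is invertible whenever γ < 1 and P is a stochastic matrix, the exact solve always exists, which makes it a convenient oracle for small test cases.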

nickumia / cap6629

Make HW1 more presentable + Fix Probabilistic Transition? #2