__sK__ _RL: learning of v-values_

mslehre / MAS

Maximum Acyclic Subgraph (MAS) - Multiple Sequence Alignment (MSA) Game


Open mauricerad opened 5 years ago

mauricerad commented 5 years ago

Create a training set for the ML method from sJ:

1. Sample trajectories from the current policy.
2. Create a training set of tuples (s, r) and learn new parameters of the ML method (sJ) from it. Here r is the cumulative reward of the episode (paid only at the end) and s is any state visited on that policy episode (rollout).
3. Go to 1.
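
The loop above could be sketched roughly as follows. This is only a minimal illustration, not the project's implementation: the toy chain environment (`env_step`), the epsilon-greedy policy (`make_policy`), and the tabular mean used as a stand-in for the ML method (`fit_tabular_values`) are all hypothetical names and assumptions chosen to keep the example self-contained.

```python
import random

def env_step(state, action):
    """Hypothetical toy chain on states 0..4; reward is paid only at the end."""
    nxt = state + action
    done = nxt in (0, 4)
    reward = 1.0 if nxt == 4 else 0.0
    return nxt, reward, done

def make_policy(values, epsilon=0.2):
    """Epsilon-greedy policy on the learned values (an assumption of this sketch)."""
    def policy(state, rng):
        if rng.random() < epsilon:
            return rng.choice((-1, 1))
        # Unseen states get a neutral default value of 0.5.
        return max((-1, 1), key=lambda a: values.get(state + a, 0.5))
    return policy

def sample_episode(policy, rng, start_state=2, max_steps=50):
    """Step 1: roll out one episode; return visited states and the final reward."""
    states, state = [start_state], start_state
    for _ in range(max_steps):
        state, reward, done = env_step(state, policy(state, rng))
        states.append(state)
        if done:
            return states, reward
    return states, 0.0  # truncated episode: no reward was paid

def build_training_set(episodes):
    """Step 2a: pair every state of an episode with that episode's reward r."""
    return [(s, r) for states, r in episodes for s in states]

def fit_tabular_values(training_set):
    """Step 2b: stand-in for the ML method; v(s) is the mean r observed for s."""
    sums, counts = {}, {}
    for s, r in training_set:
        sums[s] = sums.get(s, 0.0) + r
        counts[s] = counts.get(s, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}

def train(iterations=3, episodes_per_iteration=200, seed=0):
    """Step 3: repeat sampling and fitting with the updated policy."""
    rng = random.Random(seed)
    values = {}
    for _ in range(iterations):
        policy = make_policy(values)
        episodes = [sample_episode(policy, rng) for _ in range(episodes_per_iteration)]
        values = fit_tabular_values(build_training_set(episodes))
    return values

if __name__ == "__main__":
    print(train())
```

In a real setting the tabular mean would be replaced by whatever regression model is trained on the (s, r) tuples, and the states would be game states rather than integers.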

step: 5