mslehre/MAS
Maximum Acyclic Subgraph (MAS) - Multiple Sequence Alignment (MSA) Game
__sK__ _RL: learning of v-values_
#101
Open
mauricerad opened 5 years ago
Create a training set for the ML method from sJ:

1. Sample trajectories from the current policy.
2. Create a training set of tuples `(s, r)` and use it to learn new parameters for the ML method (sJ). Here `r` is the cumulative reward (paid only at the end of the episode) and `s` is any state visited during the policy episode (rollout).
3. Go to step 1.
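
The loop above could be sketched roughly as follows. This is only an illustration under loud assumptions: the environment is a hypothetical toy walk on states 0..5 (not the MAS game), the "ML method" is replaced by a per-state mean of `r` as a stand-in for the model from sJ, and all names (`rollout`, `make_training_set`, `fit_values`, `greedy_policy`, `train`) are made up for this sketch.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: states 0..5, both ends are terminal.
GOAL = 5

def rollout(policy, rng, start=2, max_steps=50):
    """Sample one episode; return (visited states, reward paid only at the end)."""
    s, states = start, [start]
    for _ in range(max_steps):
        if s in (0, GOAL):          # terminal state reached
            break
        s += policy(s, rng)         # action is -1 or +1
        states.append(s)
    reward = 1.0 if s == GOAL else 0.0
    return states, reward

def make_training_set(policy, rng, n_episodes):
    """Steps 1 and 2: sample trajectories, emit one (s, r) tuple per visited state."""
    data = []
    for _ in range(n_episodes):
        states, r = rollout(policy, rng)
        data.extend((s, r) for s in states)
    return data

def fit_values(data):
    """Stand-in for the ML method (assumption): per-state mean of r."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s, r in data:
        sums[s] += r
        counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}

def greedy_policy(values, epsilon=0.1):
    """Policy induced by the v-values: step toward the better-valued neighbour."""
    def policy(s, rng):
        left, right = values.get(s - 1, 0.0), values.get(s + 1, 0.0)
        if rng.random() < epsilon or left == right:
            return rng.choice((-1, 1))   # explore, or break ties at random
        return 1 if right > left else -1
    return policy

def train(n_iterations=5, n_episodes=200, seed=0):
    rng = random.Random(seed)
    values = {}                          # no estimates yet: policy acts randomly
    for _ in range(n_iterations):        # step 3: go back to step 1
        data = make_training_set(greedy_policy(values), rng, n_episodes)
        values = fit_values(data)
    return values
```

Calling `train()` returns a dict mapping each visited state to its estimated v-value. In the real setting, `fit_values` would presumably be replaced by training the model from sJ on the same `(s, r)` tuples.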
- [ ] abc
- [ ] abc
- [ ] abc
step: 5