mslehre / MAS

Maximum Acyclic Subgraph (MAS) - Multiple Sequence Alignment (MSA) Game

__sK__ _RL: learning of v-values_ #101

Open mauricerad opened 5 years ago

mauricerad commented 5 years ago

Create a training set for the ML method from sJ:

  1. Sample trajectories from the current policy.
  2. Create a training set of tuples (s,r) and learn new parameters of the ML method (sJ) from it, where r is the cumulative reward (paid only at the end) and s is any state visited during the policy episode (rollout).
  3. Go to 1.
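
A minimal sketch of this loop, not the project's actual code. Everything in it is an assumption made for illustration: the toy game (`step`, `ACTIONS`, `GOAL`) stands in for the real MSA game, and `fit_values` is a tabular average that stands in for the ML method from sJ. It only shows how each visited state s gets paired with the end-of-episode reward r, and how the refitted values feed the next policy.

```python
import random

# Hypothetical toy game: start at 0, move +1 or +2, the episode ends once the
# position reaches GOAL or beyond. Reward is paid only at the end.
ACTIONS = (1, 2)
GOAL = 5

def step(state, action):
    """Return (next_state, reward, done) for the toy game."""
    nxt = state + action
    done = nxt >= GOAL
    reward = 1.0 if nxt == GOAL else 0.0  # 1.0 only for landing exactly on GOAL
    return nxt, reward, done

def make_policy(values, epsilon, rng):
    """Epsilon-greedy policy that looks one step ahead using the v-values."""
    def policy(state):
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        def score(action):
            nxt, reward, done = step(state, action)
            return reward if done else values.get(nxt, 0.0)
        return max(ACTIONS, key=score)
    return policy

def sample_trajectory(policy, start=0):
    """Step 1: roll out one episode; return visited states and final reward."""
    states, state, done = [], start, False
    while not done:
        states.append(state)
        state, reward, done = step(state, policy(state))
    return states, reward

def build_training_set(policy, n_episodes):
    """Step 2a: one tuple (s, r) per visited state, r = end-of-episode reward."""
    data = []
    for _ in range(n_episodes):
        states, final_reward = sample_trajectory(policy)
        data.extend((s, final_reward) for s in states)
    return data

def fit_values(data):
    """Step 2b: stand-in for the ML method, here the mean reward per state."""
    totals, counts = {}, {}
    for s, r in data:
        totals[s] = totals.get(s, 0.0) + r
        counts[s] = counts.get(s, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}

def train(n_iterations=20, n_episodes=200, epsilon=0.2, seed=0):
    """Step 3: repeat sampling and fitting with the updated policy."""
    rng = random.Random(seed)
    values = {}
    for _ in range(n_iterations):
        policy = make_policy(values, epsilon, rng)
        data = build_training_set(policy, n_episodes)
        values = fit_values(data)
    return values
```

Calling `train()` returns a dict mapping each visited state to its estimated v-value. In the real setting the dict would be replaced by the learned model, and `fit_values` by a training step on the (s,r) tuples.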

step: 5