Closed: mrdrozdov closed this 7 years ago
The supervised model gets 83/93 on class/transition accuracy after 65k steps (so roughly the same as before). I think this should be okay to merge, unless we want to run some RL-specific sanity check.
This is probably too outdated. Closing.
Notable differences:
- `(t_probs - t_preds).abs()`
- `(1 - t_probs).round()`. This is because the probs represent the probability of shifting. This should probably be adjusted to simply round, i.e. the probs should be the probability of reducing.
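To illustrate the rounding point, here is a minimal sketch in plain Python rather than the tensor code from this PR. It assumes SHIFT is labelled 0 and REDUCE is labelled 1, which the thread does not state, and the helper names are hypothetical. It shows that flipping shift probabilities before rounding gives the same predictions as rounding reduce probabilities directly.

```python
# Assumption (not stated in the PR): SHIFT is labelled 0 and REDUCE is labelled 1.
SHIFT, REDUCE = 0, 1


def preds_from_shift_probs(p_shift):
    """Probs are P(shift), so flip them before rounding."""
    return [int(round(1 - p)) for p in p_shift]


def preds_from_reduce_probs(p_reduce):
    """Probs are P(reduce), so a plain round is enough."""
    return [int(round(p)) for p in p_reduce]


# Hypothetical example values, chosen away from 0.5 to avoid tie-breaking.
p_shift = [0.9, 0.2, 0.7]
p_reduce = [1 - p for p in p_shift]

current = preds_from_shift_probs(p_shift)
proposed = preds_from_reduce_probs(p_reduce)

# Both conventions yield the same transitions.
assert current == proposed
```

If the model emitted the probability of reducing in the first place, the `1 - ...` flip would not be needed and the prediction step would reduce to a single round.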