Open wessle opened 3 years ago
Implemented and performed preliminary tests of `LinearACAgentPolynomialBasis`, a version of the actor-critic agent that uses a linear softmax policy with the polynomial feature vectors described in this issue. Initial tests appear to show superior performance over the previous `LinearACAgent`. Much more rigorous comparisons are needed.
Issue
The `SoftmaxPolicy` currently in use in the `LinearACAgent` uses atypical feature vectors.
Solution
We need to refactor so that `LinearACAgent` uses a classic, standard feature vector mapping in `SoftmaxPolicy`. One standard approach to try is a polynomial mapping: each state-action pair `(s, a)` gets mapped to the vector `[s, a, s * a, 1]` (well, this vector will actually be appropriately normalized, but you get the idea).
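
To make the proposed mapping concrete, here is a rough sketch of what it might look like. It is not the repo's actual code: the function names `polynomial_features` and `softmax_policy`, the assumption that `s` and `a` are scalars, and the choice of L2 normalization are all placeholders for illustration, since the normalization scheme is not specified here.

```python
import math


def polynomial_features(s, a):
    """Map a scalar state-action pair (s, a) to [s, a, s * a, 1].

    The vector is scaled to unit L2 norm. That choice is an assumption;
    the issue only says the vector is "appropriately normalized".
    """
    phi = [s, a, s * a, 1.0]
    # The constant 1 entry guarantees norm >= 1, so no division by zero.
    norm = math.sqrt(sum(x * x for x in phi))
    return [x / norm for x in phi]


def softmax_policy(theta, s, actions):
    """Return action probabilities proportional to exp(theta . phi(s, a))."""
    prefs = [
        sum(t * x for t, x in zip(theta, polynomial_features(s, a)))
        for a in actions
    ]
    # Subtract the max preference before exponentiating for numerical stability.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical usage: all-zero weights give a uniform policy.
theta = [0.0, 0.0, 0.0, 0.0]
probs = softmax_policy(theta, s=0.5, actions=[0, 1, 2])
```

With this feature map the policy has only four weights, one per entry of the feature vector, which should make it easier to compare against the existing `LinearACAgent`.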