A proper linear softmax policy

wessle / costaware

Repository for cost-aware project code.

MIT License

A proper linear softmax policy #44

Open: wessle opened this issue 3 years ago

wessle commented 3 years ago

Issue

The SoftmaxPolicy currently in use in the LinearACAgent uses atypical feature vectors.

Solution

We need to refactor LinearACAgent so that its SoftmaxPolicy uses a standard feature vector mapping. One standard approach to try is a polynomial mapping: each state-action pair (s, a) is mapped to the vector [s, a, s * a, 1] (in practice this vector will be appropriately normalized, but you get the idea).
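
To make the proposal concrete, here is a minimal sketch of a linear softmax policy over the polynomial features above. It is not the repository's code: the function names (`poly_features`, `action_probs`, `grad_log_prob`), the scalar state and action values, and the choice of Euclidean normalization are all assumptions made for illustration, since the issue only says the vector will be "appropriately normalized".

```python
import numpy as np


def poly_features(s, a):
    """Map a scalar state-action pair to the normalized vector [s, a, s*a, 1]."""
    phi = np.array([s, a, s * a, 1.0], dtype=float)
    # Assumption: scale to unit Euclidean norm. The constant 1 entry
    # guarantees the norm is never zero.
    return phi / np.linalg.norm(phi)


def action_probs(theta, s, actions):
    """Linear softmax: pi(a | s) is proportional to exp(theta . phi(s, a))."""
    prefs = np.array([theta @ poly_features(s, a) for a in actions])
    prefs -= prefs.max()  # shift preferences for numerical stability
    expd = np.exp(prefs)
    return expd / expd.sum()


def grad_log_prob(theta, s, a_index, actions):
    """Score function: phi(s, a) - sum_b pi(b | s) * phi(s, b)."""
    phis = np.array([poly_features(s, a) for a in actions])
    probs = action_probs(theta, s, actions)
    return phis[a_index] - probs @ phis
```

With `theta = np.zeros(4)` every action gets the same preference, so `action_probs` returns a uniform distribution. An actor update would then move `theta` along `grad_log_prob` scaled by the critic's signal.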

wessle commented 3 years ago

Update

Implemented and performed preliminary tests of LinearACAgentPolynomialBasis, a version of the actor-critic agent that uses a linear softmax policy with the polynomial feature vectors described above. Initial tests appear to show superior performance over the previous LinearACAgent. Much more rigorous comparisons are needed.