Resolved #7. Refactored the Q-value approximator (ActionValueFunction) out of the Agent and separated out the policies (currently two Python functions, greedy() and epsilon_greedy(), inside the agent module). The speed issues have been solved, and according to some comparisons the speed is the same as before the refactoring.
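For readers unfamiliar with the split, here is a minimal sketch of how the pieces might fit together. Only the names ActionValueFunction, Agent, greedy() and epsilon_greedy() come from this change; the tabular storage, the method names `value`, `update` and `act`, and every parameter shown are assumptions made for illustration and are not the project's actual code.

```python
import random


class ActionValueFunction:
    """Hypothetical tabular stand-in for the Q-value approximator."""

    def __init__(self, default=0.0):
        self._q = {}             # maps (state, action) -> estimated value
        self._default = default  # value returned for unseen pairs

    def value(self, state, action):
        return self._q.get((state, action), self._default)

    def update(self, state, action, target, step_size=0.1):
        # Move the current estimate a fraction of the way toward the target.
        old = self.value(state, action)
        self._q[(state, action)] = old + step_size * (target - old)


def greedy(q, state, actions):
    """Pick the action with the highest estimated value."""
    return max(actions, key=lambda a: q.value(state, a))


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return greedy(q, state, actions)


class Agent:
    """Holds a value function and a policy; it no longer implements either."""

    def __init__(self, q, policy, actions):
        self.q = q
        self.policy = policy
        self.actions = list(actions)

    def act(self, state):
        return self.policy(self.q, state, self.actions)


q = ActionValueFunction()
q.update("s", "right", target=1.0, step_size=0.5)
agent = Agent(q, greedy, ["left", "right"])
chosen = agent.act("s")
```

Because a policy in this sketch is just a function of the value function, the state and the available actions, swapping greedy for epsilon_greedy means passing a different callable to the Agent rather than editing it.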