This is done. However, the resulting code is ca. 25% slower than the original (even after some optimization), so I am hesitant to merge it into master just yet.
It's a bit of a pity not to use this, though, as it quite nicely separates the agent itself from the Q-function (with or without approximation) and from the exploration and update policies, which can be passed in as plain Python functions; that makes the whole thing quite flexible.
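As a rough illustration of that separation (names and signatures here are made up for the sketch, not taken from the ref_approximator branch), the agent only orchestrates, while the Q-function storage and the exploration/update behaviour are injected as callables:

```python
import random
from collections import defaultdict


def epsilon_greedy(q_values, actions, epsilon=0.1):
    """Exploration policy: random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[a])


def q_learning_update(q, state, action, reward, next_q_values, alpha=0.5, gamma=0.99):
    """Update policy: one-step Q-learning target."""
    q[(state, action)] += alpha * (reward + gamma * max(next_q_values) - q[(state, action)])


class Agent:
    """Agent decoupled from the Q-function and from exploration/update policies."""

    def __init__(self, actions, explore=epsilon_greedy, update=q_learning_update):
        self.q = defaultdict(float)   # tabular Q-function; could be swapped for an approximator
        self.actions = actions
        self.explore = explore        # exploration policy as a plain function
        self.update = update          # update rule as a plain function

    def act(self, state):
        q_values = {a: self.q[(state, a)] for a in self.actions}
        return self.explore(q_values, self.actions)

    def learn(self, state, action, reward, next_state):
        next_q_values = [self.q[(next_state, a)] for a in self.actions]
        self.update(self.q, state, action, reward, next_q_values)
```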
The refactored code is in the ref_approximator branch.