This is done. However, the resulting code is ca. 25% slower than the original (even after some optimization), so I am hesitant to merge it into master just yet.
It's a bit of a pity not to use this, though, as it quite nicely separates the agent itself from the Q-function (with or without approximation) and from the exploration and update policies, which can be passed in as plain Python functions; that makes the whole thing quite flexible.
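As a rough illustration of that separation (names and signatures here are made up for the sketch, not taken from the ref_approximator branch), the agent only orchestrates, while the Q-function storage and the exploration/update behaviour are injected as callables:

```python
import random
from collections import defaultdict


def epsilon_greedy(q_values, actions, epsilon=0.1):
    """Exploration policy: random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[a])


def q_learning_update(q, state, action, reward, next_q_values, alpha=0.5, gamma=0.99):
    """Update policy: one-step Q-learning target."""
    q[(state, action)] += alpha * (reward + gamma * max(next_q_values) - q[(state, action)])


class Agent:
    """Agent decoupled from the Q-function and from exploration/update policies."""

    def __init__(self, actions, explore=epsilon_greedy, update=q_learning_update):
        self.q = defaultdict(float)   # tabular Q-function; could be swapped for an approximator
        self.actions = actions
        self.explore = explore        # exploration policy as a plain function
        self.update = update          # update rule as a plain function

    def act(self, state):
        q_values = {a: self.q[(state, a)] for a in self.actions}
        return self.explore(q_values, self.actions)

    def learn(self, state, action, reward, next_state):
        next_q_values = [self.q[(next_state, a)] for a in self.actions]
        self.update(self.q, state, action, reward, next_q_values)
```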
The refactored code is in the ref_approximator branch.