DRL solvers using hash as state value

rlipkis commented 3 years ago

Hi! I've been using your package, and I've run into an issue with the DRL solvers.

The function get_action passes the state into the neural network policy.μ, but this state is computed in convert_s to be the hash of the ASTState. My impression is that a hash value is not really a meaningful input to a NN, and it seems like it would invalidate much of the learning, effectively reducing the DRL algorithms to a random search.

If the GrayBox interface is extended to allow a Vector{Float64} state to be specified and stored in the ASTState at each update, this can be extracted in convert_s and passed into the NN. I've made those changes in my local copy to get things working, but perhaps there's a solution that's more in line with your vision for the package, in terms of genericity, etc. If you'd like, I can submit a pull request.
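
To illustrate the difference, here is a minimal, self-contained sketch. It does not use the package's real types: `ToyASTState`, `convert_s_hash`, and `convert_s_state` are hypothetical names invented for this example, and the field layout is an assumption. It compares what a network would receive for two nearly identical simulator states under each conversion.

```julia
# Hypothetical stand-in for the package's ASTState; the field names are assumptions.
struct ToyASTState
    hash::UInt64            # opaque identifier, here derived from the state vector
    state::Vector{Float64}  # explicit simulator state (the proposed addition)
end

# Convenience constructor: compute the hash from the explicit state vector.
ToyASTState(x::Vector{Float64}) = ToyASTState(hash(x), x)

# Hash-based conversion: the network sees one number with no geometric meaning.
convert_s_hash(s::ToyASTState) = Float32[s.hash]

# State-based conversion: the network sees the actual simulator features.
convert_s_state(s::ToyASTState) = Float32.(s.state)

a = ToyASTState([1.0, 2.0])
b = ToyASTState([1.0, 2.0 + 1e-6])  # nearly identical simulator state

# Distance between the two network inputs under each conversion.
gap_state = maximum(abs.(convert_s_state(a) .- convert_s_state(b)))
gap_hash = abs(convert_s_hash(a)[1] - convert_s_hash(b)[1])

println("state-based input gap: ", gap_state)
println("hash-based input gap:  ", gap_hash)
```

With the state-based conversion, nearby simulator states map to nearby network inputs, so the network can generalize between them. With the hash-based conversion, the two inputs are essentially unrelated, which is why learning on them should behave much like random search.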

mossr commented 3 years ago

You're absolutely correct. This is an extension to the GrayBox interface that I've been planning for a while. I'll take a look at your approach and let you know whether it's a generic design that fits within this package. Either way, we'll converge on allowing some explicit state in this framework.

Glad to see you're using this package, and I remember you from CS238 👋

mossr commented 3 years ago

I recently pushed changes based on your fork to include explicit GrayBox.state information (instead of the sim.hash).

sisl/POMDPStressTesting.jl, issue #4