Question about discrete gym runner observation space

ukuleleplayer / pureples

Pure Python Library for ES-HyperNEAT. Contains implementations of HyperNEAT and ES-HyperNEAT.

MIT License


Question about discrete gym runner observation space #22

Closed: pablogranolabar closed this issue 2 years ago

pablogranolabar commented 3 years ago

Hi!

Very cool project, thanks for making it available. I have a toy project I am working on with Gym for function approximation. It has a discrete-valued observation space consisting of 12 integers; the action space is also discrete-valued, with three integers used to determine the correct agent action based on the sequence of 12 integers.

So does pureples support discrete observation and action spaces, and would the cartpole experiment make for a good starting point for this?

Thanks in advance!

ukuleleplayer commented 3 years ago

I would say that the uses for this library are pretty generic, and it should yield at least some useful output given your discrete input in the substrate. Have you tried? And what was the outcome in your three integers? I guess you're forcing the output/input to be integers by some kind of rounding?

pablogranolabar commented 3 years ago

I've got a toy Gym environment I've been working on, a simple baccarat card game where both the observation and action spaces are discrete-valued (int wagertype, int wageramount for actions, and the observation is six integer-valued cards). The agent views the previous hand of six cards, then bets on either player or banker with a corresponding amount. I don't want to retrofit an existing continuous-valued environment for this, as the agent can't make fractional bets. I'm still in the process of finishing the game play for this custom Gym environment; I can register it, etc. Is that the best path forward? I've been experimenting with some of the other discrete-valued environments with Levy's ES-HyperNEAT library, but it ends up casting the discrete values to floats. So I've been building this around PUREPLES, hopefully.

Thanks for the help!

ukuleleplayer commented 3 years ago

That sounds like the right way to go, yeah - but I'm still curious about the results. Did you open this issue because you're lacking meaningful results or just as a question?

pablogranolabar commented 2 years ago

Hola @ukuleleplayer !

I am back on this project again, using Levy's neat-gym which integrates PUREPLES.

So for example in neat-gym's cartpole config, the substrate is defined as:

    [Substrate]
    # For (ES-)HyperNEAT
    input  = [(-1. +(2.*i/3.), -1.) for i in range(4)]
    hidden = [[(-0.5, 0.5), (0.5, 0.5)], [(-0.5, -0.5), (0.5, -0.5)]]
    output = [(-1., 1.), (1., 1.)]
    function = sigmoid

With the outputs being continuous and corresponding to the discrete action space from cartpole:

    Actions:
        Type: Discrete(2)
        Num   Action
        0     Push cart to the left
        1     Push cart to the right

So in this example it's pretty simple: he's just using the min/max to define a Boolean. But how would I generalize this to multiple discrete actions, such as the wager type (player / banker, Discrete(2)) as well as the wager amount (Discrete(1000))? Am I constrained to using binary outputs with masking or something of that nature in order to cast the environment's discrete action (and observation) spaces to continuous?
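For reference, here is a minimal sketch of the two-output case described above, assuming the action is picked by taking the index of the largest output (argmax). The helper name `decode_discrete` is hypothetical and is not part of neat-gym or PUREPLES; how neat-gym actually does this is not confirmed here.

```python
def decode_discrete(outputs):
    """Return the index of the largest output, used as the action number.

    Assumption: one network output per discrete action, as in the
    two-node output layer of the cartpole substrate.
    """
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Two sigmoid outputs, one per CartPole action (0 = left, 1 = right).
print(decode_discrete([0.2, 0.7]))  # prints 1
```

This scales to Discrete(n) only by adding one output node per action, which gets unwieldy for something like Discrete(1000).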

ukuleleplayer commented 2 years ago

Hi again!

I mean, yeah, you just gotta use the outputted floats as if they were integers/Booleans, meaning flooring or ceiling the output. The network doesn't care about its output types; it simply adapts to your fitness function, e.g. predicting 1, 0 or any other discrete value you wish for. I'm unaware of how Levy's neat-gym project is utilizing PUREPLES, but what you're trying should definitely work :)
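A rough sketch of the flooring idea for the baccarat case, assuming the network has two sigmoid outputs in [0, 1], one for the wager type and one for the wager amount. The function names and the 0 = player / 1 = banker encoding are hypothetical, chosen only for illustration; they do not come from PUREPLES or neat-gym.

```python
import math


def to_discrete(value, n):
    """Map a float in [0, 1] to an integer in 0..n-1 by scaling and flooring."""
    clamped = min(max(value, 0.0), 1.0)  # guard against out-of-range outputs
    # An output of exactly 1.0 would floor to n, so cap it at n - 1.
    return min(int(math.floor(clamped * n)), n - 1)


def decode_bet(outputs):
    """Hypothetical decoder: outputs[0] -> wager type, outputs[1] -> wager amount."""
    wager_type = to_discrete(outputs[0], 2)       # assumed: 0 = player, 1 = banker
    wager_amount = to_discrete(outputs[1], 1000)  # 0..999, as in Discrete(1000)
    return wager_type, wager_amount


print(decode_bet([0.8, 0.25]))  # prints (1, 250)
```

With this approach the substrate only needs two output nodes regardless of how many wager amounts there are, and the fitness function is computed on the decoded integers.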