Composer environment wrapper

Given:

observation: o
action: a
latent: z
task: t
(o)--------------------->|
(t)-->[embedding]--(z)-->|-->[policy]--(a)-->[env]-->(o')

Produce an environment that lets us train an algorithm using the latent as the action space:

(z)-->[wrapped env]-->(o')

wrapped env:
      [ (o)-->|                                ]
(z)-->[ (z)-->|-->[policy]--(a)-->[env]-->(o') ]-->(o')
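
A minimal sketch of the wrapped env above, assuming a gym-style `reset()`/`step()` interface and a policy that is a plain callable `(o, z) -> a`. `LatentActionEnv`, `ToyEnv`, and `toy_policy` are hypothetical names used only for illustration, not existing code in this repo:

```python
class LatentActionEnv:
    """Wraps (policy, env) so that the latent z acts as the action."""

    def __init__(self, env, policy):
        self.env = env          # inner env, stepped with real actions a
        self.policy = policy    # assumed callable: (o, z) -> a
        self._obs = None        # last observation, fed back to the policy

    def reset(self):
        self._obs = self.env.reset()
        return self._obs

    def step(self, z):
        # (o), (z) --> [policy] --(a)--> [env] --> (o')
        a = self.policy(self._obs, z)
        obs, reward, done, info = self.env.step(a)
        self._obs = obs
        return obs, reward, done, info


class ToyEnv:
    """Hypothetical 1-D env: the observation is a position, the action moves it."""

    def __init__(self):
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, a):
        self.pos += a
        return self.pos, -abs(self.pos), False, {}


def toy_policy(o, z):
    # Hypothetical policy: treat the latent as a target, move halfway toward it.
    return 0.5 * (z - o)


wrapped = LatentActionEnv(ToyEnv(), toy_policy)
o = wrapped.reset()
o1, r1, done, info = wrapped.step(2.0)   # the "action" passed in is the latent z
o2, r2, done, info = wrapped.step(2.0)
```

The wrapper has to cache the last observation because the policy needs (o) as well as (z), while the caller only supplies (z).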

With a composer producing the latent, the overall diagram is:

exploded:
(o)-->|-------------------->|
(t)-->|-->[composer]--(z)-->|-->[policy]--(a)-->[env]-->(o')

contracted:
(o)-->|-------------------->|
(t)-->|-->[composer]--(z)-->|-->[wrapped env]-->(o')
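
The exploded and contracted diagrams should describe the same computation. A rough check of that, using stand-in functions (the dynamics, policy, and composer here are made-up placeholders, not the real components):

```python
def env_step(o, a):
    # Hypothetical dynamics: the action is added to the observation.
    return o + a


def policy(o, z):
    # Hypothetical policy: move halfway toward the latent target z.
    return 0.5 * (z - o)


def composer(o, t):
    # Hypothetical composer: maps (observation, task) to a latent.
    return 2.0 * t - o


def wrapped_env_step(o, z):
    # Contents of [wrapped env]: policy followed by the inner env.
    return env_step(o, policy(o, z))


def exploded(o, t):
    z = composer(o, t)
    a = policy(o, z)
    return env_step(o, a)


def contracted(o, t):
    z = composer(o, t)
    return wrapped_env_step(o, z)


o_next_exploded = exploded(0.0, 1.0)
o_next_contracted = contracted(0.0, 1.0)
```

Both paths give the same (o') for the same (o, t), so the composer can be trained against the wrapped env as if it were an ordinary environment with action space z.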
