Closed ryanjulian closed 5 years ago
Given:

    observation: o
    action: a
    latent: z
    task: t

    (o)--------------------->|
    (t)-->[embedding]--(z)-->|-->[policy]--(a)-->[env]-->(o')

Produce an environment which allows us to train an algorithm using the latent as the action space

    (z)-->[wrapped env]-->(o')

    wrapped env:
          [ (o)-->|                                 ]
    (z)-->[ (z)-->|-->[policy]--(a)-->[env]-->(o')  ]-->(o')

where the overall diagram is

    exploded:
    (o)-->|-------------------->|
    (t)-->|-->[composer]--(z)-->|-->[policy]--(a)-->[env]-->(o')

    contracted:
    (o)-->|-------------------->|
    (t)-->|-->[composer]--(z)-->|-->[wrapped env]-->(o')

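
A minimal sketch of the contracted diagram, to make the data flow concrete. Everything here is hypothetical and for illustration only: `ToyEnv`, `toy_policy`, `toy_composer`, and `LatentActionEnv` are made-up names, not existing classes in this repo, and the `reset()`/`step()` signature is only assumed to follow the usual gym-style convention. The wrapper remembers the last observation so it can feed `(o, z)` to the inner policy and forward the resulting action to the inner env.

```python
class ToyEnv:
    """Hypothetical 1-D env: the observation is a position, the action is a step."""

    def __init__(self):
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action
        reward = -abs(self.pos - 1.0)  # closer to 1.0 is better
        return self.pos, reward, False, {}


def toy_policy(obs, latent):
    """Hypothetical latent-conditioned policy: (o, z) -> a."""
    return latent * (1.0 - obs)


def toy_composer(obs, task):
    """Hypothetical composer: (o, t) -> z. Here it simply passes the task through."""
    return task


class LatentActionEnv:
    """Wraps [policy] and [env] so the outer algorithm acts in latent space."""

    def __init__(self, env, policy):
        self._env = env
        self._policy = policy
        self._last_obs = None

    def reset(self):
        self._last_obs = self._env.reset()
        return self._last_obs

    def step(self, latent):
        # (o), (z) --> [policy] --(a)--> [env] --> (o')
        action = self._policy(self._last_obs, latent)
        obs, reward, done, info = self._env.step(action)
        self._last_obs = obs
        return obs, reward, done, info


wrapped = LatentActionEnv(ToyEnv(), toy_policy)
obs = wrapped.reset()
z = toy_composer(obs, task=0.5)           # (o), (t) --> [composer] --> (z)
obs, reward, done, info = wrapped.step(z)  # (z) --> [wrapped env] --> (o')
```

From the composer's point of view the wrapped env is an ordinary environment whose action space is the latent space, so any algorithm that can train against `reset()`/`step()` could in principle be used to train it.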