Closed: mpiseno closed this issue 3 years ago
The goal velocity of `HalfCheetahVelEnv` isn't provided in the observation, so changing the target velocity every epoch will make the environment highly non-Markovian. This tends to make SAC perform incredibly badly, since it is forced to marginalize out the target velocity. Assuming you get velocities from `HalfCheetahVelEnv.sample_task`, your mean target velocity will be zero, so the Q function probably won't fit at all.
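To make the marginalization problem concrete, here's a tiny numeric sketch. The reward shape `-|current_vel - goal_vel|` is an assumption for illustration; the real environment also includes a control cost.

```python
# Why a hidden goal breaks Q-learning: the same (state, action) pair earns
# different rewards under different hidden target velocities, so a Markovian
# Q-function can only fit their average.
current_vel = 1.0
goals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # tasks sampled symmetrically around zero
rewards = [-abs(current_vel - g) for g in goals]
avg_target = sum(rewards) / len(rewards)
print(avg_target)  # the averaged target carries no information about which goal is active
```

With goals centered on zero, the regression target the Q function sees is an average over tasks, which is why it won't fit any particular task's value.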
If you just want to train SAC on `HalfCheetahVelEnv` with a randomly varying target velocity, I would recommend writing an environment wrapper that samples a new task in `reset` and changes the observation to include the velocity. This isn't per-epoch, but given the batch sizes typically used for off-policy learning, it should be the same in expectation. (Alternatively, you can make `reset` switch tasks less frequently, but that shouldn't have an effect.)
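A minimal sketch of such a wrapper, using a stub environment so it runs standalone. The `sample_tasks`/`set_task` method names and the `'velocity'` task key are assumptions modeled on the usual meta-RL environment convention; check them against your version of `HalfCheetahVelEnv`.

```python
import random


class StubCheetahVelEnv:
    """Minimal stand-in for HalfCheetahVelEnv, for illustration only."""

    def __init__(self):
        self._goal_vel = 0.0

    def sample_tasks(self, n):
        return [{'velocity': random.uniform(0.0, 2.0)} for _ in range(n)]

    def set_task(self, task):
        self._goal_vel = task['velocity']

    def reset(self):
        return [0.0, 0.0, 0.0]  # placeholder observation

    def step(self, action):
        return [0.0, 0.0, 0.0], 0.0, False, {}


class TaskResamplingWrapper:
    """Samples a new target velocity on every reset and appends it to the
    observation, so the task is visible to the policy and the environment
    stays Markovian from the agent's point of view."""

    def __init__(self, env):
        self._env = env
        self._goal_vel = 0.0

    def reset(self):
        task = self._env.sample_tasks(1)[0]
        self._env.set_task(task)
        self._goal_vel = task['velocity']
        obs = self._env.reset()
        return list(obs) + [self._goal_vel]

    def step(self, action):
        obs, reward, done, info = self._env.step(action)
        return list(obs) + [self._goal_vel], reward, done, info
```

If you use this, remember to also grow the declared observation space by one dimension so the policy and Q networks are built with the right input size.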
If you're trying to write a multi-task algorithm based on SAC, you will probably need to modify SAC in a way similar to the above. Note that passing `garage.sampler.SetTaskUpdate` to `obtain_samples` instead of constructing a new environment will probably be a little more efficient.
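To illustrate why that's cheaper, here's a rough sketch of the idea behind a task-only env update. This is not garage's actual `SetTaskUpdate` implementation; the callable protocol and the stub environment are illustrative assumptions.

```python
class SetTaskUpdateSketch:
    """Illustrative sketch of the SetTaskUpdate idea: ship only the task to a
    worker and apply it to the worker's existing environment, instead of
    pickling and shipping a whole new environment each time."""

    def __init__(self, env_type, task):
        self._env_type = env_type
        self._task = task

    def __call__(self, old_env=None):
        # Reuse the worker's existing env when possible; otherwise build one.
        env = old_env if isinstance(old_env, self._env_type) else self._env_type()
        env.set_task(self._task)
        return env


class StubTaskEnv:
    """Minimal stand-in for a task-settable env such as HalfCheetahVelEnv."""

    def __init__(self):
        self.task = None

    def set_task(self, task):
        self.task = task
```

In training code this would look something like `trainer.obtain_samples(epoch, env_update=SetTaskUpdate(HalfCheetahVelEnv, task))`, so only the small task dict crosses the worker boundary rather than a freshly constructed environment.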
Of course, most people just train SAC on `HalfCheetah-v2` from OpenAI Gym, which `HalfCheetahVelEnv` is based on.
I am brand new to garage and my understanding of Samplers and Workers is not fully there, so any additional context of what is happening behind the scenes when answering the following question would be much appreciated.
My Question: I am training an agent with SAC on `HalfCheetahVel` and I'm trying to call `set_task()` to change the goal velocity at the beginning of every epoch/episode. However, it seems like I have to modify the given SAC implementation's `train` function to do so. My current workaround is to define a brand-new environment whenever a new epoch has started and pass that as the `env_update` parameter to `trainer.obtain_samples` (see below). Is there a cleaner way to accomplish the same thing? I was looking into `SetTaskSampler`, but it doesn't seem to be what I want, because I don't want to sample a bunch of tasks; I just want to set a specific new task once per epoch.