Closed dmadeka closed 4 years ago
Config File:
config = with_common_config({
    # If true, use the Generalized Advantage Estimator (GAE)
    # with a value function, see https://arxiv.org/pdf/1506.02438.pdf.
    # GAE(lambda) parameter
    # Initial coefficient for KL divergence
    # Size of batches collected from each worker
    "sample_batch_size": 200,
    "num_gpus": 16,
    "num_workers": 50,
    # Number of timesteps collected for each SGD round
    "train_batch_size": 4000,
    # Total SGD batch size across all devices for SGD
    # Number of SGD iterations in each outer loop
    # Stepsize of SGD
    "lr": 5e-5,
    # Learning rate schedule
    # Share layers for value function
    # Coefficient of the value function loss
    # Coefficient of the entropy regularizer
    # Clip param for the value function. Note that this is sensitive to the
    # scale of the rewards. If your expected V is large, increase this.
    "batch_mode": "truncate_episodes",
    # Which observation filter to apply to the observation
    "observation_filter": "MeanStdFilter",
    # Uses the sync samples optimizer instead of the multi-gpu one. This does
    # not support minibatches.
    # "simple_optimizer": True,
    # (Deprecated) Use the sampling behavior as of 0.6, which launches extra
    # sampling tasks for performance but can waste a large portion of samples.
    # Use PyTorch as backend - no LSTM support
    # GAE(gamma) parameter
    # Max global norm for each gradient calculated by worker
    # Learning rate
    # Learning rate schedule
    # Value Function Loss coefficient
    # Entropy coefficient
    # Min time per iteration
    # Workers sample async. Note that this increases the effective
    # sample_batch_size by up to 5x due to async buffering of batches.
    "sample_async": True,
})
Function call:
agent = a3c.A3CAgent(env='MyEnv', config=config)
I think it will work if you use "observation_filter": "ConcurrentMeanStdFilter"?
Probably we should choose that automatically when sample_async is True.
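For anyone hitting the same error, the suggestion above could be sketched roughly as follows. This is only an illustration: `with_async_safe_filter` is a hypothetical helper, not part of RLlib, and it works on a plain dict so it runs without Ray installed. It assumes the only conflict is `MeanStdFilter` combined with `"sample_async": True`.

```python
def with_async_safe_filter(config):
    """Hypothetical helper (not an RLlib function): return a copy of
    `config` whose observation filter matches the `sample_async` setting."""
    fixed = dict(config)
    uses_plain_filter = fixed.get("observation_filter") == "MeanStdFilter"
    if fixed.get("sample_async") and uses_plain_filter:
        # Swap in the filter name suggested in this thread.
        fixed["observation_filter"] = "ConcurrentMeanStdFilter"
    return fixed


user_config = {
    "observation_filter": "MeanStdFilter",
    "sample_async": True,
}
fixed_config = with_async_safe_filter(user_config)
print(fixed_config["observation_filter"])  # ConcurrentMeanStdFilter
```

The helper returns a copy, so the original settings stay untouched and the result can be passed on as the agent config.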
Got it, there's a separate ConcurrentMeanStdFilter and MeanStdFilter. Got it! Thanks so much!
Not sure why the split, btw? Wouldn't you call the appropriate one depending on the algorithm/sample_async?
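One plausible reason for the split, offered as an assumption rather than a statement of RLlib's design: with sample_async the filter may be updated from a background sampling thread, so its running statistics need a lock, while the synchronous path can skip that overhead. A toy sketch of the idea, using scalar observations and hypothetical class names (this is not RLlib's code):

```python
import threading


class RunningMeanStd:
    """Toy running mean/variance over scalars (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self):
        return (self.m2 / self.count) ** 0.5 if self.count > 0 else 0.0


class ConcurrentRunningMeanStd(RunningMeanStd):
    """Same statistics, but every update holds a lock."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def update(self, x):
        with self._lock:
            super().update(x)


def feed(stat, values):
    for v in values:
        stat.update(v)


stat = ConcurrentRunningMeanStd()
threads = [
    threading.Thread(target=feed, args=(stat, range(1000))) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stat.count)  # 4000
```

Without the lock, the three-step update is not atomic, so interleaved threads could lose increments or corrupt the mean.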
Looks like this is resolved.
When I try to run A3C with continuous actions and a MeanStdFilter observation filter, I get the following error:
Which is surprising because I'm not using the ConcurrentMeanStdFilter. Does A3C not support the MeanStdFilter?