Closed by whikwon 6 years ago
Is there any way to manage rolling means and stds for each coordinate of the observation?
Sorry, I've found it. MeanStdFilter handles rolling means and stds for each coordinate of the observation.
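For anyone else landing here, a minimal usage sketch; the import path and the constructor signature (shape, demean, destd, clip) are inferred from the filter repr quoted later in this thread, so treat them as assumptions that may vary across Ray versions:

```python
import numpy as np
from ray.rllib.utils.filter import MeanStdFilter  # path may differ by Ray version

# One filter per observation space; running stats are kept per coordinate.
f = MeanStdFilter((3,))
for _ in range(1000):
    f(np.random.randn(3) * 5.0 + 2.0)  # update=True by default: accumulates stats

# At evaluation time, normalize without touching the running stats.
print(f(np.array([2.0, 2.0, 2.0]), update=False))  # roughly zero-mean / unit-std
```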
How can I prevent inf values in MeanStdFilter?
@richardliaw is it possible to get an inf during filter merges without some observation having an inf?
@ericl I think I've found the problem. MeanStdFilter is based on the method from the blog https://www.johndcook.com/blog/standard_deviation/, and I think the logic calculating self._S doesn't match the blog's formula, S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k).
What is the problem?
I think the code calculating self._S should look like the sketch below. Anyway, that might not cause the inf problem... hmm
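For concreteness, here is a minimal sketch of the blog's recurrence; the field names (_n, _M, _S) mirror what I'd expect inside the filter's running-stat helper, but they are assumptions, not the actual RLlib code:

```python
import numpy as np

class RunningStat:
    """Welford-style running mean/variance, per the johndcook.com post."""

    def __init__(self, shape):
        self._n = 0
        self._M = np.zeros(shape)  # running mean M_k
        self._S = np.zeros(shape)  # running sum of squared deviations S_k

    def push(self, x):
        x = np.asarray(x)
        self._n += 1
        if self._n == 1:
            self._M[...] = x
        else:
            old_M = self._M.copy()
            # M_k = M_{k-1} + (x_k - M_{k-1}) / k
            self._M[...] = old_M + (x - old_M) / self._n
            # S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)
            self._S[...] += (x - old_M) * (x - self._M)

    @property
    def var(self):
        return self._S / (self._n - 1) if self._n > 1 else np.zeros_like(self._S)

    @property
    def std(self):
        return np.sqrt(self.var)
```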
You can probably add a print() to determine what update causes it to reach an inf value.
cc @eugenevinitsky
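As a sketch of that idea, pairing with the RunningStat sketch above (the wrapper is hypothetical, not RLlib code):

```python
import numpy as np

def push_with_check(rs, x):
    # Wrap a running-stat update and flag the first sample that drives
    # the accumulated stats to a non-finite value.
    rs.push(x)
    if not np.all(np.isfinite(rs.std)):
        print("non-finite std after pushing", x, "at n =", rs._n, "std =", rs.std)
```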
Are your actions and states also exploding in value?
@eugenevinitsky I'm checking now. The variance of some values appears to be very large and those might cause the problem.
I just wonder if this issue has been fixed or not? I've run into the same issue: the evaluation didn't perform as it was supposed to when MeanStdFilter was used in the training process.
@RodgerLuo could you log the values of the MeanStdFilter and check if they seem to reasonably reflect the observation inputs? A good place to do this is in FilterManager.synchronize, or you can do it in the filter class itself.
@RodgerLuo Which environment have you used for training? Is there any abnormal feature in the env?
@whikwon @ericl Thanks for all the guidance. So here is what I've found: The environment is Pendulum-v0. In agent.compute_action, I log the values of obs before and after the filter. As you will see below, the filtered values are either extremely small or large.
```
[-0.87094525 0.49138007 -2.64594034]
[-8.70945248e+07 4.91380071e+07 -2.64594034e+08]
[-0.81813912 0.57502033 -1.97910756]
[-8.18139124e+07 5.75020325e+07 -1.97910756e+08]
[-0.78058041 0.62505538 -1.25146981]
[-7.80580405e+07 6.25055383e+07 -1.25146981e+08]
[-0.74561678 0.66637499 -1.08267827]
[-7.45616776e+07 6.66374987e+07 -1.08267827e+08]
[-0.71548291 0.69863024 -0.88289703]
[-71548290.6009737 69863023.92595541 -88289703.02494954]
[-0.69208157 0.7218193 -0.65892435]
[-69208156.94613262 72181929.95562999 -65892435.08048297]
[-0.67686169 0.73611021 -0.41755988]
[-67686169.49385323 73611021.31643993 -41755987.61376046]
[-0.67074812 0.74168521 -0.16547722]
[-67074812.32289593 74168521.30013305 -16547721.62643052]
[-0.67422883 0.73852251 0.09405967]
[-67422882.58069217 73852250.50403133 9405966.79778121]
[-0.68740092 0.72627817 0.35968684]
[-68740091.88581493 72627816.76141532 35968683.70469721]
```
So it seems like the filter is not applied correctly in the code below:
```python
filtered_obs = self.local_evaluator.filters[policy_id](
    observation, update=False)
```
Ah, that's the issue. Until the filter has been applied with update=True at least once, the initial observations are going to blow up passing through it.
This is a trained agent though, right? So presumably the filter should have a valid value even with update=False, unless the state was not restored correctly. Maybe we should add a check that we don't try to apply uninitialized filters (rough sketch below).
Btw which algorithm is this?
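A rough sketch of such a guard around the call quoted above (treating n == 0 as "never updated" is an assumption, and rs here is the filter's running-stat object mentioned below):

```python
# Hypothetical guard in the evaluation path: refuse to apply a filter whose
# running stats were never updated, e.g. because restore skipped them.
f = self.local_evaluator.filters[policy_id]
if hasattr(f, "rs") and f.rs.n == 0:
    raise ValueError(
        "MeanStdFilter has n=0 samples; was the filter state restored?")
filtered_obs = f(observation, update=False)
```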
Yes, it's a trained agent, and I used APEX DDPG to train.
Could you throw in a print(self.local_evaluator.filters[policy_id].rs)? In particular, I'm wondering if you're seeing n=0 (num samples), since I'm having a hard time reproducing this (I always see n > 0, e.g., (n=1490, mean_mean=-0.5082556000843493, mean_std=1.887412412375233) after restoring with DDPG).
Ah, I see in @whikwon's initial post that n > 0, but mean_std is infinity. So the question is whether the inf value is already there, or whether some bug in restoring the checkpoint introduces it.
To help confirm the issue: if there is some script I can run (in a few minutes) to reproduce it, that would be ideal.
OK, I'll make a log and share it with you. It might take a few days to reproduce the error.
@ericl From my end, after throwing in a print(self.local_evaluator.filters) right before action = agent.compute_action(state), I've got this:

```
filters: {'default': MeanStdFilter((3,), True, True, None, (n=0, mean_mean=0.0, mean_std=0.0), (n=0, mean_mean=0.0, mean_std=0.0))}
```
To reproduce the error, what I've done is to train with the following hyperparameters and then evaluate a checkpoint. Please let me know if you see the same error.
```yaml
pendulum-apex-ddpg:
    env: Pendulum-v0
    run: APEX_DDPG
    checkpoint_freq: 1
    stop:
        training_iteration: 5
    config:
        use_huber: True
        clip_rewards: False
        num_workers: 3
        n_step: 1
        target_network_update_freq: 50000
        tau: 1.0
        observation_filter: "MeanStdFilter"
        optimizer:
            num_replay_buffer_shards: 3
```
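In case it helps with reproducing, a sketch of launching that YAML from a script (the filename is hypothetical; tune.run_experiments was the entry point for experiment dicts around this Ray version):

```python
import yaml

import ray
from ray import tune

# Hypothetical filename for the YAML above.
with open("pendulum-apex-ddpg.yaml") as f:
    experiments = yaml.safe_load(f)

ray.init()
tune.run_experiments(experiments)
```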
Thanks @RodgerLuo, I was able to reproduce and fix the issue here: https://github.com/ray-project/ray/pull/2791
The problem was that in APEX the local filter was never updated, and we didn't do global filter synchronization.
This seems to be separate from the problem seen by @whikwon
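For readers, a conceptual sketch of what the global filter synchronization in that fix amounts to: pull each worker's locally buffered stats, merge them into the driver's filter, then broadcast the merged copy back. The method names (get_filters, apply_changes, sync_filters) follow my reading of Ray's filter utilities and should be treated as assumptions:

```python
def synchronize(local_filters, remote_evaluators):
    # Pull each worker's filters, flushing their local accumulation buffers.
    all_copies = [e.get_filters(flush_after=True) for e in remote_evaluators]
    # Fold every worker's buffered deltas into the driver's filters.
    for copies in all_copies:
        for name, remote_f in copies.items():
            local_filters[name].apply_changes(remote_f, with_buffer=False)
    # Push the merged filters back so all workers share common stats.
    for e in remote_evaluators:
        e.sync_filters(local_filters)
```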
System information
When I run agents that have been trained, for evaluation, I found that the agent doesn't perform as well as the rewards I monitored during training would suggest. And I've found that when I restore the agent, the filter (MeanStdFilter) has somewhat strange values. Have you ever heard of such a problem?
Source code / logs