Beliefuture opened this issue 2 years ago
@Bensk1 Hi, I have come across the following two questions while training the agent.
That is strange. As the assert suggests, this should not happen. Could you provide your training configuration? I will try to have a look over the next few days.
Hi, the first case happened when I used my own workload to train the agent, but I don't think it is related to the change of workload. It is strange, but the consumption only exceeds the given budget of 500 marginally; maybe the issue lies in how the boundary of the storage consumption is defined? I am not sure, and I have not read the implementation of the invalid action rules carefully.
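As a purely hypothetical illustration of how a marginal overshoot could arise, suppose the validity check works on an estimated index size while a slightly different actual size is charged afterwards; the numbers and names below are made up and not taken from the repository:

budget = 500.0               # assumed storage budget
consumed = 499.2             # storage already used by previously chosen indexes
estimated_next = 0.7         # size the validity check sees, so the action looks valid
actual_next = 1.1            # size actually charged after applying the action
if consumed + estimated_next <= budget:  # passes the check...
    consumed += actual_next              # ...but the real consumption overshoots
print(consumed)                          # 500.3, marginally above the 500 budget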
The second case happened when I adjusted the learning rate of the default PPO agent from 2.5e-4 to 2.5e-3. I found that the corresponding digit of the mask vector in your code is 0, but the agent still recommends that action. I also noticed that the loss turned into nan.
Additionally, it is noticeable that the invalid action chosen always seems to be 0 in the second case, i.e.:
AssertionError: Agent has chosen invalid action: 0
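A minimal sketch, not the repository's implementation, of why a diverged policy could always pick action 0: if the mask is applied by adding a large negative offset to the logits, NaN logits stay NaN everywhere, and an argmax over an all-NaN vector returns index 0.

import numpy as np

logits = np.full(8, np.nan)                # policy output once the loss has diverged
mask = np.array([0, 1, 1, 0, 1, 1, 0, 1])  # 1 = valid action; position 0 is invalid
masked = logits + (mask - 1) * 1e8         # additive masking: invalid actions get logit - 1e8
print(np.argmax(masked))                   # prints 0, even though action 0 is masked out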
Hm, could you check if there are valid actions left?
Hi, according to the self.valid_actions vector, there are still valid actions, i.e., bits set to 1 in the vector:
sum(self.valid_actions == 1)
------------
result: 6
But position 0 in the vector is 0. And the output in the console is:
-----------------------------------
| approxkl | nan |
| clipfrac | 0.00390625 |
| explained_variance | -2.31 |
| fps | 163 |
| n_updates | 464 |
| policy_entropy | nan |
| policy_loss | nan |
| serial_timesteps | 29696 |
| time_elapsed | 2.66e+03 |
| total_timesteps | 29696 |
| value_loss | nan |
-----------------------------------
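For context, nan in policy_entropy, policy_loss, and value_loss means the optimization itself diverged at the higher learning rate, independent of the masking. A rough sketch of how one might catch NaNs earlier with stable-baselines, assuming the fork still exposes the upstream VecCheckNan wrapper; make_env, the policy string, and the hyperparameters below are placeholders, not the repository's training script:

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

env = VecCheckNan(DummyVecEnv([make_env]), raise_exception=True)  # raise as soon as NaN/inf enters via obs, rewards, or actions
model = PPO2("MlpPolicy", env,
             learning_rate=2.5e-4,  # the default; 2.5e-3 diverged in this report
             max_grad_norm=0.5,     # gradient clipping against exploding updates
             verbose=1)
model.learn(total_timesteps=100000)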
I compared the code with the official stable-baselines repository. It seems the action masking functionality is implemented by yourself in this repository. I am not sure whether the action masking vector works as intended.
actions, values, self.states, neglogpacs = self.model.step(self.obs, self.states, self.dones, action_mask=self.action_masks)
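A quick sanity check one could run on this custom masking path, as a sketch: model, obs, states, dones, and action_masks stand for whatever the runner currently holds, and a 1 in the mask is assumed to mark a valid action as above. With finite logits the step should never return a masked-out action, so if this only starts failing after the losses turn nan, the masking code itself is probably not the culprit.

import numpy as np

actions, values, states, neglogpacs = model.step(obs, states, dones, action_mask=action_masks)
for i, a in enumerate(np.asarray(actions).ravel()):
    assert action_masks[i][a] == 1, "masking returned an invalid action: %d" % a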