Beliefuture opened this issue 2 years ago
@Bensk1 Hi, I have come across the following two questions while training the agent.
That is strange. As the assert suggests, this should not happen. Could you provide your training configuration? I will try to have a look over the next few days.
Hi, the first case happened when I used my own workload to train the agent, but I don't think it is related to the change of workload. It is strange, but the consumption only exceeds the given budget of 500 marginally; maybe the issue lies in how the boundary of the storage consumption is defined? I am not sure, and I have not read the implementation of the invalid action rules carefully.
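As a purely hypothetical illustration of how a marginal overshoot could arise, suppose the validity check works on an estimated index size while a slightly different actual size is charged afterwards; the numbers and names below are made up and not taken from the repository:

budget = 500.0               # assumed storage budget
consumed = 499.2             # storage already used by previously chosen indexes
estimated_next = 0.7         # size the validity check sees, so the action looks valid
actual_next = 1.1            # size actually charged after applying the action
if consumed + estimated_next <= budget:  # passes the check...
    consumed += actual_next              # ...but the real consumption overshoots
print(consumed)                          # 500.3, marginally above the 500 budget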
The second case happened when I adjusted the learning rate of the default PPO agent from 2.5e-4 to 2.5e-3. I found that the corresponding digit of the mask vector in your code is 0, but the agent still recommends that action. I also noticed that the loss turned into nan.
Additionally, it is noticeable that the invalid action chosen always seems to be 0 in the second case, i.e.:
AssertionError: Agent has chosen invalid action: 0
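A minimal sketch, not the repository's implementation, of why a diverged policy could always pick action 0: if the mask is applied by adding a large negative offset to the logits, NaN logits stay NaN everywhere, and an argmax over an all-NaN vector returns index 0.

import numpy as np

logits = np.full(8, np.nan)                # policy output once the loss has diverged
mask = np.array([0, 1, 1, 0, 1, 1, 0, 1])  # 1 = valid action; position 0 is invalid
masked = logits + (mask - 1) * 1e8         # additive masking: invalid actions get logit - 1e8
print(np.argmax(masked))                   # prints 0, even though action 0 is masked out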
Hm, could you check if there are valid actions left?
Hi, according to the self.valid_actions vector, there are still valid actions, i.e., bits set to 1 in the vector:
sum(self.valid_actions == 1)
------------
result: 6
But position 0 in the vector is 0. And the output in the console is:
-----------------------------------
| approxkl | nan |
| clipfrac | 0.00390625 |
| explained_variance | -2.31 |
| fps | 163 |
| n_updates | 464 |
| policy_entropy | nan |
| policy_loss | nan |
| serial_timesteps | 29696 |
| time_elapsed | 2.66e+03 |
| total_timesteps | 29696 |
| value_loss | nan |
-----------------------------------
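For context, nan in policy_entropy, policy_loss, and value_loss means the optimization itself diverged at the higher learning rate, independent of the masking. A rough sketch of how one might catch NaNs earlier with stable-baselines, assuming the fork still exposes the upstream VecCheckNan wrapper; make_env, the policy string, and the hyperparameters below are placeholders, not the repository's training script:

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

env = VecCheckNan(DummyVecEnv([make_env]), raise_exception=True)  # raise as soon as NaN/inf enters via obs, rewards, or actions
model = PPO2("MlpPolicy", env,
             learning_rate=2.5e-4,  # the default; 2.5e-3 diverged in this report
             max_grad_norm=0.5,     # gradient clipping against exploding updates
             verbose=1)
model.learn(total_timesteps=100000)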
I compared the code with the official stable-baselines repository. It seems the action masking functionality is implemented by yourself in this repository. I am not sure whether the action masking vector works as intended.
actions, values, self.states, neglogpacs = self.model.step(self.obs, self.states, self.dones, action_mask=self.action_masks)
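A quick sanity check one could run on this custom masking path, as a sketch: model, obs, states, dones, and action_masks stand for whatever the runner currently holds, and a 1 in the mask is assumed to mark a valid action as above. With finite logits the step should never return a masked-out action, so if this only starts failing after the losses turn nan, the masking code itself is probably not the culprit.

import numpy as np

actions, values, states, neglogpacs = model.step(obs, states, dones, action_mask=action_masks)
for i, a in enumerate(np.asarray(actions).ravel()):
    assert action_masks[i][a] == 1, "masking returned an invalid action: %d" % a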