ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] Problem when specific state size of the environment + Numpy Warning #11875

Closed ingambe closed 3 years ago

ingambe commented 3 years ago

There are two problems:

Ray version and other system information (Python version, TensorFlow version, OS): Ray 1.0.0 (also tried the latest wheel, same issue), Python 3.7.7 (also tried Python 3.8.3 on another setup), PyTorch 1.8, Ubuntu 20.04.1.

Reproduction

I've made a gist for this issue: https://gist.github.com/ingambe/3ae1a7bfda37da486709c81decd8355e

It contains a fake environment + a training loop with a custom network (with action masking, unused in this example but used in practice) for PPO.

If you modify the size of the state given to the agent, there is no bug; anything apart from 30 seems to work. You simply need to modify this line and put whatever size you want: https://gist.github.com/ingambe/3ae1a7bfda37da486709c81decd8355e#file-state_size_problem-py-L95 But a state of size 210 doesn't work (30 * 7, it's a flattened tabular state).

It seems to be a problem with PyTorch, but the error message is unclear and hard to trace.

Also, the output contains a small but annoying NumPy warning about a non-writable array:

(pid=64684) /home/local/IWAS/pierre/anaconda3/envs/JSS_Scratch/lib/python3.7/site-packages/ray/rllib/utils/torch_ops.py:65: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1595629401553/work/torch/csrc/utils/tensor_numpy.cpp:141.)

The complete logs are given as a comment in the gist.
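For reference, the warning comes from torch.from_numpy being handed a read-only NumPy array; a minimal sketch outside RLlib (just simulating a read-only array, not RLlib code) reproduces it:

import numpy as np
import torch

arr = np.zeros(4, dtype=np.float32)
arr.flags.writeable = False           # simulate a read-only array like the one RLlib passes in

t = torch.from_numpy(arr)             # emits the "not writeable" UserWarning
t_ok = torch.from_numpy(arr.copy())   # copying first avoids the warning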

roireshef commented 3 years ago

About the "The given NumPy array is not writeable" warning, I initially thought it's related to numpy arrays created in my environment's custom logic as states to be returned, but further investigating it shows this is not the case, but rather it's referring to the gradients that pass through AsyncGradients and ApplyGradients (head<->workers). I thought this means that the policy isn't training, but it is not the case if the policy weights aren't impacted by the fact the gradients are "not writeable", so it's a question of if they are or not...

This is the traceback I get for this warning:

2020-11-10 23:09:46,606 WARNING train_utils.py:13 -- The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py", line 334, in run
    runner.step()
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/trial_runner.py", line 337, in step
    self.trial_executor.start_trial(next_trial)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/ray_trial_executor.py", line 294, in start_trial
    self._start_trial(trial, checkpoint, train=train)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/ray_trial_executor.py", line 233, in _start_trial
    runner = self._setup_remote_runner(trial, reuse_allowed)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/ray_trial_executor.py", line 185, in _setup_remote_runner
    return cls.remote(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/actor.py", line 379, in remote
    return self._remote(args=args, kwargs=kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/actor.py", line 586, in _remote
    extension_data=str(actor_method_cpu))
  File "/usr/local/lib/python3.6/dist-packages/ray/function_manager.py", line 559, in actor_method_executor
    method_returns = method(actor, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer_template.py", line 88, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer.py", line 479, in __init__
    super().__init__(config, logger_creator)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/trainable.py", line 245, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer.py", line 643, in setup
    self._init(self.config, self.env_creator)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer_template.py", line 104, in _init
    self.train_exec_impl = execution_plan(self.workers, config)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/a3c/a3c.py", line 65, in execution_plan
    grads = AsyncGradients(workers)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/execution/rollout_ops.py", line 113, in AsyncGradients
    workers.sync_weights()
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/worker_set.py", line 86, in sync_weights
    e.set_weights.remote(weights)
  File "/usr/local/lib/python3.6/dist-packages/ray/actor.py", line 101, in remote
    return self._remote(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/actor.py", line 121, in _remote
    return invocation(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/actor.py", line 115, in invocation
    num_return_vals=num_return_vals)
  File "/usr/local/lib/python3.6/dist-packages/ray/actor.py", line 732, in _actor_method_call
    list_args, num_return_vals, self._ray_actor_method_cpus)
  File "/usr/local/lib/python3.6/dist-packages/ray/function_manager.py", line 559, in actor_method_executor
    method_returns = method(actor, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 642, in set_weights
    self.policy_map[pid].set_weights(w)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/torch_policy.py", line 428, in set_weights
    weights = convert_to_torch_tensor(weights, device=self.device)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/torch_ops.py", line 155, in convert_to_torch_tensor
    return tree.map_structure(mapping, x)
  File "/usr/local/lib/python3.6/dist-packages/tree/__init__.py", line 516, in map_structure
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/usr/local/lib/python3.6/dist-packages/tree/__init__.py", line 516, in <listcomp>
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/torch_ops.py", line 149, in mapping
    tensor = torch.from_numpy(np.asarray(item))
UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
roireshef commented 3 years ago

BTW, I'm using a slightly different setup, with:

sven1977 commented 3 years ago

Hmm, I cannot reproduce the problem_size bug:

latest master, torch 1.6.0, Mac OS, Python 3.7.4

The following runs totally fine with the above system using problem_size=210:

import ray
import time

from ray.rllib.agents.ppo import PPOTrainer

from ray.rllib.agents import with_common_config
from ray.rllib.models import ModelCatalog
from ray.tune import register_env
import gym
import numpy as np

from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork
from ray.rllib.utils.framework import try_import_torch

torch, nn = try_import_torch()

default_config = {
    'env': 'fake',
    'seed': 0,
    'framework': 'torch',
    'log_level': 'DEBUG',
    'num_gpus': 0,
    'instance_path': '/pierre_tassel/instances/ta41',
    'num_envs_per_worker': 1,
    'rollout_fragment_length': 512,
    'num_workers': 1,
    #'sgd_minibatch_size': 100,
    'evaluation_interval': None,
    'metrics_smoothing_episodes': 100000,
    'layer_size': 512,
    'layer_nb': 2,
    'lr': 5e-5,
    'clip_param': 0.3,
    'vf_clip_param': 10.0,
    'kl_target': 0.01,
    'num_sgd_iter': 15,
    'lambda': 1.0,
    "use_critic": True,
    "use_gae": True,
    "kl_coeff": 0.2,
    "shuffle_sequences": True,
    "lr_schedule": None,
    "vf_share_layers": False,
    "vf_loss_coeff": 1.0,
    "entropy_coeff": 5e-4,
    "entropy_coeff_schedule": None,
    "grad_clip": None,
    "batch_mode": "truncate_episodes",
    "observation_filter": "NoFilter",
    "simple_optimizer": False,
    "_fake_gpus": False,
}

class FCMaskedActionsModel(TorchModelV2, nn.Module):

    def __init__(self,
                 obs_space,
                 action_space,
                 num_outputs,
                 model_config,
                 name):
        nn.Module.__init__(self)
        super(FCMaskedActionsModel, self).__init__(obs_space, action_space, action_space.n, model_config, name)
        true_obs_space = gym.spaces.MultiBinary(n=obs_space.shape[0] - action_space.n)
        self.action_model = FullyConnectedNetwork(
            obs_space=true_obs_space, action_space=action_space, num_outputs=action_space.n,
            model_config=model_config, name=name + "action_model")

    def forward(self, input_dict, state, seq_lens):
        # Extract the available actions tensor from the observation.
        action_mask = input_dict["obs"]["action_mask"]
        # Compute the predicted action embedding

        raw_actions, state = self.action_model({
            "obs": input_dict["obs"]["real_obs"]
        })
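        # Turn the 0/1 action mask into additive log-space penalties:
        # log(1) = 0 leaves legal actions untouched, log(0) = -inf is clamped
        # to -1e10 so illegal actions end up with effectively zero probability.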
        inf_mask = torch.clamp(torch.log(action_mask), min=-1e10)
        actions = raw_actions + inf_mask

        return actions, state

    def value_function(self):
        return self.action_model.value_function()

class FAKEEnv(gym.Env):

    def __init__(self, env_config):
        self.problem_size = 210  # <-- changed this here, all seems ok
        self.action_space = gym.spaces.Discrete(self.problem_size + 1)
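        # The observation is a dict: "action_mask" has one entry per action
        # (problem_size + 1) and "real_obs" is the flattened problem_size x 7
        # tabular state.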
        self.observation_space = gym.spaces.Dict({
            "action_mask": gym.spaces.Box(0, 1, shape=(self.problem_size + 1,)),
            "real_obs": gym.spaces.Box(low=0.0, high=1.0, shape=(self.problem_size * 7,), dtype=np.float),
        })
        self.i = 0

    def reset(self):
        self.i = 0
        self.legal_actions = np.ones((self.action_space.n,), dtype=np.int)
        self.state = np.zeros((self.problem_size, 7), dtype=np.float)
        return self._get_current_state_representation()

    def _get_current_state_representation(self):
        self.state[:, 0] = self.legal_actions[:-1]
        return {
            "real_obs": self.state.flatten(),
            "action_mask": self.legal_actions,
        }

    def step(self, action: int):
        self.i += 1
        return self._get_current_state_representation(), 0, self.i > self.problem_size, {}

    def render(self, mode='human'):
        pass

def fake_env_creator(env_config):
    return FAKEEnv(env_config)

register_env("fake", fake_env_creator)

def train_func():
    ModelCatalog.register_custom_model("fc_masked_model", FCMaskedActionsModel)
    # wandb.init(config=default_config)
    config = default_config
    config['model'] = {
        "fcnet_activation": "relu",
        "custom_model": "fc_masked_model",
        'fcnet_hiddens': [config['layer_size'] for k in range(config['layer_nb'])],
    }
    config['env_config'] = {
        'instance_path': config['instance_path']
    }

    config['train_batch_size'] = config['num_workers'] * config['num_envs_per_worker'] * config[
        'rollout_fragment_length']
    config = with_common_config(config)

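    # Drop the keys that are custom to this script (not RLlib options)
    # before creating the trainer.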
    config.pop('instance_path', None)
    config.pop('layer_size', None)
    config.pop('layer_nb', None)

    ray.init()

    stop = {
        "time_total_s": 60,
    }

    start_time = time.time()
    trainer = PPOTrainer(config=config)
    while start_time + stop['time_total_s'] > time.time():
        result = trainer.train()

    ray.shutdown()

if __name__ == "__main__":
    train_func()
ingambe commented 3 years ago

Thank you @sven1977. You need to keep 30 as the value of the problem size (self.problem_size = 30), because in the state representation it is multiplied by 7: "real_obs": gym.spaces.Box(low=0.0, high=1.0, shape=(self.problem_size * 7,), dtype=np.float),

This is because the state is a tabular representation that is then flattened.
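In other words, with the shapes used in the gist:

problem_size = 30                  # the value to keep in the env
real_obs_dim = problem_size * 7    # = 210 floats once the tabular state is flattened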

If you execute the gist I've created as it is, you should have the bug.

sven1977 commented 3 years ago

Still cannot reproduce this, even when I change self.problem_size in my code above back to a value of 30. Runs completely fine on latest master:

...
2020-12-01 10:37:50,804 DEBUG sgd.py:134 -- 13 {'allreduce_latency': 0.0, 'cur_kl_coeff': 5.587935447692872e-10, 'cur_lr': 5e-05, 'total_loss': -0.0017081310215871781, 'policy_loss': 0.0, 'vf_loss': 1.4055287945968842e-17, 'vf_explained_var': nan, 'kl': 4.577177605824545e-06, 'entropy': 3.4162619709968567, 'entropy_coeff': 0.0005}
2020-12-01 10:37:50,847 DEBUG sgd.py:134 -- 14 {'allreduce_latency': 0.0, 'cur_kl_coeff': 5.587935447692872e-10, 'cur_lr': 5e-05, 'total_loss': -0.0017081519472412765, 'policy_loss': 0.0, 'vf_loss': 1.4062486071659278e-17, 'vf_explained_var': nan, 'kl': 5.381236405810341e-06, 'entropy': 3.4163036942481995, 'entropy_coeff': 0.0005}

Process finished with exit code 0

You are saying above that "you have an error and the actor dies" (when you use 30). Could you post this exact error here?

ingambe commented 3 years ago

That's weird :thinking: Now that I've switched to TensorFlow, I don't have this issue anymore, but the error occurs again when I switch back to PyTorch.

I've made a copy of my output here: https://gist.github.com/ingambe/3ae1a7bfda37da486709c81decd8355e#gistcomment-3519586

Are you able to reproduce the "The given NumPy array is not writeable" warning?

sven1977 commented 3 years ago

Cool, yeah, that one I get as well :)

roireshef commented 3 years ago

@ingambe I get the "The given NumPy array is not writeable" warning myself and took some time to investigate it. It seems to be rooted in the creation of the state tensors, which are meant to be kept immutable anyway. To the best of my knowledge, it has nothing to do with the tensors you actually want to change during training. Empirically, my training makes good progress even with those warnings, so I don't think you should be worried.
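If the warnings are too noisy, one possible workaround (a sketch, not an official RLlib fix; to_tensor_quiet is a hypothetical helper) is to copy each array before converting it:

import numpy as np
import torch

def to_tensor_quiet(item, device=None):
    # Copy the (possibly read-only) array so torch.from_numpy does not warn.
    tensor = torch.from_numpy(np.array(item, copy=True))
    return tensor.to(device) if device is not None else tensor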

sven1977 commented 3 years ago

Yeah, let's open a separate issue on the PyTorch "The given NumPy array is not writeable" warnings.

sven1977 commented 3 years ago

Closing this issue now. Here is the link to the PyTorch warning issue: https://github.com/ray-project/ray/issues/12789