ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] "RuntimeError: The learner thread died while training!" after a few training cycles. #36801

Open · ULudo opened this issue 1 year ago

ULudo commented 1 year ago

What happened + What you expected to happen

Problem: When using the IMPALA algorithm in RLlib, the training process crashes after a while with the error message "RuntimeError: The learner thread died while training!". This already happens with a single worker. I get the error with a custom environment, but it also occurs with benchmark environments.

Full error message

python.exe : C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\algorithms\algorithm.py:442: 
RayDeprecationWarning: This API is deprecated and may be removed in future Ray releases. You could suppress this 
warning by setting env variable PYTHONWARNINGS="ignore::DeprecationWarning"
At line:1 char:1
+ python.exe .\src\impala_mountain_car.py *>&1 > .\train_mc_res.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (C:\Users\ULudo\mi...ecationWarning":String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

`UnifiedLogger` will be removed in Ray 2.7.
  return UnifiedLogger(config, logdir, loggers=None)
C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\tune\logger\unified.py:53: RayDeprecationWarning: This API 
is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable 
PYTHONWARNINGS="ignore::DeprecationWarning"
The `JsonLogger interface is deprecated in favor of the `ray.tune.json.JsonLoggerCallback` interface and will be 
removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\tune\logger\unified.py:53: RayDeprecationWarning: This API 
is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable 
PYTHONWARNINGS="ignore::DeprecationWarning"
The `CSVLogger interface is deprecated in favor of the `ray.tune.csv.CSVLoggerCallback` interface and will be removed 
in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\tune\logger\unified.py:53: RayDeprecationWarning: This API 
is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable 
PYTHONWARNINGS="ignore::DeprecationWarning"
The `TBXLogger interface is deprecated in favor of the `ray.tune.tensorboardx.TBXLoggerCallback` interface and will be 
removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
2023-06-24 18:21:51,670 INFO worker.py:1636 -- Started a local Ray instance.
2023-06-24 18:22:01,712 INFO trainable.py:173 -- Trainable.setup took 12.674 seconds. If your trainable is slow to 
initialize, consider setting reuse_actors=True to reduce actor creation overheads.
Exception in thread Thread-7:
Traceback (most recent call last):
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1235, in 
_worker
    self.loss(model, self.dist_class, sample_batch)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\algorithms\impala\impala_torch_policy.py", 
line 235, in loss
    action_dist = dist_class(model_out, model)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\models\torch\torch_action_dist.py", line 250, 
in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\torch\distributions\normal.py", line 56, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\torch\distributions\distribution.py", line 56, in 
__init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (256, 1)) of distribution Normal(loc: torch.Size([256, 1]), scale: 
torch.Size([256, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        [nan],
        ...,
        [nan]], device='cuda:0', grad_fn=<SplitBackward0>)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\execution\learner_thread.py", line 74, in run
    self.step()
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\execution\multi_gpu_learner_thread.py", line 
162, in step
    default_policy_results = policy.learn_on_loaded_batch(
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 811, in 
learn_on_loaded_batch
    tower_outputs = self._multi_gpu_parallel_grad_calc(device_batches)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1320, in 
_multi_gpu_parallel_grad_calc
    raise last_result[0] from last_result[1]
ValueError: Expected parameter loc (Tensor of shape (256, 1)) of distribution Normal(loc: torch.Size([256, 1]), scale: 
torch.Size([256, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        [nan],
        ...,
        [nan]], device='cuda:0', grad_fn=<SplitBackward0>)
Training iteration 0
Training iteration 1
...
Training iteration 14
Traceback (most recent call last):
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1235, in 
_worker
    self.loss(model, self.dist_class, sample_batch)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\algorithms\impala\impala_torch_policy.py", 
line 235, in loss
    action_dist = dist_class(model_out, model)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\models\torch\torch_action_dist.py", line 250, 
in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\torch\distributions\normal.py", line 56, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\torch\distributions\distribution.py", line 56, in 
__init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (256, 1)) of distribution Normal(loc: torch.Size([256, 1]), scale: 
torch.Size([256, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        [nan],
        ...,
        [nan]], device='cuda:0', grad_fn=<SplitBackward0>)

In tower 0 on device cuda:0
Traceback (most recent call last):
  File ".\src\impala_mountain_car.py", line 22, in <module>
    result = algo.train()
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\tune\trainable\trainable.py", line 389, in train
    raise skipped from exception_cause(skipped)
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\tune\trainable\trainable.py", line 386, in train
    result = self.step()
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 803, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2853, in 
_run_one_training_iteration
    results = self.training_step()
  File "C:\Users\ULudo\miniconda3\envs\drl_env\lib\site-packages\ray\rllib\algorithms\impala\impala.py", line 683, in 
training_step
    raise RuntimeError("The learner thread died while training!")
RuntimeError: The learner thread died while training!

Versions / Dependencies

Setup

Reproduction script

Environment setup:

conda create -n test_env python=3.8 ipython
conda activate test_env
pip install "ray[rllib]"
pip install "absl-py>=0.6.1"
pip install gin-config==0.1.3
pip install "tensorflow-probability>=0.9.0"
pip install tensorflow
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install chardet

Reproduction script

from ray.rllib.algorithms.impala import ImpalaConfig
from ray.rllib.algorithms import Algorithm
import gymnasium as gym

env_name = "MountainCarContinuous-v0"

config = ImpalaConfig()
config = config.training(gamma=0.99, lr=0.0003, train_batch_size=256)
config = config.framework(framework="torch")
config = config.resources(num_gpus=1)
config = config.rollouts(num_rollout_workers=1, num_envs_per_worker=1, rollout_fragment_length=256)
config = config.environment(env=env_name)

algo: Algorithm = config.build()

eval_returns = []
for i in range(1000):
    print(f"Training iteration {i}")
    result = algo.train()

Additional Information

Using the following additional settings delays the occurrence of the error slightly (applied as sketched below):

timeout_s_sampler_manager=10000,
timeout_s_aggregator_manager=10000,
learner_queue_timeout=10000
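
For reference, a minimal sketch of how these settings can be applied, assuming they are accepted as IMPALA-specific keyword arguments of `ImpalaConfig.training()` in this Ray version:

from ray.rllib.algorithms.impala import ImpalaConfig

# Assumption: timeout_s_sampler_manager, timeout_s_aggregator_manager and
# learner_queue_timeout are ImpalaConfig.training() kwargs in this release.
# They only delay the crash here; they do not fix the underlying NaNs.
config = (
    ImpalaConfig()
    .environment(env="MountainCarContinuous-v0")
    .framework("torch")
    .training(
        gamma=0.99,
        lr=0.0003,
        train_batch_size=256,
        timeout_s_sampler_manager=10000,
        timeout_s_aggregator_manager=10000,
        learner_queue_timeout=10000,
    )
)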

Issue Severity

High: It blocks me from completing my task.

avnishn commented 1 year ago

@ULudo do you get the same issue if you enable the following flags:

config.rl_module(_enable_rl_module_api=True)
config.training(_enable_learner_api=True)

We probably won't be merging any fixes to our old IMPALA implementation, which is built on our old training stack, but the flags above enable our new training stack.

The original error occurs because, for some reason, NaNs are being introduced into the policy gradient path, and this causes an error during training. Training runs on a Python thread, so the surfaced error will be something like "the learner thread died while training", since the actual error happened in the training code running on that thread.

We don't have this setup in our new training stack, and we use a completely different set of models. We added some stability tricks to the RLModules used for IMPALA/APPO, so give that a try first and let me know if it helps.
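
To illustrate the point about the learner thread: an exception raised inside a Python thread does not propagate to the driver; the driver only notices that the thread is no longer alive. A standalone sketch in plain Python (not RLlib code) that reproduces the pattern:

import threading

def learner_step():
    # Stand-in for the loss computation: a NaN in the action distribution
    # raises here, inside the learner thread.
    raise ValueError("Expected parameter loc ... found invalid values: nan")

learner = threading.Thread(target=learner_step)
learner.start()   # the ValueError is printed by the thread's excepthook
learner.join()

# The driver never sees the ValueError directly; it can only observe that
# the (normally long-running) thread has died and raise a generic error.
if not learner.is_alive():
    raise RuntimeError("The learner thread died while training!")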

ULudo commented 1 year ago

Thank you for your reply; it took me some time to test this. Using these two configuration flags helps, but the error still occurs, just at a later point in training.

Script:

from ray.rllib.algorithms.impala import ImpalaConfig
from ray.rllib.algorithms import Algorithm

import gymnasium as gym

env_name = "MountainCarContinuous-v0"

config = ImpalaConfig()
config = config.training(gamma=0.99, lr=0.0003, train_batch_size=256, _enable_learner_api=True)
config = config.framework(framework="torch")
config = config.resources(num_gpus=1)
config = config.rollouts(num_rollout_workers=1, num_envs_per_worker=1, rollout_fragment_length=256)
config = config.environment(env=env_name)
config = config.rl_module(_enable_rl_module_api=True)

algo: Algorithm = config.build()

eval_returns = []
for i in range(1000):
    print(f"Training iteration {i}")
    result = algo.train()

Output:

python.exe : 2023-06-28 07:20:41,522    WARNING algorithm_config.py:2451 -- Setting `exploration_config={}` because you 
set `_enable_rl_modules=True`. When RLModule API are enabled, exploration_config can not be set. If you want to 
implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On 
configs that have a default exploration config, this must be done with `config.exploration_config={}`.
At line:1 char:1
+ python.exe .\src\impala_mountain_car.py *>&1 > .\train_mc_res.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (2023-06-28 07:2...ion_config={}`.:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\algorithms\algorithm.py:442: RayDeprecationWarning: 
This API is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env 
variable PYTHONWARNINGS="ignore::DeprecationWarning"
`UnifiedLogger` will be removed in Ray 2.7.
  return UnifiedLogger(config, logdir, loggers=None)
C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\tune\logger\unified.py:53: RayDeprecationWarning: This API 
is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable 
PYTHONWARNINGS="ignore::DeprecationWarning"
The `JsonLogger interface is deprecated in favor of the `ray.tune.json.JsonLoggerCallback` interface and will be 
removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\tune\logger\unified.py:53: RayDeprecationWarning: This API 
is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable 
PYTHONWARNINGS="ignore::DeprecationWarning"
The `CSVLogger interface is deprecated in favor of the `ray.tune.csv.CSVLoggerCallback` interface and will be removed 
in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\tune\logger\unified.py:53: RayDeprecationWarning: This API 
is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable 
PYTHONWARNINGS="ignore::DeprecationWarning"
The `TBXLogger interface is deprecated in favor of the `ray.tune.tensorboardx.TBXLoggerCallback` interface and will be 
removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
2023-06-28 07:20:44,034 INFO worker.py:1636 -- Started a local Ray instance.
(RolloutWorker pid=8208) 
C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\gymnasium\utils\passive_env_checker.py:233: 
DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
(RolloutWorker pid=8208)   if not isinstance(terminated, (bool, np.bool8)):
2023-06-28 07:20:52,571 INFO trainable.py:173 -- Trainable.setup took 11.018 seconds. If your trainable is slow to 
initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2023-06-28 07:20:52,571 WARNING util.py:68 -- Install gputil for GPU system monitoring.
Training iteration 0
Training iteration 1
...
Training iteration 928
Training iteration 929
Traceback (most recent call last):
  File ".\src\impala_mountain_car.py", line 23, in <module>
    result = algo.train()
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\tune\trainable\trainable.py", line 389, in train
    raise skipped from exception_cause(skipped)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\tune\trainable\trainable.py", line 386, in train
    result = self.step()
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 803, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2853, in 
_run_one_training_iteration
    results = self.training_step()
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\algorithms\impala\impala.py", line 720, in 
training_step
    train_results = self.learn_on_processed_samples()
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\algorithms\impala\impala.py", line 960, in 
learn_on_processed_samples
    result = self.learner_group.update(
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\core\learner\learner_group.py", line 198, in 
update
    self._learner.update(
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\core\learner\learner.py", line 864, in update
    result = self._update(minibatch)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\core\learner\learner.py", line 1095, in 
_update
    loss = self.compute_loss(fwd_out=fwd_out, batch=tensorbatch)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\core\learner\learner.py", line 677, in 
compute_loss
    module_results = self.compute_loss_per_module(
  File 
"C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\algorithms\impala\torch\impala_torch_learner.py", 
line 29, in compute_loss_per_module
    target_policy_dist = action_dist_class_train.from_logits(
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\models\torch\torch_distributions.py", line 
217, in from_logits
    return TorchDiagGaussian(loc=loc, scale=scale)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\models\torch\torch_distributions.py", line 
189, in __init__
    super().__init__(loc=loc, scale=scale)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\models\torch\torch_distributions.py", line 
27, in __init__
    self._dist = self._get_torch_distribution(*args, **kwargs)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\ray\rllib\models\torch\torch_distributions.py", line 
192, in _get_torch_distribution
    return torch.distributions.normal.Normal(loc, scale)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\torch\distributions\normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\ULudo\miniconda3\envs\test_env\lib\site-packages\torch\distributions\distribution.py", line 62, in 
__init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (256, 1)) of distribution Normal(loc: torch.Size([256, 1]), scale: 
torch.Size([256, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        [nan],
        ...,
        [nan]], grad_fn=<SplitBackward0>)

lyzyn commented 10 months ago

I have also encountered this problem. Have you resolved it?

python.exe : 2023-06-28 07:20:41,522 WARNING algorithm_config.py:2451 -- Setting exploration_config={} because you set _enable_rl_modules=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.

jesuspc commented 10 months ago

Having this problem as well on RLlib 2.7.1. After a few training iterations the gradients go to NaN:

(APPO pid=2911932) ValueError: Expected parameter loc (Tensor of shape (500, 10)) of distribution Normal(loc: torch.Size([500, 10]), scale: torch.Size([500, 10])) to satisfy the constraint Real(), but found invalid values:
(APPO pid=2911932) tensor([[nan, nan, nan,  ..., nan, nan, nan],
(APPO pid=2911932)         [nan, nan, nan,  ..., nan, nan, nan],
(APPO pid=2911932)         [nan, nan, nan,  ..., nan, nan, nan],
(APPO pid=2911932)         ...,
(APPO pid=2911932)         [nan, nan, nan,  ..., nan, nan, nan],
(APPO pid=2911932)         [nan, nan, nan,  ..., nan, nan, nan],
(APPO pid=2911932)         [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
(APPO pid=2911932)        grad_fn=<SplitBackward0>)
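
One way to narrow down where the non-finite values first appear is a plain PyTorch check, independent of RLlib internals (a sketch; `policy_module` stands for whatever nn.Module holds the policy):

import torch

def assert_finite(policy_module: torch.nn.Module) -> None:
    # Scan parameters and their gradients for NaN/Inf; calling this after
    # every optimizer step localizes the update at which values blow up.
    for name, param in policy_module.named_parameters():
        if not torch.isfinite(param).all():
            raise RuntimeError(f"Non-finite values in parameter {name}")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            raise RuntimeError(f"Non-finite values in gradient of {name}")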