ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] A3C has problems with the horizon option removed #32812

Open tensor-works opened 1 year ago

tensor-works commented 1 year ago

### What happened + What you expected to happen

I am currently running trials with the A3C algorithm in an episodic environment. Since the horizon option was removed in the 2.3.0 build, it periodically happens that environments do not finish. This occurs only with A3C and not with any other algorithm; even actor-critic methods such as PPO do not show this behaviour. The result is lost trials that cannot be evaluated and wasted computational resources.

This is an example of an erroneous run: [image]

This is my step function:

    def step(self, action_dict: MultiAgentDict) -> \
        Tuple[MultiAgentDict, MultiAgentDict, MultiAgentDict, MultiAgentDict, MultiAgentDict]:
        """This function processes a step every time called.
        All attributes of the agents and the environment are adapted accordingly.

        :param action_dict: MultiAgentDict - Dictionary as shown below:
            {0: 0,
             1: 3,
             2: 2,...
            ...}
            Contains all agent's ids and their chosen actions for this timestep, if they are still allowed to move.

        :return:
            obs        = MultiAgentDict:   Key = Agent_ID, Value = Observation of agent
            reward     = MultiAgentDict:   Key = Agent_ID, Value = Reward of agent
            dones      = MultiAgentDict:   Key = Agent_ID, Value = Boolean which shows if the agent is finished
            truncateds = MultiAgentDict:   Key = Agent_ID, Value = Boolean which shows if the episode was cut off at the step limit
            infos      = MultiAgentDict:   Key = Agent_ID, Value = Empty dict
        """
        if self._step_counter == 0:
            print("Environment starting")

        # clear obs, reward, info
        self._clear_interactions()

        # add chosen action to current position
        self.obs = dict()
        self.rewards = dict()
        self.infos = dict()

        for agent_id, action_id in action_dict.items():
            self.obs[agent_id], self.rewards[agent_id], self.dones[agent_id] = \
                self._agent_dict[agent_id].update(Action(action_id, self._non_action_allowed))
            self.infos[agent_id] = {}

        self._step_counter += 1
        self._episode_reward += sum(self.rewards.values())

        if all({key: value for key, value in self.dones.items() if not key == "__all__"}.values()):
            self.dones["__all__"] = True
        if self._step_counter >= self._max_steps_per_episode:
            self.truncateds = {agent_id: True for agent_id in self._agent_ids}
            self.truncateds["__all__"] = True
        if "qmix" in self._observation_type:
            self.create_state()
        return self.obs, self.rewards, self.dones, self.truncateds, self.infos

This is my reset function:

    def reset(self, *, seed=None, options=None) -> Tuple[MultiAgentDict, MultiAgentDict]:
        """After finishing a training-iteration, the environment can be reset with this function.
        Therefore all counters and the dicts are cleaned.

        :return: current observation and info dict for all agents, each as a MultiAgentDict
        """
        self._step_counter = 0
        self._episode_reward = 0

        self._init_agents()
        self._init_interactions()
        self._init_rendering()

        self._state_space_vxl.reset()
        infos = {agent_id: {} for agent_id in self._agent_ids}
        self.obs = {agent_id: agent.observe() for agent_id, agent in self._agent_dict.items()}
        if "qmix" in self._observation_type:
            self.create_state()
        return self.obs, infos

Please note that I emulate the behaviour of gymnasium's TimeLimit wrapper, which is advertised as the solution for this issue. However, I am using a multi-agent env, while the TimeLimit wrapper is for single-agent envs only. I am still not fully confident that this is not a mistake I am making myself.
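For illustration, the multi-agent emulation of TimeLimit described here can be sketched as a thin wrapper around the env. This is only a sketch under assumptions: `MultiAgentTimeLimit` and the toy `NeverEndingEnv` are hypothetical names, not part of gymnasium or RLlib, and the only thing assumed about RLlib is that it reads the `"__all__"` key of both the terminated and the truncated dict on every step.

```python
from typing import Any, Dict, Tuple

AgentDict = Dict[Any, Any]  # stand-in for RLlib's MultiAgentDict


class NeverEndingEnv:
    """Toy two-agent env (hypothetical) that never ends on its own."""

    def reset(self, *, seed=None, options=None) -> Tuple[AgentDict, AgentDict]:
        return {0: 0.0, 1: 0.0}, {0: {}, 1: {}}

    def step(self, action_dict: AgentDict):
        obs = {a: 0.0 for a in action_dict}
        rewards = {a: 1.0 for a in action_dict}
        terminateds = {a: False for a in action_dict}
        terminateds["__all__"] = False
        truncateds = {a: False for a in action_dict}
        truncateds["__all__"] = False
        infos = {a: {} for a in action_dict}
        return obs, rewards, terminateds, truncateds, infos


class MultiAgentTimeLimit:
    """Hypothetical wrapper: truncate a multi-agent episode after max_steps."""

    def __init__(self, env, max_steps: int):
        self.env = env
        self.max_steps = max_steps
        self._elapsed = 0

    def reset(self, *, seed=None, options=None) -> Tuple[AgentDict, AgentDict]:
        self._elapsed = 0  # the step counter must restart with every episode
        return self.env.reset(seed=seed, options=options)

    def step(self, action_dict: AgentDict):
        obs, rewards, terminateds, truncateds, infos = self.env.step(action_dict)
        self._elapsed += 1
        # Copy the flag dicts so no stale flags leak between steps, and make
        # sure "__all__" is present on every step, not only on the last one.
        terminateds = dict(terminateds)
        truncateds = dict(truncateds)
        terminateds.setdefault("__all__", False)
        truncateds.setdefault("__all__", False)
        if self._elapsed >= self.max_steps:
            for agent_id in obs:
                truncateds[agent_id] = True
            truncateds["__all__"] = True
        return obs, rewards, terminateds, truncateds, infos


env = MultiAgentTimeLimit(NeverEndingEnv(), max_steps=3)
obs, infos = env.reset()
steps, done = 0, False
while not done:
    obs, rewards, terminateds, truncateds, infos = env.step({0: 0, 1: 0})
    steps += 1
    done = terminateds["__all__"] or truncateds["__all__"]
print("episode ended after", steps, "steps")
```

The point of the sketch is that the truncated dict is rebuilt on every step, so the flags from a previous episode cannot survive a reset.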

This is a parameter.json from an erroneous run:

{
  "_disable_action_flattening": false,
  "_disable_execution_plan_api": true,
  "_disable_preprocessor_api": false,
  "_enable_rl_module_api": false,
  "_enable_rl_trainer_api": false,
  "_fake_gpus": false,
  "_rl_trainer_hps": "RLTrainerHPs()",
  "_tf_policy_handles_more_than_one_loss": false,
  "action_space": null,
  "actions_in_input_normalized": false,
  "always_attach_evaluation_results": false,
  "auto_wrap_old_gym_envs": true,
  "batch_mode": "complete_episodes",
  "callbacks": "<class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>",
  "checkpoint_trainable_policies_only": false,
  "clip_actions": false,
  "clip_rewards": null,
  "compress_observations": false,
  "create_env_on_driver": false,
  "custom_eval_function": null,
  "custom_resources_per_worker": {},
  "disable_env_checking": false,
  "eager_max_retraces": 20,
  "eager_tracing": false,
  "enable_async_evaluation": false,
  "enable_connectors": true,
  "enable_tf1_exec_eagerly": false,
  "entropy_coeff": 1.6732711160768008e-96,
  "entropy_coeff_schedule": null,
  "env": "ma_routing",
  "env_config": {},
  "env_task_fn": null,
  "evaluation_config": null,
  "evaluation_duration": 1,
  "evaluation_duration_unit": "episodes",
  "evaluation_interval": 8,
  "evaluation_num_workers": 2,
  "evaluation_parallel_to_training": false,
  "evaluation_sample_timeout_s": 180.0,
  "exploration_config": {
    "type": "StochasticSampling"
  },
  "explore": true,
  "export_native_model_files": false,
  "extra_python_environs_for_driver": {},
  "extra_python_environs_for_worker": {},
  "fake_sampler": false,
  "framework": "torch",
  "gamma": 0.6323546212384322,
  "grad_clip": 64.25222715584181,
  "horizon": -1,
  "ignore_worker_failures": true,
  "in_evaluation": false,
  "input": "sampler",
  "input_config": {},
  "is_atari": null,
  "keep_per_episode_custom_metrics": false,
  "lambda": 0.9692349532319025,
  "local_tf_session_args": {
    "inter_op_parallelism_threads": 8,
    "intra_op_parallelism_threads": 8
  },
  "log_level": "INFO",
  "log_sys_usage": true,
  "logger_config": null,
  "logger_creator": null,
  "lr": 0.00017192339487967932,
  "lr_schedule": null,
  "max_requests_in_flight_per_sampler_worker": 2,
  "metrics_episode_collection_timeout_s": 60.0,
  "metrics_num_episodes_for_smoothing": 100,
  "min_sample_timesteps_per_iteration": 0,
  "min_time_s_per_iteration": 5,
  "min_train_timesteps_per_iteration": 0,
  "model": {
    "_disable_action_flattening": false,
    "_disable_preprocessor_api": false,
    "_time_major": false,
    "_use_default_native_models": -1,
    "attention_dim": 64,
    "attention_head_dim": 32,
    "attention_init_gru_gate_bias": 2.0,
    "attention_memory_inference": 50,
    "attention_memory_training": 50,
    "attention_num_heads": 1,
    "attention_num_transformer_units": 1,
    "attention_position_wise_mlp_dim": 32,
    "attention_use_n_prev_actions": 0,
    "attention_use_n_prev_rewards": 0,
    "conv_activation": "relu",
    "conv_filters": null,
    "custom_action_dist": null,
    "custom_model": "cc_network",
    "custom_model_config": {
      "critic_fcnet_layers": [
        430,
        244
      ],
      "num_agents": 5
    },
    "custom_preprocessor": null,
    "dim": 84,
    "fcnet_activation": "tanh",
    "fcnet_hiddens": [
      400,
      384
    ],
    "framestack": true,
    "free_log_std": false,
    "grayscale": false,
    "lstm_cell_size": 256,
    "lstm_use_prev_action": false,
    "lstm_use_prev_action_reward": -1,
    "lstm_use_prev_reward": false,
    "max_seq_len": 20,
    "no_final_linear": false,
    "post_fcnet_activation": "relu",
    "post_fcnet_hiddens": [],
    "use_attention": false,
    "use_lstm": false,
    "vf_share_layers": true,
    "zero_mean": false
  },
  "multiagent": {
    "count_steps_by": "env_steps",
    "observation_fn": null,
    "policies": {
      "shared_policy": [
        null,
        "Box(-1.0, 1.0, (71,), float32)",
        "Discrete(26)",
        null
      ]
    },
    "policies_to_train": null,
    "policy_map_cache": -1,
    "policy_map_capacity": 100,
    "policy_mapping_fn": "<function RLOptimizer._init_algorithm_config.<locals>.<lambda> at 0x00000251B70F8A60>"
  },
  "no_done_at_end": -1,
  "normalize_actions": true,
  "num_agents": 5,
  "num_consecutive_worker_failures_tolerance": 100,
  "num_cpus_for_driver": 1,
  "num_cpus_per_trainer_worker": 1,
  "num_cpus_per_worker": 1,
  "num_envs_per_worker": 1,
  "num_gpus": 0.18,
  "num_gpus_per_trainer_worker": 0,
  "num_gpus_per_worker": 0,
  "num_trainer_workers": 0,
  "num_workers": 4,
  "observation_filter": "NoFilter",
  "observation_space": null,
  "off_policy_estimation_methods": {},
  "offline_sampling": false,
  "ope_split_batch_by_episode": true,
  "optimizer": {},
  "output": null,
  "output_compress_columns": [
    "obs",
    "new_obs"
  ],
  "output_config": {},
  "output_max_file_size": 67108864,
  "placement_strategy": "PACK",
  "policies": {
    "shared_policy": "<ray.rllib.policy.policy.PolicySpec object at 0x00000251AFCC63A0>"
  },
  "policy_states_are_swappable": "<ray.rllib.utils.from_config._NotProvided object at 0x00000251AFCC6A00>",
  "postprocess_inputs": false,
  "preprocessor_pref": "deepmind",
  "recreate_failed_workers": false,
  "remote_env_batch_wait_ms": 0,
  "remote_worker_envs": false,
  "render_env": false,
  "replay_sequence_length": null,
  "restart_failed_sub_environments": false,
  "rl_module_class": null,
  "rl_trainer_class": null,
  "rollout_fragment_length": 10,
  "sample_async": false,
  "sample_collector": "<class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>",
  "sampler_perf_stats_ema_coef": null,
  "seed": null,
  "shuffle_buffer_size": 0,
  "simple_optimizer": -1,
  "soft_horizon": -1,
  "sync_filters_on_rollout_workers_timeout_s": 60.0,
  "synchronize_filters": true,
  "tf_session_args": {
    "allow_soft_placement": true,
    "device_count": {
      "CPU": 1
    },
    "gpu_options": {
      "allow_growth": true
    },
    "inter_op_parallelism_threads": 2,
    "intra_op_parallelism_threads": 2,
    "log_device_placement": false
  },
  "train_batch_size": 29,
  "use_critic": true,
  "use_gae": true,
  "validate_workers_after_construction": true,
  "vf_loss_coeff": 0.5735665650701284,
  "worker_cls": null,
  "worker_health_probe_timeout_s": 60,
  "worker_restore_timeout_s": 1800
}

### Versions / Dependencies


_py-xgboost-mutex         2.0                       cpu_0    main
absl-py                   1.4.0                    pypi_0    pypi
aiohttp                   3.8.3                    pypi_0    pypi
aiohttp-cors              0.7.0                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
alembic                   1.8.1            py38haa95532_0    main
ansicon                   1.89.0                   pypi_0    pypi
apptools                  5.2.0                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
async-timeout             4.0.2                    pypi_0    pypi
atomicwrites              1.4.0                      py_0    main
attrs                     22.1.0           py38haa95532_0    main
bayesian-optimization     1.4.2              pyhd8ed1ab_1    conda-forge
blas                      1.0                         mkl    main
blessed                   1.19.1                   pypi_0    pypi
blessings                 1.7                      pypi_0    pypi
blosc                     1.21.0               h19a0ad4_1    main
bottleneck                1.3.5            py38h080aedc_0    main
brotli                    1.0.9                h2bbff1b_7    main
brotli-bin                1.0.9                h2bbff1b_7    main
brotlipy                  0.7.0           py38h2bbff1b_1003    main
bzip2                     1.0.8                he774522_0    main
ca-certificates           2023.01.10           haa95532_0
cachetools                5.3.0                    pypi_0    pypi
certifi                   2022.12.7        py38haa95532_0
cffi                      1.15.1           py38h2bbff1b_3    main
cfitsio                   3.470                h2bbff1b_7    main
charls                    2.2.0                h6c2663c_0    main
charset-normalizer        2.0.4              pyhd3eb1b0_0    main
click                     8.1.3                    pypi_0    pypi
cloudpickle               2.0.0              pyhd3eb1b0_0    main
cma                       3.3.0              pyha21a80b_0    conda-forge
cmaes                     0.9.1              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6            py38haa95532_0    main
colorful                  0.5.5                    pypi_0    pypi
colorlog                  5.0.1            py38haa95532_1    main
configobj                 5.0.8                    pypi_0    pypi
contourpy                 1.0.5            py38h59b6b97_0    main
cryptography              38.0.4           py38h21b164f_0    main
cuda                      11.6.1                        0    nvidia
cuda-cccl                 11.6.55                       0    nvidia
cuda-command-line-tools   11.6.2                        0    nvidia
cuda-compiler             11.6.2                        0    nvidia
cuda-cudart               11.6.55                       0    nvidia
cuda-cudart-dev           11.6.55                       0    nvidia
cuda-cuobjdump            11.6.124                      0    nvidia
cuda-cupti                11.6.124                      0    nvidia
cuda-cuxxfilt             11.6.124                      0    nvidia
cuda-libraries            11.6.1                        0    nvidia
cuda-libraries-dev        11.6.1                        0    nvidia
cuda-memcheck             11.8.86                       0    nvidia
cuda-nsight-compute       12.0.1                        0    nvidia
cuda-nvcc                 11.6.124                      0    nvidia
cuda-nvdisasm             12.0.140                      0    nvidia
cuda-nvml-dev             11.6.55                       0    nvidia
cuda-nvprof               12.0.146                      0    nvidia
cuda-nvprune              11.6.124                      0    nvidia
cuda-nvrtc                11.6.124                      0    nvidia
cuda-nvrtc-dev            11.6.124                      0    nvidia
cuda-nvtx                 11.6.124                      0    nvidia
cuda-nvvp                 12.0.146                      0    nvidia
cuda-runtime              11.6.1                        0    nvidia
cuda-sanitizer-api        12.0.140                      0    nvidia
cuda-toolkit              11.6.1                        0    nvidia
cuda-tools                11.6.1                        0    nvidia
cuda-visual-tools         11.6.1                        0    nvidia
cycler                    0.11.0             pyhd3eb1b0_0    main
cytoolz                   0.12.0           py38h2bbff1b_0    main
dask-core                 2022.7.0         py38haa95532_0    main
decorator                 5.1.1                    pypi_0    pypi
distlib                   0.3.6                    pypi_0    pypi
dm-tree                   0.1.8                    pypi_0    pypi
eigen                     3.3.7                h59b6b97_1    main
envisage                  6.1.0                    pypi_0    pypi
fftw                      3.3.9                h2bbff1b_1    main
filelock                  3.9.0                    pypi_0    pypi
flaml                     1.1.1              pyhd8ed1ab_0    conda-forge
flatbuffers               23.1.21                  pypi_0    pypi
flit-core                 3.6.0              pyhd3eb1b0_0    main
fonttools                 4.25.0             pyhd3eb1b0_0    main
freetype                  2.12.1               ha860e81_0    main
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2022.11.0        py38haa95532_0    main
future                    0.18.2                   py38_1    main
gast                      0.4.0                    pypi_0    pypi
geos                      3.8.0                h33f27b4_0    main
giflib                    5.2.1                h8cc25b3_1    main
glib                      2.69.1               h5dc1a3c_2    main
google-api-core           2.11.0                   pypi_0    pypi
google-auth               2.16.0                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
googleapis-common-protos  1.58.0                   pypi_0    pypi
gpustat                   1.0.0                    pypi_0    pypi
greenlet                  2.0.1            py38hd77b12b_0    main
grpcio                    1.51.1                   pypi_0    pypi
gst-plugins-base          1.18.5               h9e645db_0    main
gstreamer                 1.18.5               hd78058f_0    main
gym                       0.26.1           py38h23ba278_0    conda-forge
gym-notices               0.0.8              pyhd8ed1ab_0    conda-forge
gymnasium                 0.26.3                   pypi_0    pypi
gymnasium-notices         0.0.1                    pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
hdf5                      1.12.1               h1756f20_2    main
hyperopt                  0.2.7              pyhd8ed1ab_0    conda-forge
icc_rt                    2022.1.0             h6049295_2    main
icu                       58.2                 ha925a31_3    main
idna                      3.4              py38haa95532_0    main
imagecodecs               2021.8.26        py38hc0a7faf_1    main
imageio                   2.19.3           py38haa95532_0    main
importlib-metadata        4.11.3           py38haa95532_0    main
importlib_metadata        4.11.3               hd3eb1b0_0    main
importlib_resources       5.2.0              pyhd3eb1b0_1    main
iniconfig                 1.1.1              pyhd3eb1b0_0    main
intel-openmp              2021.4.0          haa95532_3556    main
iohexperimenter           0.2.9.2          py38hbd9d945_2    conda-forge
jinxed                    1.2.0                    pypi_0    pypi
joblib                    1.1.1            py38haa95532_0    main
jpeg                      9e                   h2bbff1b_0    main
jsonschema                4.16.0           py38haa95532_0    main
keras                     2.11.0                   pypi_0    pypi
kiwisolver                1.4.4            py38hd77b12b_0    main
lcms2                     2.12                 h83e58a3_0    main
lerc                      3.0                  hd77b12b_0    main
libaec                    1.0.4                h33f27b4_1    main
libbrotlicommon           1.0.9                h2bbff1b_7    main
libbrotlidec              1.0.9                h2bbff1b_7    main
libbrotlienc              1.0.9                h2bbff1b_7    main
libclang                  15.0.6.1                 pypi_0    pypi
libcublas                 11.9.2.110                    0    nvidia
libcublas-dev             11.9.2.110                    0    nvidia
libcufft                  10.7.1.112                    0    nvidia
libcufft-dev              10.7.1.112                    0    nvidia
libcurand                 10.3.1.124                    0    nvidia
libcurand-dev             10.3.1.124                    0    nvidia
libcurl                   7.87.0               h86230a5_0    main
libcusolver               11.3.4.124                    0    nvidia
libcusolver-dev           11.3.4.124                    0    nvidia
libcusparse               11.7.2.124                    0    nvidia
libcusparse-dev           11.7.2.124                    0    nvidia
libdeflate                1.8                  h2bbff1b_5    main
libffi                    3.4.2                hd77b12b_6    main
libiconv                  1.16                 h2bbff1b_2    main
libnpp                    11.6.3.124                    0    nvidia
libnpp-dev                11.6.3.124                    0    nvidia
libnvjpeg                 11.6.2.124                    0    nvidia
libnvjpeg-dev             11.6.2.124                    0    nvidia
libogg                    1.3.5                h2bbff1b_1    main
libpng                    1.6.37               h2a8f88b_0    main
libprotobuf               3.20.1               h23ce68f_0    main
libssh2                   1.10.0               hcd4344a_0    main
libtiff                   4.5.0                h8a3f274_0    main
libuv                     1.40.0               he774522_0    main
libvorbis                 1.3.7                he774522_0    main
libwebp                   1.2.4                h2bbff1b_0    main
libwebp-base              1.2.4                h2bbff1b_0    main
libxgboost                1.5.1                hd77b12b_0    main
libxml2                   2.9.14               h0ad7f3c_0    main
libxslt                   1.1.35               h2bbff1b_0    main
libzopfli                 1.0.3                ha925a31_0    main
lightgbm                  3.2.1            py38hd77b12b_0    main
locket                    1.0.0            py38haa95532_0    main
lz4                       4.3.2                    pypi_0    pypi
lz4-c                     1.9.4                h2bbff1b_0    main
mako                      1.2.3            py38haa95532_0    main
markdown                  3.4.1                    pypi_0    pypi
markdown-it-py            2.1.0                    pypi_0    pypi
markupsafe                2.1.1            py38h2bbff1b_0    main
matplotlib                3.6.2            py38haa95532_0    main
matplotlib-base           3.6.2            py38h1094b8e_0    main
mayavi                    4.8.1                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mixsimulator              0.3.3              pyhd8ed1ab_1    conda-forge
mkl                       2021.4.0           haa95532_640    main
mkl-service               2.4.0            py38h2bbff1b_0    main
mkl_fft                   1.3.1            py38h277e83a_0    main
mkl_random                1.2.2            py38hf11a4ad_0    main
msgpack                   1.0.4                    pypi_0    pypi
multidict                 6.0.4                    pypi_0    pypi
munkres                   1.1.4                      py_0    main
networkx                  2.8.4            py38haa95532_0    main
ninja                     1.10.2               haa95532_5    main
ninja-base                1.10.2               h6d14046_5    main
nsight-compute            2022.4.1.6                    0    nvidia
numexpr                   2.8.4            py38h5b0cc5e_0    main
numpy                     1.23.5           py38h3b20f71_0    main
numpy-base                1.23.5           py38h4da318b_0    main
nvidia-ml-py              11.495.46                pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
opencensus                0.11.1                   pypi_0    pypi
opencensus-context        0.1.3                    pypi_0    pypi
opencv                    4.6.0            py38h104de81_2    main
openjpeg                  2.4.0                h4fc8c34_0    main
openssl                   1.1.1s               h2bbff1b_0
opt-einsum                3.3.0                    pypi_0    pypi
optuna                    3.1.0              pyhd8ed1ab_0    conda-forge
packaging                 22.0             py38haa95532_0    main
pandas                    1.5.2            py38hf11a4ad_0    main
partd                     1.2.0              pyhd3eb1b0_1    main
pcre                      8.45                 hd77b12b_0    main
pillow                    9.3.0            py38hdc2b20a_1    main
pip                       22.3.1           py38haa95532_0    main
pkgutil-resolve-name      1.3.10           py38haa95532_0    main
platformdirs              2.6.2                    pypi_0    pypi
pluggy                    1.0.0            py38haa95532_1    main
ply                       3.11                     py38_0    main
proj                      8.2.1                h5ed7ab8_0    main
prometheus-client         0.16.0                   pypi_0    pypi
protobuf                  4.22.0                   pypi_0    pypi
psutil                    5.9.4                    pypi_0    pypi
py                        1.11.0             pyhd3eb1b0_0    main
py-opencv                 4.6.0                haa95532_2    main
py-spy                    0.3.14                   pypi_0    pypi
py-xgboost                1.5.1            py38haa95532_0    main
py4j                      0.10.9.3         py38haa95532_0    main
pyaml                     20.4.0             pyhd3eb1b0_0    main
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0    main
pydantic                  1.10.4                   pypi_0    pypi
pyface                    7.4.4                    pypi_0    pypi
pygments                  2.14.0                   pypi_0    pypi
pyopenssl                 22.0.0             pyhd3eb1b0_0    main
pyparsing                 3.0.9            py38haa95532_0    main
pyproj                    3.3.0            py38hb622704_0    main
pyqt                      5.15.7           py38hd77b12b_0    main
pyqt5-sip                 12.11.0          py38hd77b12b_0    main
pyrsistent                0.18.0           py38h196d8e1_0    main
pysocks                   1.7.1            py38haa95532_0    main
pytest                    7.1.2            py38haa95532_0    main
python                    3.8.16               h6244533_2    main
python-dateutil           2.8.2              pyhd3eb1b0_0    main
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.13.1          py3.8_cuda11.6_cudnn8_0    pytorch
pytorch-cuda              11.6                 h867d48c_1    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2022.7           py38haa95532_0    main
pyvoxsurf                 1.0.7                    pypi_0    pypi
pywavelets                1.4.1            py38h2bbff1b_0    main
pyyaml                    6.0              py38h2bbff1b_1    main
qt-main                   5.15.2               he8e5bd7_7    main
qt-webengine              5.15.9               hb9a9bb5_5    main
qtwebkit                  5.212                h3ad3cdb_4    main
ray                       3.0.0.dev0               pypi_0    pypi
requests                  2.28.1           py38haa95532_0    main
requests-oauthlib         1.3.1                    pypi_0    pypi
rich                      13.3.1                   pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
scikit-image              0.19.3           py38hd77b12b_1    main
scikit-learn              1.2.0            py38hd77b12b_0    main
scikit-optimize           0.9.0              pyhd8ed1ab_1    conda-forge
scipy                     1.9.3            py38he11b74f_0    main
setuptools                65.6.3           py38haa95532_0    main
shapely                   1.8.4            py38h9064783_0    main
sigopt                    5.3.1                      py_0    conda-forge
sip                       6.6.2            py38hd77b12b_0    main
six                       1.16.0             pyhd3eb1b0_1    main
smart-open                6.3.0                    pypi_0    pypi
snappy                    1.1.9                h6c2663c_0    main
sqlalchemy                1.4.39           py38h2bbff1b_0    main
sqlite                    3.40.1               h2bbff1b_0    main
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.11.2                   pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorboardx              2.2                pyhd3eb1b0_0    main
tensorflow                2.11.0                   pypi_0    pypi
tensorflow-estimator      2.11.0                   pypi_0    pypi
tensorflow-intel          2.11.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.30.0                   pypi_0    pypi
tensorflow-probability    0.19.0                   pypi_0    pypi
termcolor                 2.2.0                    pypi_0    pypi
threadpoolctl             2.2.0              pyh0d69192_0    main
tifffile                  2021.7.2           pyhd3eb1b0_2    main
tk                        8.6.12               h2bbff1b_0    main
toml                      0.10.2             pyhd3eb1b0_0    main
tomli                     2.0.1            py38haa95532_0    main
toolz                     0.12.0           py38haa95532_0    main
torch                     1.12.1                   pypi_0    pypi
torchaudio                0.13.1                   pypi_0    pypi
torchvision               0.14.1                   pypi_0    pypi
tornado                   6.2              py38h2bbff1b_0    main
tqdm                      4.64.1           py38haa95532_0    main
traits                    6.4.1                    pypi_0    pypi
traitsui                  7.4.3                    pypi_0    pypi
trimesh                   3.18.1             pyhd8ed1ab_0    conda-forge
typer                     0.7.0                    pypi_0    pypi
typing-extensions         4.4.0            py38haa95532_0    main
typing_extensions         4.4.0            py38haa95532_0    main
urllib3                   1.26.14          py38haa95532_0    main
vc                        14.2                 h21ff451_1    main
virtualenv                20.17.1                  pypi_0    pypi
vs2015_runtime            14.27.29016          h5e58377_2    main
vtk                       9.2.5                    pypi_0    pypi
wcwidth                   0.2.6                    pypi_0    pypi
werkzeug                  2.2.2                    pypi_0    pypi
wheel                     0.37.1             pyhd3eb1b0_0    main
win_inet_pton             1.1.0            py38haa95532_0    main
wincertstore              0.2              py38haa95532_2    main
wrapt                     1.14.1                   pypi_0    pypi
xgboost                   1.5.1            py38haa95532_0    main
xz                        5.2.10               h8cc25b3_1    main
yaml                      0.2.5                he774522_0    main
yapf                      0.31.0             pyhd3eb1b0_0    main
yarl                      1.8.2                    pypi_0    pypi
zfp                       0.5.5                hd77b12b_6    main
zipp                      3.11.0           py38haa95532_0    main
zlib                      1.2.13               h8cc25b3_0    main
zstd                      1.5.2                h19a0ad4_0    main

### Reproduction script

I am unable to provide a detailed reproduction script, as this is company work.

### Issue Severity

High: It blocks me from completing my task.

tensor-works commented 1 year ago

After looking at this a bit longer, I could identify that the issue lies with the setting batch_mode=complete_episodes. With this setting, for some runs, depending on the batch size (I did not find out the exact relation), the learner never receives the episode-finished signal. I do not know if this is intended behaviour. I apologize for my poor English grammar.
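One way to rule out the environment itself is to roll it out outside of RLlib and check that every episode emits an end signal within the step limit. The helper below is a minimal sketch, not an RLlib utility: `steps_until_episode_end` and `ToyTruncatingEnv` are hypothetical names, and the check assumes the five-value step return with an `"__all__"` key in both flag dicts.

```python
from typing import Any, Callable, Dict


class ToyTruncatingEnv:
    """Toy two-agent env (hypothetical) that truncates after `limit` steps."""

    def __init__(self, limit: int):
        self.limit = limit
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return {0: 0.0, 1: 0.0}, {0: {}, 1: {}}

    def step(self, action_dict: Dict[Any, Any]):
        self.t += 1
        hit_limit = self.t >= self.limit
        obs = {a: float(self.t) for a in action_dict}
        rewards = {a: 0.0 for a in action_dict}
        terminateds = {a: False for a in action_dict}
        terminateds["__all__"] = False
        truncateds = {a: hit_limit for a in action_dict}
        truncateds["__all__"] = hit_limit
        infos = {a: {} for a in action_dict}
        return obs, rewards, terminateds, truncateds, infos


def steps_until_episode_end(env, policy: Callable[[Any], Any], max_steps: int) -> int:
    """Roll out one episode and return the step at which it ended.

    Raises KeyError if a flag dict lacks "__all__", and RuntimeError if no
    end signal was seen within max_steps.
    """
    obs, _ = env.reset()
    for t in range(1, max_steps + 1):
        actions = {agent_id: policy(o) for agent_id, o in obs.items()}
        obs, _, terminateds, truncateds, _ = env.step(actions)
        for name, flags in (("terminateds", terminateds), ("truncateds", truncateds)):
            if "__all__" not in flags:
                raise KeyError(f'{name} has no "__all__" key at step {t}')
        if terminateds["__all__"] or truncateds["__all__"]:
            return t
    raise RuntimeError(f"no episode end signal within {max_steps} steps")


# Always pick action 0; the toy env ignores the action values anyway.
print(steps_until_episode_end(ToyTruncatingEnv(limit=5), lambda o: 0, max_steps=10))
```

If the env passes such a check for many episodes, the missing signal is more likely to come from how the sampler assembles batches than from the step function.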

sven1977 commented 1 year ago

Sorry for deprioritizing this issue! But we are very close to moving A3C (and some other algos) into a new "RLlib contrib" repo, so support for this algorithm will be very limited.

https://github.com/ray-project/rllib-contrib

zoetsekas commented 9 months ago

Hi @sven1977, when I install ray with

pip install -U "ray[all]"==2.9.0

I don't see "rllib-contrib" as part of the installation.