[rllib] output shape inconsistent

efang96 commented 5 years ago

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.5
Ray installed from (source or binary): https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.5.3-cp35-cp35m-manylinux1_x86_64.whl
Ray version: 0.5.3
Python version: 3.5
Exact command to reproduce:

Describe the problem

I'm running into an odd bug when using ray.tune. I have defined a custom model that implements def _build_layers(self, inputs, num_outputs, options). When I print num_outputs to the console, it alternates between 2 (the value I expect) and 1 (which I don't expect). I have checked that my action_space has shape 2. The actual error thrown is: expected tensor shape (?, 1) but got (1, 2).

Source code / logs

I am running the following command python train_rl.py. Below are the important files involved.

train_rl.py

import fluids
import ray
import argparse
import os
import numpy as np
import custom_models
from ray import tune
from qlidar_env import FluidsQLidarEnv
from model import registry
from ray.tune.registry import register_env

def env_creator(env_config):
    return FluidsQLidarEnv(**default_configuration())

def default_configuration():
    return {
        "visualization_level": 0,
        "action_type": fluids.actions.SteeringVelAction,
        "beam_distribution": np.linspace(-1, 1, 17),
        "num_cars": 1,
        "num_pedestrians": 15,
        "num_background_cars": 15
    }

if __name__ == "__main__":
    ray.init(use_raylet=True, redis_password=os.urandom(128).hex())
    parser = argparse.ArgumentParser()
    parser.add_argument("--checkpoint", type=str, help="Path to checkpoint")
    args = parser.parse_args()
    checkpoint = args.checkpoint
    register_env("FluidsQLidarEnv", env_creator)

    experiment_spec = {
        "fluids_qlidar": {
            "run": "PPO",
            "env": "FluidsQLidarEnv",
            "restore": checkpoint,
            "config": {
                "model": {
                    "custom_model": "LidarConv",
                }
            },
            "trial_resources":{
                "cpu": 10,
                "gpu": 1,
            },
            "checkpoint_freq": 10,
        },
    }
    tune.run_experiments(experiment_spec)

custom_models.py

import tensorflow as tf
from tensorflow import layers

from ray.rllib.models import ModelCatalog, Model
from ray.rllib.models.misc import flatten, normc_initializer

class LidarConv(Model):
    def _build_layers(self, inputs, num_outputs, options):
        print("num_outputs: ", num_outputs)
        with tf.name_scope("1DConv"):
            last_layer = tf.transpose(inputs, [0, 2, 1])
            last_layer = tf.layers.conv1d(last_layer, 8, 2, activation=tf.nn.relu, name="conv1d_1")
            last_layer = tf.layers.conv1d(last_layer, 16, 2, activation=tf.nn.relu, name="conv1d_2")

            last_layer = tf.layers.flatten(last_layer)
            last_layer = tf.layers.dense(last_layer, 64, activation=tf.nn.relu, name="dense1")
            last_layer = tf.layers.dense(last_layer, 64, activation=tf.nn.relu, name="dense2")

            output = tf.layers.dense(last_layer, num_outputs, activation=None, name="dense_output")
            return output, last_layer

ModelCatalog.register_custom_model("LidarConv", LidarConv)

ericl commented 5 years ago

@efang96 can you post a reproduction script that can be run standalone without FluidsEnv?

ericl commented 5 years ago

Btw, it might be normal to see a requested output of 1 when GAE is enabled, since PPO will attempt to construct a value function with the same model configuration but with 1 output.

This shouldn't be raising an error though. Can you post the full stack trace? I can't find the error you mentioned.
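The alternating num_outputs values described above can be pictured with a toy sketch. This is not RLlib code: build_layers and build_policy_with_value_function are hypothetical stand-ins, and the only assumption is the one stated in the comment, that the same model builder is called once for the policy head and once more with a single output for the value function.

```python
# Toy stand-in for a custom model's _build_layers; not RLlib code.
requested = []


def build_layers(num_outputs):
    # Record every size the "framework" asks for, like the print() in the issue.
    requested.append(num_outputs)
    return "network with %d outputs" % num_outputs


def build_policy_with_value_function(policy_outputs):
    # Hypothetical: one call for the policy head, one for the value head.
    policy_net = build_layers(policy_outputs)
    value_net = build_layers(1)  # a value function predicts a single scalar
    return policy_net, value_net


build_policy_with_value_function(2)
print(requested)  # [2, 1]
```

Under that assumption, seeing both 2 and 1 printed is expected behaviour rather than a sign of a wrong action dimension.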

efang96 commented 5 years ago

Thanks Eric, that makes sense. Unfortunately it does still error out though. I will work on a reproduction script without the FluidsEnv and post it later today.

The stack trace below was produced with A3C, but the error is the same. The print statements for action_dim, action_space, and num_outputs all print 2 (expected and correct). I omitted the other workers' output for brevity. Please let me know if there's anything else you need! Thanks again.

(fluids) [edward.fang@steropes:/data/efang/low-res-planning/Fluids-v0/qlidar]$ python train_rl.py
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
Process STDOUT and STDERR is being redirected to /tmp/ray/session_2018-10-21_23-49-13_21607/logs.
Waiting for redis server at 127.0.0.1:37210 to respond...
Waiting for redis server at 127.0.0.1:11471 to respond...
Starting the Plasma object store with 108.15 GB memory.

======================================================================
View the web UI at http://localhost:8893/notebooks/ray_ui.ipynb?token=aa910d58cd59de3b4431cc6189086b066fd35322b3178332
======================================================================

== Status ==
Using FIFO scheduling algorithm.

Created LogSyncer for /home/eecs/edward.fang/ray_results/fluids_qlidar/A3C_FluidsQLidarEnv_0_2018-10-21_23-49-141tcmw8oh ->
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 3/24 CPUs, 0/8 GPUs
Result logdir: /home/eecs/edward.fang/ray_results/fluids_qlidar
RUNNING trials:
 - A3C_FluidsQLidarEnv_0:   RUNNING

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
action_dim:  2
action_space:  Box(1, 2)
2018-10-21 23:49:57.199615: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-21 23:49:57.348582: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2018-10-21 23:49:57.348636: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: steropes
2018-10-21 23:49:57.348650: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: steropes
2018-10-21 23:49:57.348696: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 410.48.0
2018-10-21 23:49:57.348742: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 410.48.0
2018-10-21 23:49:57.348754: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 410.48.0
Using custom model LidarConv
num_outputs:  2
[FLUIDS] Loading layout: fluids_state_city
[FLUIDS] Cached layout found
[FLUIDS] Creating objects
[FLUIDS] Generating trajectory map
[FLUIDS] Generating cars
[FLUIDS] Generating peds
[FLUIDS] State creation complete
*** WARNING ***: no episode horizon specified, assuming inf
Error fetching: [<tf.Tensor 'default/add_4:0' shape=(?, 1) dtype=float32>, {'vf_preds': <tf.Tensor 'default/Reshape:0' shape=(?,) dtype=float32>}], feed_dict={<tf.Tensor 'default/Placeholder:0' shape=(?, 1, 17) dtype=float32>: [array([[200.        , 165.20196824,  77.59682208,  56.30913861,
         50.18209912,  52.46046702,  65.38294434, 108.71226748,
        200.        , 200.        ,  77.59682208,  56.30913861,
         50.18209912,  52.46046702,  65.38294434, 108.71226748,
        200.        ]])], <tf.Tensor 'default/action:0' shape=(?, 1) dtype=float32>: [array([[0., 0.]], dtype=float32)], <tf.Tensor 'default/PlaceholderWithDefault:0' shape=() dtype=bool>: True, <tf.Tensor 'default/prev_reward:0' shape=(?,) dtype=float32>: [0.0]}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/rllib/evaluation/sampler.py", line 125, in run
    raise e
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/rllib/evaluation/sampler.py", line 122, in run
    self._run()
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/rllib/evaluation/sampler.py", line 136, in _run
    item = next(rollout_provider)
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/rllib/evaluation/sampler.py", line 378, in _env_runner
    eval_results[k] = builder.get(v)
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/rllib/utils/tf_run_builder.py", line 48, in get
    raise e
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/rllib/utils/tf_run_builder.py", line 44, in get
    self.feed_dict, os.environ.get("TF_TIMELINE_DIR"))
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/rllib/utils/tf_run_builder.py", line 83, in run_timeline
    fetches = sess.run(ops, feed_dict=feed_dict)
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1086, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 1, 2) for Tensor 'default/action:0', which has shape '(?, 1)'

Worker ip unknown, skipping log sync for /home/eecs/edward.fang/ray_results/fluids_qlidar/A3C_FluidsQLidarEnv_0_2018-10-21_23-49-141tcmw8oh
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/24 CPUs, 0/8 GPUs
Result logdir: /home/eecs/edward.fang/ray_results/fluids_qlidar
ERROR trials:
 - A3C_FluidsQLidarEnv_0:   ERROR, 1 failures: /home/eecs/edward.fang/ray_results/fluids_qlidar/A3C_FluidsQLidarEnv_0_2018-10-21_23-49-141tcmw8oh/error_2018-10-21_23-50-32.txt

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/24 CPUs, 0/8 GPUs
Result logdir: /home/eecs/edward.fang/ray_results/fluids_qlidar
ERROR trials:
 - A3C_FluidsQLidarEnv_0:   ERROR, 1 failures: /home/eecs/edward.fang/ray_results/fluids_qlidar/A3C_FluidsQLidarEnv_0_2018-10-21_23-49-141tcmw8oh/error_2018-10-21_23-50-32.txt

Traceback (most recent call last):
  File "train_rl.py", line 52, in <module>
    tune.run_experiments(experiment_spec)
  File "/data/efang/anaconda3/envs/fluids/lib/python3.5/site-packages/ray/tune/tune.py", line 124, in run_experiments
    raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [A3C_FluidsQLidarEnv_0])

efang96 commented 5 years ago

Ok I've reconstructed this without fluids.

import argparse
import os

import gym
import numpy as np
import ray
import tensorflow as tf

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.models import ModelCatalog, Model
from ray.rllib.models.misc import flatten

class LidarConv(Model):
    def _build_layers(self, inputs, num_outputs, options):
        print("num_outputs: ", num_outputs)
        with tf.name_scope("1DConv"):
            last_layer = tf.transpose(inputs, [0, 2, 1])
            last_layer = tf.layers.conv1d(last_layer, 8, 2, activation=tf.nn.relu, name="conv1d_1")
            last_layer = tf.layers.conv1d(last_layer, 16, 2, activation=tf.nn.relu, name="conv1d_2")

            last_layer = flatten(last_layer)
            last_layer = tf.layers.dense(last_layer, 64, activation=tf.nn.relu, name="dense1")
            last_layer = tf.layers.dense(last_layer, 64, activation=tf.nn.relu, name="dense2")

            output = tf.layers.dense(last_layer, num_outputs, activation=None, name="dense_output")
            return output, last_layer

ModelCatalog.register_custom_model("LidarConv", LidarConv)

class CustomEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(
            low=0.0,
            high=1.0,
            shape=(1, 17),
            dtype=np.float32)

        self.action_space = gym.spaces.Box(
            low=-1.0,
            high=+1.0,
            shape=(1, 2),
            dtype=np.float32)

    def reset(self):
        return np.zeros((1,17))

    def step(self, action):
        return np.zeros((1,17)), 0, False, {}

def env_creator(env_config):
    return CustomEnv()

if __name__ == "__main__":
    ray.init(use_raylet=True, redis_password=os.urandom(128).hex())
    parser = argparse.ArgumentParser()
    parser.add_argument("--checkpoint", type=str, help="Path to checkpoint")
    args = parser.parse_args()
    checkpoint = args.checkpoint
    register_env("CustomEnv", env_creator)

    experiment_spec = {
        "custom_env": {
            "run": "A3C",
            "env": "CustomEnv",
            "restore": checkpoint,
            "config": {
                "model": {
                    "custom_model": "LidarConv",
                },
            },
#            "trial_resources":{
#                "cpu": 10,
#                "gpu": 1,
#            },
            "checkpoint_freq": 10,
        },
    }
    tune.run_experiments(experiment_spec)

ericl commented 5 years ago

shape=(1, 2)

Is this intentional? Note that you should use a shape of (2,) if you want two actions; (1, 2) just adds an empty dimension.
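A small NumPy sketch of why the extra dimension matters. It assumes, based only on the shapes in the traceback and not on anything confirmed in this thread, that the action placeholder is sized from the first entry of the action space shape. The helper is_compatible is hypothetical and only mimics a feed-shape check, with None standing for TensorFlow's unknown "?" dimension.

```python
import numpy as np


def is_compatible(value_shape, placeholder_shape):
    # Hypothetical check: ranks must match, and every known dimension must agree.
    if len(value_shape) != len(placeholder_shape):
        return False
    return all(p is None or p == v
               for v, p in zip(value_shape, placeholder_shape))


flat_action = np.zeros((2,), dtype=np.float32)      # two action components
nested_action = np.zeros((1, 2), dtype=np.float32)  # same values, extra leading axis

# A batch containing one action, as a sampler would feed it.
flat_batch = flat_action[np.newaxis]      # shape (1, 2)
nested_batch = nested_action[np.newaxis]  # shape (1, 1, 2)

# Assumption: the placeholder is built as (?, action_space.shape[0]).
placeholder_for_flat = (None, flat_action.shape[0])      # (None, 2)
placeholder_for_nested = (None, nested_action.shape[0])  # (None, 1)

print(is_compatible(flat_batch.shape, placeholder_for_flat))      # True
print(is_compatible(nested_batch.shape, placeholder_for_nested))  # False
```

The second case reproduces the shapes in the reported error: a value of shape (1, 1, 2) fed to a tensor of shape (?, 1).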

efang96 commented 5 years ago

Yeah it's intentional, in some cases we will have multiple agents so the shape will be (N, 2). Right now it's just hardcoded for 1 agent.

ericl commented 5 years ago

I see, did this use to work?

ericl commented 5 years ago

Actually, this won't really work out due to complications with the action distribution. There is no semantic meaning given to those extra dimensions, so you'd be better off using a Tuple action space in this case (or using the full-blown multi-agent API).

This patch adds a better error message suggesting this: https://github.com/ray-project/ray/pull/3119/files
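One way the Tuple suggestion could look on the environment side, as a hedged sketch: if the action space were declared as a gym.spaces.Tuple containing one Box of shape (2,) per agent, each action would arrive as a tuple of N length-2 arrays, and the environment could stack them back into the (N, 2) array it used before. The helper name stack_agent_actions is hypothetical, and only NumPy is used here.

```python
import numpy as np


def stack_agent_actions(tuple_action):
    """Turn a Tuple-space action (one length-2 array per agent) into an (N, 2) array."""
    actions = [np.asarray(a, dtype=np.float32) for a in tuple_action]
    for i, a in enumerate(actions):
        if a.shape != (2,):
            raise ValueError("agent %d: expected shape (2,), got %s" % (i, a.shape))
    return np.stack(actions)


# Example: three agents, each with a two-component action.
example = (np.array([0.1, -0.2]), np.array([0.5, 0.0]), np.array([-1.0, 1.0]))
stacked = stack_agent_actions(example)
print(stacked.shape)  # (3, 2)
```

This keeps each per-agent action a flat two-component vector, so no dimension is left without a defined meaning.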

efang96 commented 5 years ago

Got it, thanks! I'll test this out and let you know.
