openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Concat action and observation #986

Open marysavari opened 5 years ago

marysavari commented 5 years ago

Hi, I am getting an error when concatenating the action and observation in the critic class, from the following line: `x = tf.concat([obs, action], axis=-1)  # this assumes observation and action can be concatenated`

The error occurs because the two tensors have different ranks: `ValueError: Shape must be rank 4 but is rank 2 for 'critic/concat' (op: 'ConcatV2') with input shapes: [?,96,96,3], [?,3], [].`

I tried to reshape the action but got a "cannot reshape array of size 1" error. My action dimension is (?, 3) and my observation dimension is (?, 96, 96, 3). Any suggestions?

DanielTakeshi commented 5 years ago

What command did you run? Without knowing the details we can't help.

The shape of [?,96,96,3] suggests that you're using image observations. You need to turn those into flat, fully connected features before concatenating with the action. That's what I did to get images + DDPG "working" to some extent, in the sense that the code runs.
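
A minimal standalone sketch of that idea (TF1-style; the placeholder shapes match the error above, and the conv stack is only illustrative, not the actual baselines cnn builder):

```python
import tensorflow as tf

# Illustrative only: encode the rank-4 image observation down to a flat
# feature vector, then concatenate it with the (already flat) action.
obs = tf.placeholder(tf.float32, [None, 96, 96, 3])   # image observation, rank 4
action = tf.placeholder(tf.float32, [None, 3])        # action, rank 2

h = tf.layers.conv2d(obs, 32, 8, strides=4, activation=tf.nn.relu)
h = tf.layers.conv2d(h, 64, 4, strides=2, activation=tf.nn.relu)
h = tf.layers.flatten(h)                 # now rank 2: (batch, features)
x = tf.concat([h, action], axis=-1)      # ranks match, concat succeeds
q = tf.layers.dense(x, 1)                # Q-value head
```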

marysavari commented 5 years ago

Thank you very much for the reply. I am using DDPG on the gym CarRacing-v0 environment with a CNN, total_timesteps=500000000 and noise_type='ou_0.2', so the observations are images. I did not change anything else in the original code. I got the above-mentioned error at line 43 of the DDPG models.py. I appreciate your help.

marysavari commented 5 years ago

I would appreciate it if you could reply to my message. If you need more information, please let me know. Thank you very much.

rkraghu88 commented 5 years ago

I too am getting a similar error with my custom environment! Here are my environment and the Python code I'm using to run my env + DDPG.

Environment:

```python
import gym
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np
import scipy.stats as stats
from collections import deque


class CacheEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        self.lib_size = 1000
        self.cache_size = 100
        self.hit = 0
        self.requests = 0
        self.cache = deque(maxlen=self.cache_size)  # Memory D for storing states, actions, rewards etc
        self.reward = 0
        self.action = 0
        self.reward_window = 200
        self.rw_weight = 500
        self.inc = 1
        self.observation_space = spaces.Box(low=1, high=self.lib_size, shape=(1, self.cache_size + 1))
        self.action_space = spaces.Box(low=-self.cache_size, high=self.cache_size, shape=(1,))
        # fill the cache with distinct random items
        for i in range(self.cache_size):
            a = np.random.randint(self.lib_size)
            while a in self.cache:
                a = np.random.randint(self.lib_size)
            self.cache.append(a + 1)
        print("Initiated Cache:", self.cache)
        # Zipf-distributed request sequence
        x = np.arange(1, self.lib_size)
        a = 1.1
        weights = x ** (-a)
        weights /= weights.sum()
        bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))
        self.seq1 = bounded_zipf.rvs(size=10000)
        self.actions = np.arange(0, self.cache_size + 1)

    def step(self, action):
        curr_state = np.array(self.cache)
        #print(curr_state)
        element = self.seq1[self.requests]
        curr_state = np.append(curr_state, element)
        #print("Appended:",curr_state)
        #print(np.size(curr_state))
        if element in self.cache:
            self.hit = np.append(self.hit, 1)
            #self.action=self.cache_size
        else:
            self.hit = np.append(self.hit, 0)
        # map the continuous action to the nearest discrete cache slot
        self.action = np.argmin((self.actions - action) ** 2)
        if self.action == self.cache_size:
            self.cache = self.cache
        else:
            self.cache[self.action] = element
        self.requests += 1
        next_state = np.array(self.cache)
        next_state = np.append(next_state, self.seq1[self.requests])
        if np.size(self.hit) > self.reward_window:
            self.reward = np.sum(self.hit[(np.size(self.hit) - self.reward_window):np.size(self.hit)])
        else:
            self.reward = np.sum(self.hit)
        self.reward = (self.reward - self.hit[np.size(self.hit) - 1]) + self.rw_weight * self.hit[np.size(self.hit) - 1]

        return next_state, self.reward, 0, {}

    def reset(self):
        # NOTE: gym expects reset() to return the initial observation
        self.cache = self.cache

    def render(self, mode='human'):
        if self.requests > 0:
            print(np.sum(self.hit) / self.requests)

    def close(self):
        ...
```

Test Code:

```python
import gym
import gym_cache
import baselines.ddpg.ddpg as DDPGA

env = gym.make('cache-v0')
DDPGA.learn('mlp', env)
```

marysavari commented 5 years ago

If you are getting an error when concatenating the action and observation in the critic class, first pass the observation through the fully connected layers and then concatenate it with the action. Look at the following code. It runs, but it diverged for me.

```python
x = self.network_builder(obs)
x = tf.concat([x, action], axis=-1)
```
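
In context, that change would sit inside the Critic's `__call__` in baselines/ddpg/models.py, roughly like this (an abridged sketch based on the lines quoted in this thread; the scalar Q-value output layer is kept as in the original file):

```python
def __call__(self, obs, action, reuse=False):
    with tf.variable_scope(self.name, reuse=tf.AUTO_REUSE):
        # run the network builder on the observation alone first ...
        x = self.network_builder(obs)
        # ... then concatenate the resulting flat features with the action
        x = tf.concat([x, action], axis=-1)
        # final dense layer producing the scalar Q-value
        x = tf.layers.dense(x, 1, name='output')
    return x
```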

rkraghu88 commented 5 years ago

Thanks for your reply, but this gives the following shape error:

```
Traceback (most recent call last):
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/Caches/Test_Cache_Gym.py", line 6, in <module>
    DDPGA.learn('mlp', env)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/baselines/ddpg/ddpg.py", line 93, in learn
    reward_scale=reward_scale)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/baselines/ddpg/ddpg_learner.py", line 130, in __init__
    self.normalized_critic_tf = critic(normalized_obs0, self.actions)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/baselines/ddpg/models.py", line 47, in __call__
    x = self.network_builder(x)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/baselines/common/models.py", line 96, in network_fn
    h = fc(h, 'mlp_fc{}'.format(i), nh=num_hidden, init_scale=np.sqrt(2))
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/baselines/a2c/utils.py", line 61, in fc
    w = tf.get_variable("w", [nin, nh], initializer=ortho_init(init_scale))
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
    aggregation=aggregation)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
    aggregation=aggregation)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
    aggregation=aggregation)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
    aggregation=aggregation)
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/venv/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 869, in _get_single_variable
    (name, shape, found_var.get_shape()))
ValueError: Trying to share variable critic/mlp_fc0/w, but specified shape (65, 64) and found shape (101, 64).
```
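
A self-contained sketch of what that ValueError indicates, under the assumption that the network builder ends up being called twice inside the reused 'critic' scope (the widths 101 and 65 would match the observation dimension cache_size + 1 and the 64 hidden units + 1 action dimension, respectively):

```python
import tensorflow as tf

def mlp_fc0(x, nh=64):
    # minimal stand-in for the first fc layer created by the mlp network builder
    nin = x.get_shape()[1].value
    with tf.variable_scope("mlp_fc0"):
        w = tf.get_variable("w", [nin, nh])
    return tf.matmul(x, w)

obs = tf.placeholder(tf.float32, [None, 101])   # cache_size + 1
action = tf.placeholder(tf.float32, [None, 1])  # action dimension 1

with tf.variable_scope("critic", reuse=tf.AUTO_REUSE):
    h = mlp_fc0(obs)                     # creates critic/mlp_fc0/w with shape (101, 64)
    x = tf.concat([h, action], axis=-1)  # width 64 + 1 = 65
    h2 = mlp_fc0(x)                      # tries to reuse critic/mlp_fc0/w with shape (65, 64) -> ValueError
```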