Open · marysavari opened this issue 5 years ago
What command did you run? Without knowing the details we can't help.
The shape of [?,96,96,3] suggests that you're using images. You need to pass those through fully connected layers before concatenating. That's what I did to get images + DDPG "working" to some extent, in the sense that the code runs.
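For illustration, a minimal sketch of that idea in TF 1.x-style code (the layer sizes and names here are hypothetical, not baselines' exact architecture): reduce the rank-4 image tensor to rank 2 before the concat.

```python
import tensorflow as tf  # TF 1.x API, as used by baselines at the time

def image_critic(obs, action):
    # obs: [batch, 96, 96, 3] (rank 4), action: [batch, act_dim] (rank 2)
    x = tf.layers.conv2d(obs, filters=32, kernel_size=8, strides=4, activation=tf.nn.relu)
    x = tf.layers.conv2d(x, filters=64, kernel_size=4, strides=2, activation=tf.nn.relu)
    x = tf.layers.flatten(x)                     # rank 4 -> rank 2: [batch, features]
    x = tf.layers.dense(x, 256, activation=tf.nn.relu)
    x = tf.concat([x, action], axis=-1)          # both tensors are rank 2 now
    return tf.layers.dense(x, 1)                 # scalar Q-value per sample
```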
Thank you very much for the reply. I am using DDPG on the gym CarRacing-v0 environment with a CNN, total_timesteps=500000000, noise_type = 'ou_0.2', and it uses image observations. I did not change anything else in the original code. At line 43 of DDPG's model.py I got the above-mentioned error. I appreciate your help.

I would appreciate a reply to my message. If you need more information, please let me know. Thank you very much.
I too am getting a similar error with my custom environment! Here's my environment and the Python code I'm using to run my env + DDPG.
Environment:

```python
import gym
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np
import scipy.stats as stats
from collections import deque


class CacheEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        self.lib_size = 1000
        self.cache_size = 100
        self.hit = 0
        self.requests = 0
        self.cache = deque(maxlen=self.cache_size)  # memory D for storing states, actions, rewards etc.
        self.reward = 0
        self.action = 0
        self.reward_window = 200
        self.rw_weight = 500
        self.inc = 1
        # Observation: the cache contents plus the incoming request
        self.observation_space = spaces.Box(low=1, high=self.lib_size, shape=(1, self.cache_size + 1))
        self.action_space = spaces.Box(low=-self.cache_size, high=self.cache_size, shape=(1,))
        # Fill the cache with distinct random items from the library
        for i in range(self.cache_size):
            a = np.random.randint(self.lib_size)
            while a in self.cache:
                a = np.random.randint(self.lib_size)
            self.cache.append(a + 1)
        print("Initiated Cache:", self.cache)
        # Zipf-distributed request sequence over the library
        x = np.arange(1, self.lib_size)
        a = 1.1
        weights = x ** (-a)
        weights /= weights.sum()
        bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))
        self.seq1 = bounded_zipf.rvs(size=10000)
        self.actions = np.arange(0, self.cache_size + 1)

    def step(self, action):
        curr_state = np.array(self.cache)
        element = self.seq1[self.requests]
        curr_state = np.append(curr_state, element)
        if element in self.cache:
            self.hit = np.append(self.hit, 1)
        else:
            self.hit = np.append(self.hit, 0)
            # Snap the continuous action to the nearest cache slot;
            # action == cache_size means "do not evict anything"
            self.action = np.argmin((self.actions - action) ** 2)
            if self.action == self.cache_size:
                self.cache = self.cache
            else:
                self.cache[self.action] = element
        self.requests += 1
        next_state = np.array(self.cache)
        # Append the upcoming request to form the next observation
        next_state = np.append(next_state, self.seq1[self.requests])
        # Reward: hits over a sliding window, with the most recent hit weighted
        if np.size(self.hit) > self.reward_window:
            self.reward = np.sum(self.hit[(np.size(self.hit) - self.reward_window):np.size(self.hit)])
        else:
            self.reward = np.sum(self.hit)
        self.reward = (self.reward - self.hit[np.size(self.hit) - 1]) \
            + self.rw_weight * self.hit[np.size(self.hit) - 1]
        return next_state, self.reward, False, {}

    def reset(self):
        # gym expects reset() to return the initial observation
        return np.append(np.array(self.cache), self.seq1[self.requests])

    def render(self, mode='human'):
        if self.requests > 0:
            print(np.sum(self.hit) / self.requests)
```
Test Code:
```python
import gym
import gym_cache
import baselines.ddpg.ddpg as DDPGA

env = gym.make('cache-v0')
DDPGA.learn('mlp', env)
```
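As a side note, before handing the env to DDPG it can be sanity-checked standalone with random actions (hypothetical usage, not part of the original post):

```python
import gym
import gym_cache  # registers 'cache-v0'

env = gym.make('cache-v0')
obs = env.reset()
for _ in range(5):
    action = env.action_space.sample()   # random action in [-cache_size, cache_size]
    obs, reward, done, info = env.step(action)
    env.render()                         # prints the running hit rate
```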
If you are getting the error for the concatenation of action and observation in the critic class, first run the observation through the fully connected layers, then concatenate the result with the action. Look at the following code. It runs, but it diverged for me.

```python
x = self.network_builder(obs)
x = tf.concat([x, action], axis=-1)
```
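For context, that change sits in the Critic class of baselines/ddpg/models.py; roughly like this (a paraphrase from memory, not an exact diff):

```python
class Critic(Model):
    def __call__(self, obs, action, reuse=False):
        with tf.variable_scope(self.name, reuse=tf.AUTO_REUSE):
            # original line: x = tf.concat([obs, action], axis=-1)
            # run the network (e.g. the CNN) on the observation alone first,
            # so its output is rank 2 and can be concatenated with the action
            x = self.network_builder(obs)
            x = tf.concat([x, action], axis=-1)
            x = tf.layers.dense(x, 1, name='output')
        return x
```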
Thanks for your reply, but this gives the following shape error:

```
Traceback (most recent call last):
  File "/home/ramkumar/PycharmProjects/Multicast_Queue_Python/Caches/Test_Cache_Gym.py", line 6, in
```
Hi, I am getting an error for the concatenation of action and observation in the critic class, from the following line:

```python
x = tf.concat([obs, action], axis=-1)  # this assumes observation and action can be concatenated
```

The error occurs because they have different ranks:

```
ValueError: Shape must be rank 4 but is rank 2 for 'critic/concat' (op: 'ConcatV2') with input shapes: [?,96,96,3], [?,3], [].
```

I tried to reshape the action, but I got the error "cannot reshape array of size 1". My action dimension is (?, 3) and my observation dimension is (?, 96, 96, 3). Any suggestions?
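For reference, the rank mismatch can be reproduced in isolation (hypothetical placeholders matching the shapes above):

```python
import tensorflow as tf  # TF 1.x

obs = tf.placeholder(tf.float32, [None, 96, 96, 3])  # rank 4: image batch
act = tf.placeholder(tf.float32, [None, 3])          # rank 2: action batch
x = tf.concat([obs, act], axis=-1)
# ValueError: Shape must be rank 4 but is rank 2 for 'concat' (op: 'ConcatV2')
```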