wezardlza opened this issue 3 years ago
@wezardlza I have the same issue.
You need to reshape the `old_logvars` value after the `old_means, old_logvars = self.policy(observes)` line. You can do that by adding the line below:
old_logvars = K.tile(old_logvars, (observes.shape[0], 1))
With this change, I can see the mean reward increase to 1000. (:
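To illustrate the reshape above, here is a minimal, self-contained sketch; the `observes`, `old_means`, and `old_logvars` values are made-up stand-ins for what `self.policy(observes)` returns in the repo (per-sample means plus a single shared `(1, act_dim)` log-variance row), not the actual policy code.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# Hypothetical stand-ins for the outputs of self.policy(observes):
# per-sample means, but a single shared (1, act_dim) log-variance row.
act_dim = 3
observes = np.random.randn(64, 5).astype(np.float32)
old_means = np.zeros((observes.shape[0], act_dim), dtype=np.float32)
old_logvars = tf.zeros((1, act_dim))  # first dimension is constantly 1

# The suggested fix: tile the shared row out to the batch size so both
# outputs have the same cardinality along the batch dimension.
old_logvars = K.tile(old_logvars, (observes.shape[0], 1))

print(old_means.shape, old_logvars.shape)  # batch dimensions now match
```

After tiling, both tensors have 64 rows, so downstream Keras calls no longer see mismatched batch sizes.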
Hi, thanks very much for your work. I used Docker to build an environment to study it. When I use

FROM tensorflow/tensorflow:2.3.3-gpu-jupyter

to create a container and run the examples, all the tests pass. But when I use a newer image, for instance,

FROM tensorflow/tensorflow:2.4.2-gpu-jupyter

I get the ValueError: Data cardinality is ambiguous error as presented below. After some checks, I found that in the file

./trpo/policy.py

the code below causes the mismatched batch size: it constantly sets the first dimension of

logvars

to one at runtime, while the first dimension of inputs varies. Thus, based on that code, the first dimension of means also differs from that of logvars, which causes the error. So I did the following: in file
./trpo/policy.py

I add … and change

logvars = K.tile(logvars, (self.batch_sz, 1))

to

logvars = K.tile(logvars, (shape(inputs)[0], 1))

These changes helped me pass the example, but it seems self.batch_sz will no longer be used. Perhaps we can simply change

logvars = K.tile(logvars, (self.batch_sz, 1))

to

logvars = K.tile(logvars, (shape(inputs)[0], 1))

directly and remove the build() method above? I am new to TensorFlow and would like to know whether my changes will cause any problems, or even change the TRPO results. Thanks for your help!
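To see why the dynamic-shape version handles varying batch sizes, here is a toy layer sketching only the proposed change, not the repo's actual policy class; the layer name and weight setup are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

class TiledLogvars(tf.keras.layers.Layer):
    """Toy layer: one shared (1, act_dim) logvars row, tiled per batch.

    A sketch of the proposed fix only, not the repo's PolicyNN: it reads
    the batch size at runtime instead of baking in a fixed self.batch_sz.
    """

    def __init__(self, act_dim, **kwargs):
        super().__init__(**kwargs)
        self.logvars = self.add_weight(
            name="logvars", shape=(1, act_dim),
            initializer="zeros", trainable=True)

    def call(self, inputs):
        # K.shape(inputs)[0] is the *runtime* batch size, so the tiled
        # logvars always matches inputs, whatever the batch happens to be.
        return K.tile(self.logvars, (K.shape(inputs)[0], 1))

layer = TiledLogvars(act_dim=3)
print(layer(tf.zeros((10, 5))).shape)  # batch of 10 works
print(layer(tf.zeros((4, 5))).shape)   # a different batch size also works
```

With a fixed `self.batch_sz`, the second call would produce a logvars tensor whose batch dimension disagrees with `inputs`, which is exactly the "Data cardinality is ambiguous" situation described above.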