neocxi / pixelsnail-public

MIT License

Hanging after make_template #4

Open joshim5 opened 6 years ago

joshim5 commented 6 years ago

I am trying to train a vanilla PixelSNAIL model on CIFAR-10 using the command given in the README.

With TensorFlow 1.9, the code hangs at these lines:

model = tf.make_template('model', getattr(pxpp_models, args.model + "_spec"))
with tf.device('/gpu:0'):
    gen_par = model(x_init, h_init, init=True,
                    dropout_p=args.dropout_p, **model_opt)

Is this a known issue that can be resolved? Which version of TensorFlow has this been tested with?

wlin12 commented 5 years ago

I tried TensorFlow 1.2.1 and it works fine. It's a TensorFlow issue; see https://github.com/CuriousAI/mean-teacher/issues/1

wrrogers commented 4 years ago

I cannot get it to run with version 1.2.1. I received the error:

ModuleNotFoundError: No module named 'tensorflow.contrib'

I guess it's from the import:

from tensorflow.contrib.framework.python.ops import add_arg_scope

which is used as a decorator on a number of functions.

I'm not sure how to overcome this.

TWJubb commented 4 years ago

I've also run into this problem and tried almost everything suggested.

I'm on Ubuntu 18, which doesn't support CUDA 8, the version TensorFlow 1.2.1 needs, so switching to an older TensorFlow doesn't work for me.

Does anyone have a way to fix the issue in the code itself? Otherwise I think this code will become unusable in the future.

I tried replacing the dense and conv2d functions in nn.py with those from the PixelCNN++ code:

https://github.com/openai/pixel-cnn/blob/master/pixel_cnn_pp/nn.py#L160

This seems to have worked, but I have no idea why, as the two sets of functions are very similar.

I am using TensorFlow 1.15.2.
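For reference, the dense function in the linked PixelCNN++ nn.py implements weight normalization with data-dependent initialization. On the init=True pass (the model(..., init=True) call from the original report), the scale g and bias b are set from batch statistics so that each output unit starts with zero mean and a standard deviation of init_scale. Below is a NumPy sketch of that computation for a dense layer. The function names are hypothetical, drawing V from N(0, 0.05) is assumed to match the linked code, and the sketch only illustrates what the init pass computes. It does not explain why the swap fixed the hang.

```python
import numpy as np


def weightnorm_dense_init(x, num_units, init_scale=1.0, rng=None):
    """Data-dependent init for a weight-normalized dense layer (sketch).

    Returns (V, g, b) such that the layer's output on batch `x` has
    per-unit mean 0 and standard deviation ~init_scale.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    V = rng.normal(0.0, 0.05, size=(x.shape[1], num_units))
    # Pre-activation with g = 1, b = 0: project onto unit-norm columns of V.
    t = x @ (V / np.sqrt(np.sum(V ** 2, axis=0, keepdims=True)))
    m_init = t.mean(axis=0)
    v_init = t.var(axis=0)
    scale_init = init_scale / np.sqrt(v_init + 1e-10)
    g = scale_init            # starting from g = 1, scaled by scale_init
    b = -m_init * scale_init  # starting from b = 0, shifted and scaled
    return V, g, b


def weightnorm_dense(x, V, g, b):
    """Weight-normalized dense layer: y = (x @ V) * g / ||V||_col + b."""
    scaler = g / np.sqrt(np.sum(V ** 2, axis=0))
    return (x @ V) * scaler + b


x = np.random.default_rng(1).normal(3.0, 2.0, size=(256, 10))
V, g, b = weightnorm_dense_init(x, num_units=4)
out = weightnorm_dense(x, V, g, b)
print(out.mean(axis=0), out.std(axis=0))  # approximately 0 and 1 per unit
```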