microsoft / ailab

Experience, Learn and Code the latest breakthrough innovations with Microsoft AI
https://www.ailab.microsoft.com/experiments/
MIT License
7.7k stars 1.39k forks

Pix2Story: Model won't build - type mismatch when building optimizers. #83

Closed TomReidNZ closed 5 years ago

TomReidNZ commented 5 years ago

On a new clone of your repo, I can't get the model to train. There's a type mismatch in the updates when building the optimizer.

Running on conda on macOS (using CPU). I didn't mess with any files, just added in the .txt file. I tried updating the n_words, changing various things in the config file but no luck.

Any help would be much appreciated. Thanks, Tom

Error message:

Building optimizers...
Traceback (most recent call last):
  File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/pfunc.py", line 193, in rebuild_collect_shared
    allow_convert=False)
  File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/tensor/type.py", line 234, in filter_variable
    self=self))
TypeError: Cannot convert Type TensorType(float64, matrix) (of Variable Elemwise{add,no_inplace}.0) into Type TensorType(float32, matrix). You can try to manually convert Elemwise{add,no_inplace}.0 into a TensorType(float32, matrix).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "training.py", line 6, in <module>
    EncTrainer.train()
  File "/Users/tom/Documents/development/ailab/Pix2Story/source/training/train_encoder.py", line 40, in train
    trainer(self.text, self.training_options)
  File "/Users/tom/Documents/development/ailab/Pix2Story/source/skipthoughts_vectors/training/train.py", line 128, in trainer
    f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
  File "/Users/tom/Documents/development/ailab/Pix2Story/source/skipthoughts_vectors/encdec_functs/optim.py", line 40, in adam
    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
  File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/pfunc.py", line 449, in pfunc
    no_default_updates=no_default_updates)
  File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

ericmcmc commented 5 years ago

Hello,

I think this is caused by your Theano configuration. Judging from the error you are getting, the floatX parameter in Theano's config is likely set to float64 when it should be float32.
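For background (this is our illustration, not from the thread): Theano builds on NumPy, where combining float32 and float64 operands silently upcasts the result to float64. That is exactly the kind of promotion that makes an optimizer update come out as float64 against a float32 shared variable. A minimal NumPy sketch:

```python
import numpy as np

# A float32 parameter matrix and a float64 term, as might appear
# inside an optimizer update rule.
param = np.zeros((2, 2), dtype=np.float32)
step = np.full((2, 2), 0.001, dtype=np.float64)

# Mixing the two dtypes upcasts the result to float64 -- the same
# promotion Theano rejects when updating a float32 shared variable.
update = param + step
print(update.dtype)  # float64

# An explicit cast keeps everything in float32.
update32 = (param + step).astype(np.float32)
print(update32.dtype)  # float32
```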

To check Theano's config you can do:

import theano

print(theano.config)

To override the config for a single execution, you can do it like this:

THEANO_FLAGS='floatX=float32' python training.py

More information here: http://deeplearning.net/software/theano/library/config.html
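To make the setting persistent instead of passing THEANO_FLAGS on every run, the same option can go in a ~/.theanorc file. A config sketch, per the Theano docs linked above (the device line is an assumption for a CPU-only macOS setup):

```ini
[global]
floatX = float32
device = cpu
```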

If the problem remains, could you send more detailed info about Theano's config and the sentences you are passing to the net?

Regards

ericmcmc commented 5 years ago

Hello, Did my answer solve your issue? Thanks!

mrcabellom commented 5 years ago

Can we close this issue? @TomReidNZ did you solve the problem?

TomReidNZ commented 5 years ago

Now I get this error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "training.py", line 6, in <module>
    EncTrainer.train()
  File "/home/zarmada/pix2story/Lab/source/training/train_encoder.py", line 40, in train
    trainer(self.text, self.training_options)
  File "/home/zarmada/pix2story/Lab/source/skipthoughts_vectors/training/train.py", line 150, in trainer
    cost = f_grad_shared(x, x_mask, y, y_mask, z, z_mask)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/vm.py", line 305, in __call__
    link.raise_with_op(node, thunk)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/vm.py", line 301, in __call__
    thunk()
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/op.py", line 892, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/tensor/elemwise.py", line 790, in perform
    variables = ufunc(*ufunc_args, **ufunc_kwargs)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/scalar/basic.py", line 4023, in impl
    output_storage = [[None] for i in xrange(self.nout)]
SystemError: <class 'range'> returned a result with an error set
Apply node that caused the error: Elemwise{Composite{Switch(i0, ((i1 * i2) / i3), i2)}}[(0, 2)](InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of 5.0}, Elemwise{Add}[(0, 1)].0, InplaceDimShuffle{x,x}.0)
Toposort index: 741
Inputs types: [TensorType(bool, (True, True)), TensorType(float32, (True, True)), TensorType(float32, matrix), TensorType(float32, (True, True))]
Inputs shapes: [(1, 1), (1, 1), (4800, 20000), (1, 1)]
Inputs strides: [(1, 1), (4, 4), (80000, 4), (4, 4)]
Inputs values: [array([[ True]], dtype=bool), array([[ 5.]], dtype=float32), 'not shown', array([[ 44.09825897]], dtype=float32)]
Outputs clients: [['output']]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
ericmcmc commented 5 years ago

Hello @TomReidNZ ,

Could you post the whole output you get when you execute training.py, along with some samples of the list of sentences you are passing to the net?

Regards

gsegares commented 5 years ago

Hi @ericmcmc, this is the output of the error. We created this text model using these books for the test.

ericmcmc commented 5 years ago

Hi @TomReidNZ and @gsegares,

Judging from the output you get, this could be caused by the Theano version you are using. Could you try updating Theano to version 1.0.3?

Please let me know whether you can train the models with the original code under Theano 1.0.3.
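One way to pin that (our suggestion, not from the repo): upgrade the package in the conda env (pip install theano==1.0.3 or conda install theano=1.0.3) and optionally guard training.py with a small version check. A sketch, where both helper names are hypothetical:

```python
# Hypothetical guard: refuse to run against a Theano older than the
# required release, since the traceback above comes from an older build
# that still references the Python 2 xrange builtin.

def version_tuple(version):
    """Parse a plain dotted version string like '1.0.3' into a tuple.

    Note: assumes no local suffixes such as '1.0.3+2.gabcdef'.
    """
    return tuple(int(part) for part in version.split("."))

def is_supported(installed, required="1.0.3"):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)

# Example use at the top of training.py:
#   import theano
#   assert is_supported(theano.__version__), "Please upgrade Theano to 1.0.3"
```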

Regards

gsegares commented 5 years ago

Hi @ericmcmc, we switched the Azure NC6 VM we were using to a deep learning template with all the GPU packages included, and we are no longer getting that error. We also changed the conda file to pin the specific version of Theano. We are still having issues with some missing components in the repo (like paths['v_expansion'] = '../models/GoogleNews-vectors-negative300.bin'), but that's a different problem. Your suggestion regarding the Theano flag to fix the data type mismatch worked. I think we can close this issue. Thanks.