lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

Inconsistency in building network structure leads to failure when loading saved model #77

Closed qyouurcs closed 9 years ago

qyouurcs commented 9 years ago

Hello all,

I tried to load a trained model (a bidirectional LSTM). Using the same layer structure, I received the following error:

local/lib/python2.7/site-packages/theanets-0.5.3-py2.7.egg/theanets/graph.py", line 492, in load_params
    for p, v in zip(layer.params, saved['{}-values'.format(layer.name)]):
KeyError: 'bdrnn1-values'

1. def layer(n):
2.    return dict(form='bidirectional', worker='lstm', size=n)
3. e = theanets.Experiment(
4.     theanets.recurrent.Regressor,
5.     layers=(102, layer(50), 1)
6.   )
7. e = e.load(model_fn)

The problem is that we built an LSTM model, but the key used for its parameters at load time is "bdrnn1" rather than "bdlstm1".
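The mismatch can be illustrated with a small hypothetical sketch of the lookup that load_params performs (the dict key and layer name mirror the traceback above; this is not the actual theanets code):

```python
# Checkpoint keys are written as '<layer name>-values' at save time.
saved = {'bdlstm1-values': [0.1, 0.2, 0.3]}

# At load time the rebuilt layer is (wrongly) named 'bdrnn1', so the
# lookup that load_params performs fails with a KeyError.
layer_name = 'bdrnn1'
try:
    values = saved['{}-values'.format(layer_name)]
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'bdrnn1-values'
```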

We can delve into the construction of the network as follows.

1) We set a breakpoint before line 3 in the code above. theanets.Experiment constructs a network model, and we can inspect its structure:

(Pdb) self.network.layers[1].params
[bdlstm1_fw_xh, bdlstm1_fw_hh, bdlstm1_fw_b, bdlstm1_fw_ci, bdlstm1_fw_cf, bdlstm1_fw_co, bdlstm1_bw_xh, bdlstm1_bw_hh, bdlstm1_bw_b, bdlstm1_bw_ci, bdlstm1_bw_cf, bdlstm1_bw_co]
(Pdb) self.network.layers[1].name
'bdlstm1'

This is consistent with what we are trying to do.

2) We set a breakpoint before line 7.

In the load function, the library also builds a network model. In particular:

> /u/qyou/.local/lib/python2.7/site-packages/theanets-0.5.3-py2.7.egg/theanets/graph.py(99)load()
-> net = pkl['klass'](**kw)

Line 99 of graph.py rebuilds the structure of the model. In particular, we can inspect the construction of the second layer:

> /u/qyou/.local/lib/python2.7/site-packages/theanets-0.5.3-py2.7.egg/theanets/graph.py(144)__init__()
-> self.add_layer(layer, is_output=i == len(layers) - 1)
(Pdb) layer
{'inputs': {'in.out': array(102)}, 'activation': 'logistic', 'name': 'bdlstm1', 'form': 'bidirectional', 'size': 50}

We then continue debugging in the graph.py add_layer() function.

-> if isinstance(form, str) and form.lower() == 'bidirectional':
(Pdb) l
225             if isinstance(layer, dict):
226                 if 'form' in layer:
227                     form = layer.pop('form').lower()
228                 kwargs.update(layer)
229  
230  ->         if isinstance(form, str) and form.lower() == 'bidirectional':
231                 kwargs['name'] = 'bd{}{}'.format(
232                     kwargs.get('worker', 'rnn'), len(self.layers))
233  
234             if isinstance(form, str) and form.lower() == 'tied':
235                 partner = kwargs.get('partner')

Here, at line 230:

(Pdb) kwargs
{'inputs': {'in.out': array(102)}, 'name': 'bdlstm1', 'activation': 'logistic', 'size': 50}

Thus, the dict has no "worker" field, which leads the code at lines 231-232 to reset the name to "bdrnn1":

(Pdb) kwargs
{'inputs': {'in.out': array(102)}, 'name': 'bdrnn1', 'activation': 'logistic', 'size': 50}

This causes the error when loading the bidirectional LSTM model.
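The renaming can be reproduced with a minimal sketch of the name-generation expression shown at lines 231-232 (generate_name is a hypothetical stand-in, not a theanets function):

```python
def generate_name(kwargs, index):
    # Mirrors lines 231-232 above: the worker defaults to 'rnn' when
    # the 'worker' key is absent from the layer spec.
    return 'bd{}{}'.format(kwargs.get('worker', 'rnn'), index)

# At build time the user passes worker='lstm' explicitly:
print(generate_name({'worker': 'lstm', 'size': 50}, 1))  # bdlstm1

# At load time the restored spec has no 'worker' key, so the generated
# name no longer matches the 'bdlstm1-values' key in the checkpoint:
print(generate_name({'name': 'bdlstm1', 'size': 50}, 1))  # bdrnn1
```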

To solve this problem, one could update the add_layer function so that it does not overwrite a name that is already present in the layer spec.

Alternatively, the save logic could be changed to store the "worker" field in the model as well.
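A minimal sketch of the first option (hypothetical code, not the actual theanets fix): only synthesize a default name when the spec does not already carry one, so a name restored from a checkpoint survives the rebuild.

```python
def add_layer_name(kwargs, form, index):
    # Only generate a 'bd...' name when none was supplied; a name
    # restored from a saved model is left untouched.
    if form == 'bidirectional' and 'name' not in kwargs:
        kwargs['name'] = 'bd{}{}'.format(kwargs.get('worker', 'rnn'), index)
    return kwargs

# A spec restored from the saved model keeps its original name:
restored = {'name': 'bdlstm1', 'activation': 'logistic', 'size': 50}
add_layer_name(restored, 'bidirectional', 1)
print(restored['name'])  # bdlstm1 -- matches the saved parameter keys
```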

Let me know if I have made any mistakes.

Thanks.

lmjohns3 commented 9 years ago

This is definitely a problem!

I have just made a change that drops much of the loading and saving code that I wrote, in favor of just using Python's built-in pickling mechanism more directly.

If you have a chance to check out the GitHub master, I'd appreciate hearing whether this fixes the issue!

lmjohns3 commented 9 years ago

I'm going to go ahead and close this -- feel free to reopen if it's still a problem.