senarvi / theanolm

TheanoLM is a recurrent neural network language modeling tool implemented using Theano
Apache License 2.0

Neither sampling nor rescoring works with a hsoftmax network (runtime errors relating to NoneType) #21

Closed: bmilde closed this issue 7 years ago

bmilde commented 7 years ago

Thanks for creating theanolm! I tried to use it for rescoring in an ASR system, but I ran into the following problems/errors:

I've trained with:

theanolm train nn_final3_gpu.h5 --training-set final3.train --validation-file final3.val --architecture lstm400 --learning-rate 0.01 --optimization-method adam --vocabulary-format words --vocabulary words_man_vocab.txt

So I'm using the predefined vocabulary of the ASR model and a network "lstm400", which is almost identical to the example hsoftmax network in the docs, just smaller:

input type=word name=word_input
layer type=projection name=projection_layer input=word_input size=100
layer type=dropout name=dropout_layer_1 input=projection_layer dropout_rate=0.2
layer type=lstm name=hidden_layer_1 input=dropout_layer_1 size=400
layer type=dropout name=dropout_layer_2 input=hidden_layer_1 dropout_rate=0.2
layer type=tanh name=hidden_layer_2 input=dropout_layer_2 size=400
layer type=dropout name=dropout_layer_3 input=hidden_layer_2 dropout_rate=0.2
layer type=hsoftmax name=output_layer input=dropout_layer_3

I can train the network and I see decreasing perplexities.

When I try to sample from it, I get the following error message:

theanolm sample --num-sentences 10 nn_final3_gpu.h5
Using gpu device 0: GeForce GTX TITAN Black (CNMeM is enabled with initial size: 80.0% of memory, cuDNN Version is too old. Update to v5, was 4004.)
Reading vocabulary from network state.
Number of words in vocabulary: 419231
Number of word classes: 419231
Building neural network.
Restoring neural network state.
Building text sampler.
An unexpected RuntimeError exception occurred. Traceback will be written to debug log. The error message was: Trying to read output distribution, while the output layer has produced only target class probabilities.

"Trying to read output distribution, while the output layer has produced only target class probabilities." So somehow theanolm thinks that I'm using word classes but I don't? _self.output_layer.outputprobs is None and I can see that it is being set that way when _target_classids is not None in network/hsoftmaxlayer.py. But I don't use word classes. Also, I can see that no. of word classes is the same as no. of words. Maybe this is related to the problem?

Rescoring/decoding also expects output_probs to be defined and doesn't work because it is None.

Is this the correct way of training/using a network with h_softmax? Or is this a bug in theanolm?

senarvi commented 7 years ago

Hi, thanks for reporting. You're using it correctly, but a change I made in how the target class IDs (the classes that will be predicted) are passed to the output layer apparently causes sampling not to work with hierarchical softmax. The problem is that hierarchical softmax doesn't produce the full output distribution unless it's required, as it is for sampling. I don't know why rescoring and decoding don't work, though; they need only the target word probabilities, and I'm currently using TheanoLM to decode lattices with hierarchical softmax myself. But let me fix this bug first.
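To illustrate the mechanism, here's a standalone sketch using Theano's h_softmax op (this is not TheanoLM code; the dimensions are made up). When a target is passed, the op computes only the probabilities of those target words; the full distribution over the vocabulary is computed only when no target is given:

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
batch_size, n_features = 4, 100
n_classes, n_outputs_per_class = 8, 8        # 64 outputs in total
n_outputs = n_classes * n_outputs_per_class

rng = np.random.RandomState(0)
W1 = theano.shared(rng.randn(n_features, n_classes).astype(floatX))
b1 = theano.shared(np.zeros(n_classes, dtype=floatX))
W2 = theano.shared(rng.randn(n_classes, n_features,
                             n_outputs_per_class).astype(floatX))
b2 = theano.shared(np.zeros((n_classes, n_outputs_per_class), dtype=floatX))

x = T.matrix('x')
target = T.ivector('target')

# Rescoring/decoding path: probabilities of the target words only.
target_probs = T.nnet.h_softmax(x, batch_size, n_outputs, n_classes,
                                n_outputs_per_class, W1, b1, W2, b2,
                                target=target)
# Sampling path: the full distribution over all outputs.
full_probs = T.nnet.h_softmax(x, batch_size, n_outputs, n_classes,
                              n_outputs_per_class, W1, b1, W2, b2)

f_target = theano.function([x, target], target_probs)
f_full = theano.function([x], full_probs)

x_val = rng.randn(batch_size, n_features).astype(floatX)
t_val = rng.randint(0, n_outputs, size=batch_size).astype('int32')
print(f_target(x_val, t_val).shape)  # one probability per target word
print(f_full(x_val).shape)           # (batch_size, n_outputs)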

TheanoLM doesn't think that you're using word classes. If you don't use word classes, each word is simply placed in its own class, which is why the number of classes in your log equals the vocabulary size.
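In other words (a toy illustration, not the actual TheanoLM vocabulary code):

# Without class definitions, every word becomes its own singleton class,
# so the class count equals the vocabulary size, as in the log output above.
vocabulary = ['<s>', '</s>', '<unk>', 'hello', 'world']
word_to_class = {word: class_id for class_id, word in enumerate(vocabulary)}
assert len(set(word_to_class.values())) == len(vocabulary)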

senarvi commented 7 years ago

The previous commit seems to fix the problem with sampling. Can you pull the latest version from Git and test if it works for you? Please reopen the issue if you still have problems with rescoring or decoding.

bmilde commented 7 years ago

Oh wow, thanks for your fast response! I can confirm that sampling works now with the latest version. I will try lattice decoding later today.

bmilde commented 7 years ago

Still having trouble decoding from a lattice:

theanolm decode --lattices exp/2xnew_train_newphn_i180_l4_c300_RMSProp_learn0.001_clip5.0_dropout0.0_unitfix_adddeltafalse/decode_KA3_test2/lat.1.gz --beam 10 data/lm/nn/nn_final3_gpu.h5

Reading vocabulary from network state.
Number of words in vocabulary: 419231
Number of word classes: 419231
Building neural network.
Restoring neural network state.
Building word lattice decoder.
An unexpected AttributeError exception occurred. Traceback will be written to debug log. The error message was: 'NoneType' object has no attribute 'readlines'
bmilde commented 7 years ago

With a higher debug level I get this:

2017-01-12 14:28:26,083 decode: DECODING OPTIONS
2017-01-12 14:28:26,083 decode: beam: 10.0
2017-01-12 14:28:26,083 decode: ignore_unk: False
2017-01-12 14:28:26,083 decode: wi_penalty: None
2017-01-12 14:28:26,083 decode: max_tokens_per_node: None
2017-01-12 14:28:26,083 decode: unk_penalty: None
2017-01-12 14:28:26,083 decode: linear_interpolation: False
2017-01-12 14:28:26,083 decode: nnlm_weight: 1.0
2017-01-12 14:28:26,083 decode: recombination_order: None
2017-01-12 14:28:26,083 decode: lm_scale: None
Building word lattice decoder.
An unexpected AttributeError exception occurred. Traceback will be written to debug log. The error message was: 'NoneType' object has no attribute 'readlines'
2017-01-12 14:28:27,759 exception_handler: Traceback:
2017-01-12 14:28:27,759 exception_handler: File "/home/bmilde/.local/bin/theanolm", line 85, in main()
2017-01-12 14:28:27,759 exception_handler: File "/home/bmilde/.local/bin/theanolm", line 57, in main args.command_function(args)
2017-01-12 14:28:27,759 exception_handler: File "/home/bmilde/.local/lib/python3.4/site-packages/theanolm/commands/decode.py", line 185, in decode lattices.extend(args.lattice_list.readlines())

The solution is rather simple: above line 185 of commands/decode.py, one just needs to guard the call with a check that lattice_list is not None:

if args.lattice_list is not None:
    lattices.extend(args.lattice_list.readlines())

I'm still having problems with decoding, but that is rather a misunderstanding on my part: I thought that theanolm needs Kaldi's lattice format, but it actually expects lattices in SLF format.

I will try to use this script to convert my lattices: https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/convert_slf.pl
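For reference, here is a hand-made toy example of what an SLF (HTK Standard Lattice Format) lattice looks like; the words, times and scores below are made up. Node lines (I=) carry times and words, link lines (J=) carry acoustic (a=) and language model (l=) scores:

VERSION=1.0
UTTERANCE=utt1
lmscale=12.0
N=4 L=3
I=0 t=0.00 W=!NULL
I=1 t=0.32 W=hello
I=2 t=0.75 W=world
I=3 t=1.10 W=!NULL
J=0 S=0 E=1 a=-312.4 l=-2.31
J=1 S=1 E=2 a=-428.9 l=-3.05
J=2 S=2 E=3 a=-101.2 l=-0.42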

senarvi commented 7 years ago

Thanks. I always use --lattice-list, so I hadn't noticed that bug. I'll add the fix.

Yes, I convert lattices first to SLF. (I use convert_slf_parallel.sh in the same directory.)