Closed: bmilde closed this issue 7 years ago
Hi, thanks for reporting. You're using it correctly, but a change I made in how the target class IDs (the classes that will be predicted) are passed to the output layer apparently breaks sampling with hierarchical softmax. The problem is that hierarchical softmax doesn't produce the full output distribution unless it's required, as it is for sampling. I don't know why rescoring and decoding don't work; they only need target word probabilities, and I'm currently using TheanoLM to decode lattices with hierarchical softmax myself. But let me fix this bug first.
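To illustrate the distinction with a toy sketch (this is just the idea, not TheanoLM's code):

```python
import numpy as np

# Toy vocabulary of 10 words with softmax probabilities.
logits = np.random.randn(10)
probs = np.exp(logits) / np.exp(logits).sum()

# Rescoring/decoding: only the probability of a known target word is
# needed, so hierarchical softmax can evaluate just that one tree path.
target_id = 3
target_log_prob = np.log(probs[target_id])

# Sampling: the full distribution over the vocabulary is needed in
# order to draw the next word from it.
next_word = np.random.choice(len(probs), p=probs)
```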
TheanoLM doesn't think that you're using word classes: if you don't use word classes, each word is still put in its own class.
The previous commit seems to fix the problem with sampling. Can you pull the latest version from Git and test if it works for you? Please reopen the issue if you still have problems with rescoring or decoding.
Oh wow, thanks for your fast response! I can confirm that sampling works now with the latest version. I will try lattice decoding later today.
Still having trouble decoding from a lattice:
theanolm decode --lattices exp/2xnew_train_newphn_i180_l4_c300_RMSProp_learn0.001_clip5.0_dropout0.0_unitfix_adddeltafalse/decode_KA3_test2/lat.1.gz --beam 10 data/lm/nn/nn_final3_gpu.h5
Reading vocabulary from network state.
Number of words in vocabulary: 419231
Number of word classes: 419231
Building neural network.
Restoring neural network state.
Building word lattice decoder.
An unexpected AttributeError exception occurred. Traceback will be written to debug log. The error message was: 'NoneType' object has no attribute 'readlines'
With a higher debug level I get this:
2017-01-12 14:28:26,083 decode: DECODING OPTIONS
2017-01-12 14:28:26,083 decode: beam: 10.0
2017-01-12 14:28:26,083 decode: ignore_unk: False
2017-01-12 14:28:26,083 decode: wi_penalty: None
2017-01-12 14:28:26,083 decode: max_tokens_per_node: None
2017-01-12 14:28:26,083 decode: unk_penalty: None
2017-01-12 14:28:26,083 decode: linear_interpolation: False
2017-01-12 14:28:26,083 decode: nnlm_weight: 1.0
2017-01-12 14:28:26,083 decode: recombination_order: None
2017-01-12 14:28:26,083 decode: lm_scale: None
Building word lattice decoder.
An unexpected AttributeError exception occurred. Traceback will be written to debug log. The error message was: 'NoneType' object has no attribute 'readlines'
2017-01-12 14:28:27,759 exception_handler: Traceback:
2017-01-12 14:28:27,759 exception_handler: File "/home/bmilde/.local/bin/theanolm", line 85, in
The solution is rather simple: above line 185, one just needs to add a check that lattice_list is not None:
if args.lattice_list is not None:
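For context, the failing code presumably calls readlines() on args.lattice_list unconditionally (hence the 'NoneType' error). A minimal sketch of the guarded version; the actual surrounding code in theanolm may differ:

```python
lattices = list(args.lattices)  # lattice files given directly on the command line
if args.lattice_list is not None:
    # --lattice-list is optional; only read it when it was actually given.
    lattices.extend(path.strip() for path in args.lattice_list.readlines())
```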
I'm still having problems with decoding, but that is rather because of a misunderstanding on my part: I thought that theanolm needs Kaldi's lattice format, but instead it expects lattices in SLF format.
I will try to use this script to convert my lattices: https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/convert_slf.pl
Thanks. I always use --lattice-list, so I haven't noticed that bug. I'll add the fix.
Yes, I convert lattices to SLF first. (I use convert_slf_parallel.sh in the same directory.)
Thanks for creating theanolm! I tried to use it for rescoring in an ASR system, but I ran into the following problems/errors:
I've trained with:
theanolm train nn_final3_gpu.h5 --training-set final3.train --validation-file final3.val --architecture lstm400 --learning-rate 0.01 --optimization-method adam --vocabulary-format words --vocabulary words_man_vocab.txt
So I'm using the predefined vocabulary of the ASR model and a network "lstm400", which is almost identical to the example h_softmax network in the docs, just smaller:
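For reference, the architecture description looks roughly like this (layer names and sizes here are my own guesses modelled on the documented example, not the exact file):

```
input type=class name=class_input
layer type=projection name=projection_layer input=class_input size=400
layer type=lstm name=hidden_layer_1 input=projection_layer size=400
layer type=hsoftmax name=output_layer input=hidden_layer_1
```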
I can train the network and I see decreasing perplexities.
When I try to sample from it, I get the following error message:
"Trying to read output distribution, while the output layer has produced only target class probabilities." So somehow theanolm thinks that I'm using word classes but I don't? _self.output_layer.outputprobs is None and I can see that it is being set that way when _target_classids is not None in network/hsoftmaxlayer.py. But I don't use word classes. Also, I can see that no. of word classes is the same as no. of words. Maybe this is related to the problem?
Rescoring/decoding also expects output_probs to be defined and doesn't work because it is None.
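In other words, the behavior I'm seeing corresponds to something like the following toy sketch (a paraphrase of what I observe, not the actual hsoftmaxlayer.py source):

```python
class ToyHSoftmaxLayer:
    """Toy model of the observed behavior; not the real hsoftmaxlayer.py."""

    def compute(self, probs, target_class_ids=None):
        if target_class_ids is not None:
            # Only the probabilities of the requested targets are computed;
            # the full distribution is left undefined, which is what
            # sampling and rescoring later trip over.
            self.target_probs = [probs[i] for i in target_class_ids]
            self.output_probs = None
        else:
            # Full distribution over the vocabulary.
            self.output_probs = probs
```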
Is this the correct way of training/using a network with h_softmax? Or is this a bug in theanolm?