katakombi / rnnlm

Recurrent Neural Network language modeling toolkit
Apache License 2.0

nbest output #1

Open ghost opened 9 years ago

ghost commented 9 years ago

Hi,

I downloaded the software from http://rnnlm.org/, so I am not sure if this is a good place to ask questions/report issues. I apologize if it is not.

I am training the sample model, and then I am trying to score an n-best list using the following input:

$ cat best.txt

1 I AM
1 I AM
1 I AM
2 I AM
2 I AM
3 I AM
3 I AM
3 I AM
3 I AM

And I am getting the following output with the 0.4b version:

./rnnlm -rnnlm model -test best.txt -nbest

-3.888084
-3.529828
-3.529828
-3.558858
-3.558858
-3.561705
-3.561705
-3.561705
-3.561705

Whereas with the 0.2b version I get:

./rnnlm -rnnlm model -test best.txt -nbest

-3.912723
-3.912723
-3.912723
-3.158704
-3.158704
-3.160540
-3.160540
-3.160540
-3.160540

My questions are:

1. Shouldn't I get the same probability for all the sentences, given that they are all equal?
2. Why is the 0.4b version returning a probability for the first sentence that is different from the probabilities of the other sentences that start with a "1"?

Thanks a lot!

katakombi commented 9 years ago

Hi melonista,

thanks for using rnnlm ;) - this is the right place to ask.

The answers to your questions:

1. No. The code between 0.2b and 0.4b differs quite a bit. In particular, since 0.4b uses a different approximation for the exponential function, the probabilities won't be directly comparable. However, keep in mind that these are log-likelihoods, so they are all more or less similar.
2. Good question. It could be a bug in the initialization or in the treatment of the -independent parameter. I haven't used this code path in a while. Could you rerun the nbest command with the -independent switch and post the output?
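For example, reusing your command from above with the switch appended (same model and test file names as in your example):

./rnnlm -rnnlm model -test best.txt -nbest -independent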

thanks

ghost commented 9 years ago

Hi!

Thanks a lot for your response!

I retrained the model using 0.4b with the "-independent" option and now the issue is fixed.

However, I would like to rephrase my first question: I understand that, given the same input, there could be differences between the outputs of versions 0.2b and 0.4b. But within the same model, if I input the same sentence repeatedly:

$ cat best.txt

1 I AM
1 I AM
1 I AM
2 I AM
2 I AM
2 I AM
3 I AM
3 I AM
3 I AM
3 I AM

I get:

./rnnlm -rnnlm model -test best.txt -nbest

-3.912723
-3.912723
-3.912723
-3.158704
-3.158704
-3.158704
-3.160540
-3.160540
-3.160540
-3.160540

Shouldn't the probability of this sentence always be the same? (Shouldn't I get the same number repeated 10 times?)

Also, I don't think I correctly understand the purpose of starting each group of sentences with a common token:

$ cat best.txt

1 I AM AT ROME
1 I AM AT HOME
2 THIS IS RED
2 THIS IS SAD

I understand that you group together all the candidates for the same "true sentence", but isn't the model going to calculate the probability of each sentence independently? Why do they need to be grouped?

Finally, I have a very small vocabulary and I am trying to predict the next word of each sentence. My approach has been to put every test sentence, combined with each possible last word, into a file and run the test with the -nbest option:

$ cat best.txt

1 FIRST SENTENCE POSSIBLE-LAST-WORD-1
1 FIRST SENTENCE POSSIBLE-LAST-WORD-2
1 FIRST SENTENCE POSSIBLE-LAST-WORD-3
1 FIRST SENTENCE POSSIBLE-LAST-WORD-4
1 FIRST SENTENCE POSSIBLE-LAST-WORD-5
....

2000 LAST SENTENCE POSSIBLE-LAST-WORD-1
2000 LAST SENTENCE POSSIBLE-LAST-WORD-2
2000 LAST SENTENCE POSSIBLE-LAST-WORD-3
2000 LAST SENTENCE POSSIBLE-LAST-WORD-4
2000 LAST SENTENCE POSSIBLE-LAST-WORD-5

That way I get the log probability of each sentence, and therefore I know which last word is the most probable. Is there a better way of performing this task with your toolkit?

Thanks a lot!

katakombi commented 9 years ago

Dear melonista,

first, let me say that I'm sorry for getting back to you so late. I think there are a couple of misunderstandings which I will try to clarify.

First, regardless of whether you trained your model using the -independent flag, you can still choose to run the nbest rescoring either with -independent or context-dependently.

Actually, nbest text files represent bunches of sentence hypotheses grouped together. Since these typically come as the output of an ASR system, it is very likely that they start with the same words, or with similar-sounding words.

When an nbest list is processed, the RNN can either be reinitialized with a standard context after every bunch of hypotheses, or it can keep depending on the previously seen context; the latter is what you are seeing when multiple sequential scorings of the same sentence yield different likelihoods.

The best way (in terms of performance) to guess the most likely next word is, in my opinion, to write your own function based on the nbest or random-sampling code, one that computes the entire output distribution and picks the most likely word. Note that due to the class factorization, only a small part of the entire distribution normally gets computed, i.e. the part corresponding to the class that the predicted word is associated with. If speed is irrelevant to you, however, you can simply run things as you suggested :)
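For the slow variant, the post-processing of the -nbest scores is straightforward. Here is a rough sketch in Python; the file names best.txt and scores.txt are just placeholders, and it assumes rnnlm prints exactly one log-probability per hypothesis line, as in your examples above:

# Sketch: pick the most probable last word per sentence group from rnnlm -nbest output.
# Assumes best.txt contains lines like "<group-id> SENTENCE CANDIDATE-LAST-WORD" and
# scores.txt was produced with something like:
#   ./rnnlm -rnnlm model -test best.txt -nbest > scores.txt

hypotheses = [line.split() for line in open("best.txt") if line.strip()]
scores = [float(line) for line in open("scores.txt") if line.strip()]
assert len(hypotheses) == len(scores), "expected one score per hypothesis"

best = {}  # group id -> (log probability, candidate last word)
for tokens, logprob in zip(hypotheses, scores):
    group, last_word = tokens[0], tokens[-1]
    if group not in best or logprob > best[group][0]:
        best[group] = (logprob, last_word)

for group, (logprob, word) in best.items():
    print(group, word, logprob)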