Say there is the sentence [w1, w2, w3, w4, w5], and we already have a 3-gram language model. After loading the language model, according to the source code, we should get the probability of the sentence as:

P(sentence) = P(<s>, w1, w2) * P(w1, w2, w3) * P(w2, w3, w4) * P(w3, w4, w5) * P(w4, w5, </s>)
I can understand how a factor such as P(w1, w2, w3) above follows from the Bayesian chain rule. But why should P(sentence) be calculated this way? Is there a paper I can find that supports this formula?
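To check my reading of the formula, here is a minimal sketch in Python of how I understand the scoring loop. The trigram lookup is a made-up stand-in for the real model (a real implementation would also need backoff for unseen trigrams):

    import math

    def sentence_logprob(words, trigram_logprob):
        # Pad with <s> and </s>, then sum the log-probability of
        # every 3-word window, matching the formula above.
        padded = ["<s>"] + words + ["</s>"]
        total = 0.0
        for i in range(len(padded) - 2):
            w1, w2, w3 = padded[i:i + 3]
            # Stand-in for the model lookup, e.g. log P(w3 | w1, w2).
            total += trigram_logprob(w1, w2, w3)
        return total

    # Toy lookup so the sketch runs: every trigram gets probability 0.1.
    toy = lambda a, b, c: math.log(0.1)
    print(sentence_logprob(["w1", "w2", "w3", "w4", "w5"], toy))

With the toy lookup this sums five log-probabilities, one per window, which matches the five factors in the formula above.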
Thank you!