Open rizar opened 9 years ago
Unfortunately I don't see your point. Everything is Markovian, so the probability of the next state/emission always depends only on the current state/input.
The fact that in some models there are states which disallow certain symbols does not violate the Markov assumption. And technically, you can always assume that they do allow all outputs, but some with an infinitesimally small probability.
I am not saying that P(y_t|s_1, ..., s_t) is not P(y_t|s_t). What I am saying is that P(y_t|s_t) is sometimes not available.
An example: suppose we are combining P(W|X), computed by a neural net, with Q(W) from a language model in a multiplicative way: COST(W,X) = P(W|X)Q(W). In such cases we typically minimize log COST = log P(W_1|X) + log P(W_2|W_1, X) + ... + log Q(W_1) + log Q(W_2|W_1) + ... . This way additive scores are defined for each character W_i, but the resulting probability of W_i given W_1, ..., W_{i-1} under the joint model is in fact intractable (it would require normalization over all W to compute)!
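To make the intractability concrete, here is a small self-contained sketch (toy scores and hypothetical helper names, nothing from Blocks): evaluating the additive per-sequence score is cheap, but the normalizer of the joint model P(W|X)Q(W) requires a sum over all |V|**T sequences, which grows exponentially with the sequence length.

```python
import itertools
import math

V = ["a", "b", "c"]  # toy vocabulary
T = 3                # toy sequence length

def log_p(w, prefix):
    """Toy stand-in for log P(w_t | w_<t, X) from a neural net."""
    return -0.5 * (len(prefix) + V.index(w) + 1)

def log_q(w, prefix):
    """Toy stand-in for log Q(w_t | w_<t) from a language model."""
    return -0.3 * (V.index(w) + 1)

def joint_score(seq):
    """Unnormalized log score of a whole sequence: log P(W|X) + log Q(W)."""
    s = 0.0
    for t, w in enumerate(seq):
        prefix = seq[:t]
        s += log_p(w, prefix) + log_q(w, prefix)
    return s

# The per-sequence additive score is cheap to evaluate...
print(joint_score(("a", "b", "c")))

# ...but the normalizer needs a sum over ALL |V|**T sequences,
# which is what makes the per-step conditional intractable in practice.
log_Z = math.log(sum(math.exp(joint_score(seq))
                     for seq in itertools.product(V, repeat=T)))
print(len(V) ** T, "terms in the normalizer")
```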
On the other hand, the current code and documentation of `SequenceGenerator` assume that this conditional probability is always available and that one can always sample from the distribution defined by the `SequenceGenerator`. What I propose here is that in its most generic form `SequenceGenerator` should be just a formula for computing COST(W, X), without assuming that this cost is always a log-likelihood. The `generate` method should become optional, and the `emit` method of the `Emitter` interface should become optional.
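One way to read the proposal is the following sketch. The class and method names here are purely illustrative, not the actual Blocks API: the only required operation is the cost of a given sequence, while generation is opt-in and may legitimately be absent.

```python
import math

class SequenceScorer:
    """Hypothetical minimal interface: a sequence generator reduced to a
    formula for COST(W, X). Illustrative names, not actual Blocks code."""

    def cost(self, outputs, **contexts):
        """Required: additive per-step costs for a known sequence `outputs`."""
        raise NotImplementedError

    def generate(self, **contexts):
        """Optional: only meaningful when the cost is a log-likelihood,
        so emitting tokens step by step is well defined."""
        raise NotImplementedError("this model only defines costs")

class UniformScorer(SequenceScorer):
    """Toy subclass: uniform cost over a 3-symbol alphabet; `generate`
    deliberately left unimplemented."""

    def cost(self, outputs, **contexts):
        return [math.log(3)] * len(outputs)
```

A beam search built on such an interface would only ever call `cost` on candidate prefixes; models whose score is not a normalized log-likelihood simply never implement `generate`.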
In the current version of the sequence generator framework it is assumed that it is always possible to emit a next token given the contexts and the previous tokens. The `readout.emit` method is supposed to return the respective computation graph. The truth is that this is not always possible: for some sequence generation methods only the cost of generating the whole sequence can be defined. This is what we hit in speech recognition, in which the cost of a transcript is defined as
log P(W|X) + beta * log Q(W)
for the whole sequence, making all per-token probabilities intractable. However, most of `sequence_generators.py` and `search.py` could be reused in such cases. This ticket stands for a revision of the `SequenceGenerator` interface that would make the generative semantics optional.
@janchorowski, this is a major ticket on our way to using purely Blocks master in fully-neural-lvsr.
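As a concrete illustration of that transcript-level cost (toy log-probabilities and a made-up weight, not values from any real model): the two models are combined with a weight beta, and only whole-sequence scores are well defined, so candidates can be rescored but per-token conditionals cannot be read off.

```python
beta = 0.5  # language-model weight, a tunable hyperparameter

# Toy whole-sequence log-probabilities for three candidate transcripts.
log_p_w_given_x = {"the cat": -2.0, "the cab": -2.2, "a cat": -3.0}
log_q_w = {"the cat": -1.0, "the cab": -2.5, "a cat": -1.2}

def transcript_cost(w):
    """Whole-sequence score: log P(W|X) + beta * log Q(W)."""
    return log_p_w_given_x[w] + beta * log_q_w[w]

# Rescoring a candidate list is easy; decomposing the score into
# normalized per-token probabilities is not.
best = max(log_p_w_given_x, key=transcript_cost)
print(best, transcript_cost(best))
```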