Closed by alex-lew 1 month ago
This could be fixed, or it could be an input to the distribution's constructor, as k is in Emily's implementation of Dirichlet-Categorical.
I like the latter option: it would give us more flexibility to bridge between ASCII, Unicode, token vocabularies, etc.
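For illustration, a rough sketch of what passing the vocabulary in could look like. This is Python just for brevity; `make_index` is a hypothetical helper, not anything that exists in the repo, and the example vocabularies are made up.

```python
import string

def make_index(vocab):
    """Map each symbol of a caller-supplied vocabulary to a 0-based index.

    Hypothetical helper: the symbols can be characters or whole tokens,
    as long as they are hashable and unique.
    """
    index = {sym: i for i, sym in enumerate(vocab)}
    if len(index) != len(vocab):
        raise ValueError("vocabulary contains duplicate symbols")
    return index

# The same constructor argument covers different alphabets.
ascii_index = make_index(list(string.printable))   # printable ASCII characters
token_index = make_index(["the", "cat", "sat"])    # a toy token vocabulary
```

The distribution itself would then only ever see integer indices, so it would not need to know which kind of vocabulary it was given.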
> $i=V+1$ as a special 'start symbol' and $j=V+1$ as a special 'end symbol'.
maybe a typo and you mean i=0, or i=V?
> Actually, I think it should be possible to implement this internally in terms of a vector of Dirichlet-Categorical distributions.
sounds good to me!
> maybe a typo and you mean i=0, or i=V?
Ah, I was thinking with 1-indexed subscripts. So $i=1, \dots, V$ for the actual vocabulary, and $i=V+1$ for the special symbol. But maybe we should 0-index to keep closer to the code.
Once we have addressed #3, we will want to add a simple distribution over string-valued data.
I suggest the following setup:
The state we track would be a matrix of observed transition counts: how often did we transition from letter $i$ to letter $j$, for each pair of letters?
Actually, I think it should be possible to implement this internally in terms of a vector of Dirichlet-Categorical distributions.
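To make that concrete, here is a minimal sketch of the idea. It is in Python for illustration only and is not the project's code: the class names, the symmetric `alpha` prior, and the 0-indexed layout (vocabulary at indices 0 to V-1, one shared boundary symbol at index V) are all assumptions. Each row of the transition-count matrix is its own Dirichlet-Categorical over the next symbol.

```python
import math

class DirichletCategorical:
    """Symmetric Dirichlet(alpha) prior over k categories, tracked via counts."""
    def __init__(self, k, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.counts = [0] * k
        self.n = 0

    def incorporate(self, x):
        self.counts[x] += 1
        self.n += 1

    def unincorporate(self, x):
        self.counts[x] -= 1
        self.n -= 1

    def logp(self, x):
        # Posterior predictive probability of category x given the counts so far.
        return math.log((self.counts[x] + self.alpha) / (self.n + self.alpha * self.k))


class BigramString:
    """Bigram model over strings: one DirichletCategorical per 'from' symbol."""
    def __init__(self, vocab, alpha=1.0):
        self.index = {c: i for i, c in enumerate(vocab)}
        self.V = len(vocab)
        # Row i holds the counts of transitions i -> j; index V is the
        # start/end boundary symbol (an assumption of this sketch).
        self.rows = [DirichletCategorical(self.V + 1, alpha) for _ in range(self.V + 1)]

    def _transitions(self, s):
        idx = [self.V] + [self.index[c] for c in s] + [self.V]
        return list(zip(idx, idx[1:]))

    def incorporate(self, s):
        for i, j in self._transitions(s):
            self.rows[i].incorporate(j)

    def logp(self, s):
        # Chain rule: score each transition given the ones before it,
        # then undo the temporary updates so the state is unchanged.
        transitions = self._transitions(s)
        total = 0.0
        for i, j in transitions:
            total += self.rows[i].logp(j)
            self.rows[i].incorporate(j)
        for i, j in transitions:
            self.rows[i].unincorporate(j)
        return total
```

With this layout, `BigramString("ab").logp("aa")` scores three transitions (start to a, a to a, a to end), and the second use of row a already reflects the first, which is why `logp` incorporates as it goes.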