Open danpovey opened 3 years ago
@danpovey
dense_fsas = k2fsa.dense_fsas(loglikes)
Are you implementing DenseFsaVec? How is a DenseFsaVec constructed from loglikes?
Look in fsa.h. It is just a struct. Needs a constructor, that's all.
Note that, in general, something would need to be done in Python when constructing it, to add the -infinity values in the right places. (This needs to be done at the Python level, using the base toolkit, so that autograd will work.) There are two cases for construction: regular and irregular. In the irregular case, please note that Lhotse keeps track of the supervision start/end times separately from the feature start/end times, so even if the features all have the same length, the supervisions may not. We probably shouldn't interface directly with Lhotse, but should give it a reasonably usable interface. In general, each sequence (i.e. at the output of the nnet, a sequence of log-likelihoods) will be associated with zero or more supervision objects, possibly overlapping in time.
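To make the irregular case concrete, here is a minimal sketch of how supervisions might be described independently of the feature lengths. The names SupervisionSegment and slice_supervisions are hypothetical and are not part of k2 or Lhotse. Plain Python lists stand in for tensors, so this only illustrates the bookkeeping; a real version would slice tensors in the base toolkit so that autograd still works.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]  # frames x symbols


@dataclass
class SupervisionSegment:  # hypothetical name, for illustration only
    sequence_idx: int  # which nnet output sequence this refers to
    start_frame: int   # first frame covered by the supervision
    num_frames: int    # number of frames covered


def slice_supervisions(loglikes: List[Matrix],
                       segments: List[SupervisionSegment]) -> List[Matrix]:
    """Return one block of log-likelihood frames per supervision segment."""
    pieces = []
    for seg in segments:
        frames = loglikes[seg.sequence_idx]
        end = seg.start_frame + seg.num_frames
        if seg.start_frame < 0 or end > len(frames):
            raise ValueError(f"segment out of range: {seg}")
        pieces.append(frames[seg.start_frame:end])
    return pieces


# Two sequences of equal length (4 frames, 2 symbols each).
loglikes = [
    [[-0.1, -2.3], [-0.2, -1.7], [-0.3, -1.4], [-0.4, -1.1]],
    [[-0.5, -0.9], [-0.6, -0.8], [-0.7, -0.7], [-0.8, -0.6]],
]
# Sequence 0 has two overlapping supervisions; sequence 1 has none.
segments = [
    SupervisionSegment(sequence_idx=0, start_frame=0, num_frames=3),
    SupervisionSegment(sequence_idx=0, start_frame=2, num_frames=2),
]
pieces = slice_supervisions(loglikes, segments)
```

Even though both feature sequences have the same length, the resulting pieces have different lengths, which is why the regular case cannot be assumed.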
Look in fsa.h. It is just a struct. Needs a constructor, that's all.
DenseFsaVec contains only two members: RaggedShape and Array2<float>. Is it equivalent to the emission graph in Figure 2(d) of this paper (https://arxiv.org/pdf/2010.01003.pdf)?
lattices = k2fsa.pruned_compose(dense_fsas, decoding_graph)
Since dense_fsas contains a ragged shape instead of FSAs, do we construct the FSAs dynamically during intersection?
Yes, comments there should explain it, but it contains the emission probabilities (as matrices) of a number of pieces of supervised audio that may not all be the same size. It also contains -1's at specific locations. The matrices are all appended into one Array2, and the ragged matrix says where each piece of audio starts and ends in the matrix.
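The layout described above can be sketched as follows. This is only an illustration of the "one appended matrix plus row boundaries" idea in plain Python; pack_pieces and get_piece are hypothetical helper names, not k2 functions, and the sketch leaves out the special entries mentioned above.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # frames x symbols


def pack_pieces(pieces: List[Matrix]) -> Tuple[List[int], Matrix]:
    """Append all pieces into one matrix and record where each one starts.

    row_splits[i] is the first row of piece i in the packed matrix and
    row_splits[i + 1] is one past its last row.
    """
    row_splits = [0]
    scores: Matrix = []
    for piece in pieces:
        scores.extend(list(row) for row in piece)
        row_splits.append(len(scores))
    return row_splits, scores


def get_piece(row_splits: List[int], scores: Matrix, i: int) -> Matrix:
    """Recover the rows belonging to piece i."""
    return scores[row_splits[i]:row_splits[i + 1]]


# Two pieces of supervised audio with different numbers of frames.
pieces = [
    [[-0.1, -2.3], [-0.2, -1.7], [-0.3, -1.4]],
    [[-0.5, -0.9], [-0.6, -0.8]],
]
row_splits, scores = pack_pieces(pieces)
```

Here row_splits plays the role of the ragged shape and scores plays the role of the single Array2.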
Guys, I am creating this issue just as a way to show this pseudocode; it demonstrates parts of where we are going with k2. One feature is that the fsa objects will have fields 'per_arc' which contain arbitrary tensors whose first dimension is the values.Dim() of the arcs. (They are accessed as if they were class members but are really members of a dict.) When we do operations on FSAs, these per_arc quantities are propagated. (This is easy given the arc_map objects.) The Python-level fsa object is going to be a more complicated object than the C++ one. You can think of it as containing the C++ object as just one member (maybe we could make the C++ object a member called arcs or something).
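As a rough illustration of the per_arc idea, the sketch below shows a wrapper whose per-arc attributes live in a dict but are read and written like ordinary members, plus an operation that propagates them through an arc map. The class Fsa and the function select_arcs here are hypothetical stand-ins written for this example, with plain lists in place of tensors; they are not the k2 API.

```python
class Fsa:
    """Hypothetical Python-level wrapper: `arcs` stands in for the C++ object."""

    def __init__(self, arcs):
        # Bypass our own __setattr__ for the two real members.
        object.__setattr__(self, "arcs", list(arcs))
        object.__setattr__(self, "per_arc", {})

    def __setattr__(self, name, value):
        # Every other attribute is treated as a per-arc quantity.
        if len(value) != len(self.arcs):
            raise ValueError("per-arc attribute must have one entry per arc")
        self.per_arc[name] = list(value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["per_arc"][name]
        except KeyError:
            raise AttributeError(name) from None


def select_arcs(fsa, arc_map):
    """Toy operation: output arc i is a copy of input arc arc_map[i]."""
    out = Fsa(fsa.arcs[i] for i in arc_map)
    for name, values in fsa.per_arc.items():
        # Propagate each per-arc attribute using the same arc map.
        setattr(out, name, [values[i] for i in arc_map])
    return out


fsa = Fsa([(0, 1, 5), (0, 2, 6), (1, 2, 7)])  # (src, dest, label)
fsa.phones = [10, 20, 30]                     # stored in fsa.per_arc
pruned = select_arcs(fsa, arc_map=[2, 0])
```

Because every operation only needs to return an arc map, the same indexing step can carry along any number of user-defined per-arc attributes.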