Closed shakingWaves closed 5 years ago
I believe OpenFST hard-codes StateId to be an int32 in its code base. Why are you trying to change that type?
I wouldn't object to those things being changed where you find such errors, since the change would be harmless with OpenFST as it is.
On Thu, Mar 28, 2019 at 10:34 AM Daniel Galvez notifications@github.com wrote:
I believe OpenFST hard-codes StateId to be an int32 in its code base. Why are you trying to change that type?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kaldi-asr/kaldi/issues/3177#issuecomment-477621390, or mute the thread https://github.com/notifications/unsubscribe-auth/ADJVu2fIML3BTspA_Lx-HIMfM53rsdbjks5vbNMLgaJpZM4cPdmn .
I think it is good practice to use StateId instead of int32 if you really meant int32, as this makes the code more generic.
But OpenFST, if you look at fst/arc.h, seems to have using StateId = int;
(int is 32-bits, by the way) for all of its arc types. It seems to me that you would need to use an entirely new arc type if you wanted to make the kinds of changes that you are proposing.
@galv @danpovey I intend to use very huge G.fst to produce HCLG,But it will produce many WFST state, So, the state number will exceed the upper limit of int32. replacing int32 with int64 of StateId in arc.h will avoid such problem.
That's generally not a good idea. It will be super slow to decode. Better
to do lattice rescoring or biglm
decoding (although biglm decoding is
slow).
On Thu, Mar 28, 2019 at 10:17 PM shakingWaves notifications@github.com wrote:
@galv https://github.com/galv @danpovey https://github.com/danpovey I intend to use very huge G.fst to produce HCLG,But it will produce many WFST state, So, the state number will exceed the upper limit of int32. replacing int32 with int64 of StateId in arc.h will avoid such problem.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kaldi-asr/kaldi/issues/3177#issuecomment-477839765, or mute the thread https://github.com/notifications/unsubscribe-auth/ADJVu61Wsv1T0uZdp6gOkMFF--I6HIfBks5vbXengaJpZM4cPdmn .
the state number will exceed the upper limit of int32.
A meager FST with just 4 billion states (the smallest warranting the extension) and a single arc per state would use up 128GB of memory only for the data it contains, if I drop it all into a big bag of integers and floats and shake it well; this is not counting memory for the data structures to make sense of this pile of bits. Think 200-300GB of RAM realistically. I'd wait till the year 2050 or so before attempting the beam search in such a graph, if I were you, provided the Moore's law holds.
@danpovey the normal case: StateId = int32, when change StateId frome int32 to int64, some errors will occur. such as function DiscriminativeSupervisionSplitter::PrepareLattice. if replace int32 with StateId if function DiscriminativeSupervisionSplitter::PrepareLattice, it will be ok. But, Similar problems exist in other functions.