k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.
https://k2-fsa.github.io/k2
Apache License 2.0

Compile LG: When arpa is large, an error occurs #1233

Closed: jingyu1220 closed this issue 1 year ago

jingyu1220 commented 1 year ago

The error is: Check failed: state_index_c + 2 == arcindexes.size() (-1909392433 vs. 2385574863)

csukuangfj commented 1 year ago

How large is your LG?

If it is too large, we suggest that you prune your G before composing it with L.
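One rough way to answer "how large" before compiling is to count the arcs in the text-format G. The sketch below assumes G.fst.txt is in OpenFst text format (arc lines have 3 to 5 whitespace-separated fields, final-state lines have 1 or 2), and `count_arcs` is a hypothetical helper, not part of k2 or icefall. The errors in this thread involve int32 indexes, so a G whose arc count is already close to INT32_MAX (2,147,483,647) is very likely to overflow once composed with L, since LG typically has more arcs than G (each word arc expands into a path of token arcs).

```python
INT32_MAX = 2**31 - 1


def count_arcs(path: str) -> int:
    """Count arc lines in an OpenFst text-format FST (hypothetical helper).

    Arc lines have 3-5 fields (acceptor or transducer, optional weight);
    final-state lines have only 1 or 2 fields and are not counted.
    The file is read line by line, so memory use stays constant.
    """
    arcs = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if len(line.split()) >= 3:
                arcs += 1
    return arcs
```

For example, `n = count_arcs("G.fst.txt")` followed by `print(n, n <= INT32_MAX)` gives a quick lower bound on the size problem; the arc count of LG itself is only known after composition.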

jingyu1220 commented 1 year ago

G.fst.txt is 68G. We want to use a large G to get better performance.

jingyu1220 commented 1 year ago

This error looks like an array-out-of-bounds problem.

danpovey commented 1 year ago

Likely the issue is that after composing with L, some of the indexes go outside the range of int32. I believe we may have some examples of using OpenFST instead of k2 to compile decoding graphs, as it has slightly larger maximum limits (although there is also an int32 limitation, just with some differences of implementation).
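The numbers in the first error fit this explanation exactly: 2,385,574,863 exceeds INT32_MAX (2,147,483,647), and subtracting 2^32 from it gives -1,909,392,433, which is the left-hand value of the failed check. The sketch below reproduces that wraparound in Python; `to_int32` is an illustrative helper, and it assumes the left-hand value came from truncating a larger count to a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed two's-complement int32."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


true_size = 2385574863          # right-hand value reported by the check
print(true_size > INT32_MAX)    # True: does not fit in int32
print(to_int32(true_size))      # -1909392433: matches the left-hand value
```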

danpovey commented 1 year ago

Wait, 68G?? You sure you don't mean 68M? 68G is way way out of bounds. Normally when people want a good LM they rely on neural language models.

zhangzhengyireal commented 1 year ago

I encountered a similar issue. My LM is 3.7G, my dictionary has 250,000 words, and I have 4,000 syllable tokens. When I cut the LM down to 0.7G, it works. Is there any way to fix the error when the LM is large?

k2 version: 1.24.3
Git SHA1: 460c841d0329273e4d8ee60e204d450e48a78245

Error:
INFO [compile_lg.py:107] Connecting LG after k2.determinize
INFO [compile_lg.py:110] Removing disambiguation symbols on LG
/path/to/k2-exp/k2/k2/csrc/array.h:501:void k2::Array1::Init(k2::ContextPtr, int32_t, k2::Dtype) [with T = int; k2::ContextPtr = std::shared_ptr; int32_t = int] Check failed: size >= 0 (-96062515 vs. 0) Array size MUST be greater than or equal to 0, given :-96062515

[ Stack-Trace: ]
/path/to/k2-exp/k2/build_debug/lib/libk2_log.so(k2::internal::GetStackTrace()+0x46) [0x7f009f8e7696]
/path/to/k2-exp/k2/build_debug/lib/libk2context.so(k2::internal::Logger::~Logger()+0x35) [0x7f009ffdd5cf]
/path/to/k2-exp/k2/build_debug/lib/libk2context.so(k2::Array1::Init(std::shared_ptr, int, k2::Dtype)+0x1f6) [0x7f009ffe194c]
/path/to/k2-exp/k2/build_debug/lib/libk2context.so(k2::Array1::Array1(std::shared_ptr, int, k2::Dtype)+0x50) [0x7f009ffdf54c]
/path/to/k2-exp/k2/build_debug/lib/libk2context.so(k2::RaggedShapeFromTotSizes(std::shared_ptr, int, int const)+0x229) [0x7f00a0180148]
/path/to/k2-exp/k2/build_debug/lib/libk2context.so(k2::IndexAxis0(k2::RaggedShape&, k2::Array1 const&, k2::Array1)+0x382) [0x7f00a0181a33]
/path/to/k2-exp/k2/build_debug/lib/libk2context.so(k2::Index(k2::RaggedShape&, int, k2::Array1 const&, k2::Array1*)+0x1a6) [0x7f00a018250a]
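Following danpovey's int32 explanation, the negative size in this second error can be read the same way: -96,062,515 is what a count of 4,198,904,781 (or that plus any multiple of 2^32) becomes when truncated to a signed 32-bit integer. The sketch below, using the hypothetical helper `smallest_unwrapped`, computes the smallest such count under the assumption that the size wrapped exactly once; this is an inference from the number, not something the log states.

```python
INT32_MAX = 2**31 - 1
TWO_POW_32 = 2**32


def smallest_unwrapped(value: int) -> int:
    """Smallest non-negative integer whose low 32 bits, read as int32, equal `value`."""
    assert -(2**31) <= value <= INT32_MAX, "value must be a valid int32"
    return value if value >= 0 else value + TWO_POW_32


candidate = smallest_unwrapped(-96062515)
print(candidate)               # 4198904781
print(candidate > INT32_MAX)   # True: the intended size cannot be stored in int32
```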