tmc / langchaingo

LangChain for Go, the easiest way to write LLM-based programs in Go
https://tmc.github.io/langchaingo/
MIT License

RecursiveCharacter stack overflow #793

Open ioannist opened 2 months ago

ioannist commented 2 months ago

This occurred in version 0.1.0. I did not find an existing issue for it, so I am reporting it here.

Creating a splitter with

charSplitter := textsplitter.NewRecursiveCharacter(textsplitter.WithChunkSize(1800))

and running it on input that appears to be shorter than 1800 characters results in a stack overflow.
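For anyone triaging: the SplitText frames in the trace below repeat with the same text pointer and length (0x5e0, i.e. 1504 bytes). That is what one would expect if a recursive call is handed its own input back unchanged. The following is a minimal, stdlib-only sketch of a recursive splitter that cannot loop this way, because every recursive call drops one separator. It is not langchaingo's implementation: splitRecursive, its parameters, and the 512-byte chunk size are assumptions for illustration, and it does no chunk merging or overlap.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecursive is a hypothetical toy splitter, not langchaingo's code.
// It splits text on seps[0] and recurses into each piece with seps[1:],
// so the separator list shrinks on every call and recursion must end.
func splitRecursive(text string, seps []string, chunkSize int) []string {
	// Guard against a non-positive chunk size, which would never make progress.
	if chunkSize <= 0 || len(text) <= chunkSize {
		return []string{text}
	}
	if len(seps) == 0 {
		// No separators left: cut fixed-size byte windows.
		// This may split a multi-byte rune; acceptable for a sketch.
		var out []string
		for len(text) > chunkSize {
			out = append(out, text[:chunkSize])
			text = text[chunkSize:]
		}
		return append(out, text)
	}
	var out []string
	for _, piece := range strings.Split(text, seps[0]) {
		if piece == "" {
			continue // skip empty pieces produced by adjacent separators
		}
		// Even when piece == text (separator not found), seps[1:] is shorter,
		// so this call cannot repeat with identical arguments.
		out = append(out, splitRecursive(piece, seps[1:], chunkSize)...)
	}
	return out
}

func main() {
	// 1504 bytes with no whitespace, matching the length seen in the trace.
	text := strings.Repeat("a", 1504)
	chunks := splitRecursive(text, []string{"\n\n", "\n", " "}, 512)
	fmt.Println(len(chunks)) // prints 3
}
```

With a 1504-byte input that contains none of the separators, the sketch falls through to fixed-size slicing and returns three chunks instead of recursing indefinitely.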

Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime: goroutine stack exceeds 1000000000-byte limit
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime: sp=0x4021058390 stack=[0x4021058000, 0x4041058000]
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: fatal error: stack overflow
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime stack:
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.throw({0x1263034?, 0x1ef3060?})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0xffff793195a0 sp=0xffff79319570 pc=0x45840
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.newstack()
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/runtime/stack.go:1105 +0x460 fp=0xffff79319750 sp=0xffff793195a0 pc=0x5ee90
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.morestack()
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/runtime/asm_arm64.s:316 +0x70 fp=0xffff79319750 sp=0xffff79319750 pc=0x74910
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: goroutine 34703 [running]:
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.writeHeapBits.write({0x40026a5600?, 0x0?, 0x2c?, 0x2c?}, 0x1?, 0x1?)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/runtime/mbitmap.go:791 +0x138 fp=0x4021058390 sp=0x4021058390 pc=0x25cf8
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.heapBitsSetType(0x40026a5760, 0x10, 0x10, 0xfcbea0)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/runtime/mbitmap.go:1026 +0xd4 fp=0x4021058450 sp=0x4021058390 pc=0x260a4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.mallocgc(0x10, 0xfcbea0, 0x1)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/runtime/malloc.go:1074 +0x58c fp=0x40210584c0 sp=0x4021058450 pc=0x1e0fc
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.makeslice(0x400022a360?, 0xa000a0000000000?, 0x149f790?)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/runtime/slice.go:103 +0x50 fp=0x40210584f0 sp=0x40210584c0 pc=0x5cc00
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: strings.genSplit({0x400022a360, 0x5e0}, {0x149f790, 0x1}, 0x0, 0xfcbea0?)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/strings/strings.go:247 +0x6c fp=0x4021058550 sp=0x40210584f0 pc=0x12daec
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: strings.Split(...)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /usr/local/go/src/strings/strings.go:305
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:51 +0x138 fp=0x4021058660 sp=0x4021058550 pc=0x380eb8
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058770 sp=0x4021058660 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058880 sp=0x4021058770 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058990 sp=0x4021058880 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058aa0 sp=0x4021058990 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]:         /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058bb0 sp=0x4021058aa0 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})