Open ioannist opened 2 months ago
This occurs in v0.1.0. I could not find an existing issue for it, so I am reporting it.
Creating the splitter with
charSplitter := textsplitter.NewRecursiveCharacter(textsplitter.WithChunkSize(1800))
and calling SplitText on input that appears to be shorter than 1800 characters results in a stack overflow.
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime: goroutine stack exceeds 1000000000-byte limit
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime: sp=0x4021058390 stack=[0x4021058000, 0x4041058000]
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: fatal error: stack overflow
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime stack:
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.throw({0x1263034?, 0x1ef3060?})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0xffff793195a0 sp=0xffff79319570 pc=0x45840
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.newstack()
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/runtime/stack.go:1105 +0x460 fp=0xffff79319750 sp=0xffff793195a0 pc=0x5ee90
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.morestack()
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/runtime/asm_arm64.s:316 +0x70 fp=0xffff79319750 sp=0xffff79319750 pc=0x74910
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: goroutine 34703 [running]:
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.writeHeapBits.write({0x40026a5600?, 0x0?, 0x2c?, 0x2c?}, 0x1?, 0x1?)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/runtime/mbitmap.go:791 +0x138 fp=0x4021058390 sp=0x4021058390 pc=0x25cf8
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.heapBitsSetType(0x40026a5760, 0x10, 0x10, 0xfcbea0)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/runtime/mbitmap.go:1026 +0xd4 fp=0x4021058450 sp=0x4021058390 pc=0x260a4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.mallocgc(0x10, 0xfcbea0, 0x1)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/runtime/malloc.go:1074 +0x58c fp=0x40210584c0 sp=0x4021058450 pc=0x1e0fc
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: runtime.makeslice(0x400022a360?, 0xa000a0000000000?, 0x149f790?)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/runtime/slice.go:103 +0x50 fp=0x40210584f0 sp=0x40210584c0 pc=0x5cc00
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: strings.genSplit({0x400022a360, 0x5e0}, {0x149f790, 0x1}, 0x0, 0xfcbea0?)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/strings/strings.go:247 +0x6c fp=0x4021058550 sp=0x40210584f0 pc=0x12daec
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: strings.Split(...)
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /usr/local/go/src/strings/strings.go:305
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:51 +0x138 fp=0x4021058660 sp=0x4021058550 pc=0x380eb8
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058770 sp=0x4021058660 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058880 sp=0x4021058770 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058990 sp=0x4021058880 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058aa0 sp=0x4021058990 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: /home/ubuntu/go/pkg/mod/github.com/tmc/langchaingo@v0.1.0/textsplitter/recursive_character.go:68 +0x364 fp=0x4021058bb0 sp=0x4021058aa0 pc=0x3810e4
Apr 19 05:14:33 ip-172-31-14-170 ppai-temporal-worker[3730481]: github.com/tmc/langchaingo/textsplitter.RecursiveCharacter.SplitText({{0x400085f710, 0x3, 0x3}, 0x200, 0x64}, {0x400022a360, 0x5e0})
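One thing visible in the trace: every SplitText frame receives the same text argument ({0x400022a360, 0x5e0}, i.e. the same pointer and a length of 1504 bytes), so the recursion does not appear to be shrinking its input. The sketch below is not the langchaingo implementation. It is a simplified, hypothetical splitter (splitRecursive and hardCut are names I made up, and it does no chunk merging or overlap) that only shows one way to guarantee progress: recurse only into pieces produced by a separator that actually occurs in the text, and fall back to fixed-size cuts otherwise.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecursive is a hypothetical, simplified recursive splitter, not the
// langchaingo code. It splits on the first separator that occurs in text and
// recurses into the resulting pieces. Because the separator is non-empty and
// present, every piece is strictly shorter than text, so recursion terminates.
func splitRecursive(text string, seps []string, chunkSize int) []string {
	if chunkSize <= 0 || len(text) <= chunkSize {
		return []string{text}
	}
	for i, sep := range seps {
		if sep == "" || !strings.Contains(text, sep) {
			continue // this separator cannot make the input smaller
		}
		var out []string
		for _, part := range strings.Split(text, sep) {
			if part == "" {
				continue
			}
			out = append(out, splitRecursive(part, seps[i:], chunkSize)...)
		}
		return out
	}
	// No separator applies: cut by size instead of recursing on the same input.
	return hardCut(text, chunkSize)
}

// hardCut slices text into pieces of at most chunkSize bytes. It is byte
// based, so it may cut through a multi-byte UTF-8 character.
func hardCut(text string, chunkSize int) []string {
	var out []string
	for len(text) > chunkSize {
		out = append(out, text[:chunkSize])
		text = text[chunkSize:]
	}
	if text != "" {
		out = append(out, text)
	}
	return out
}

func main() {
	// 1504 bytes with no separators, the same length as in the trace.
	text := strings.Repeat("a", 1504)
	chunks := splitRecursive(text, []string{"\n\n", "\n", " "}, 512)
	fmt.Println(len(chunks))
}
```

With this guard, a 1504-byte input containing none of the separators yields three pieces (512, 512, and 480 bytes) instead of recursing indefinitely. Whether the real bug has the same cause is an assumption on my part.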