Closed: becoded closed this issue 2 years ago.
Description
We are using `ExtractText()`, and from time to time we get an index out of range error.
Stacktrace:

    panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

    goroutine 21 [running]:
    testing.tRunner.func1.2({0x1009ba340, 0x140001f5d28})
            /opt/homebrew/Cellar/go/1.18.1/libexec/src/testing/testing.go:1389 +0x1c8
    testing.tRunner.func1()
            /opt/homebrew/Cellar/go/1.18.1/libexec/src/testing/testing.go:1392 +0x384
    panic({0x1009ba340, 0x140001f5d28})
            /opt/homebrew/Cellar/go/1.18.1/libexec/src/runtime/panic.go:838 +0x204
    github.com/unidoc/unipdf/v3/internal/textencoding.CMapEncoder.CharcodeToRune(...)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/internal/textencoding/textencoding.go:552
    github.com/unidoc/unipdf/v3/extractor.(*textObject).renderText(0x14000ab02c0, {0x14000759328, 0x1, 0x8})
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:762 +0xab0
    github.com/unidoc/unipdf/v3/extractor.(*textObject).showTextAdjusted(0x14000ab02c0, 0x1400000fea8)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:132 +0x178
    github.com/unidoc/unipdf/v3/extractor.(*Extractor).extractPageText.func1(0x1400034fdd0, {{0x1009f2d78, 0x100f63dc8}, {0x1009f2e80, 0x14000084360}, {0x1009801a0, 0x140006021c8}, {0x10099ad00, 0x140001f5cf8}, {0x3ff0000000000000, ...}}, ...)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:797 +0x2348
    github.com/unidoc/unipdf/v3/contentstream.(*ContentStreamProcessor).Process(0x14000765aa0, 0x100f63dc8?)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/contentstream/contentstream.go:314 +0xa94
    github.com/unidoc/unipdf/v3/extractor.(*Extractor).extractPageText(0x14000136060, {0x14000644000, 0x9a44e}, 0x14000418060?, {0x3ff0000000000000, 0x0, 0x0, 0x0, 0x3ff0000000000000, 0x0, ...}, ...)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:828 +0x754
    github.com/unidoc/unipdf/v3/extractor.(*Extractor).ExtractPageText(0x14000136060)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:243 +0x74
    github.com/unidoc/unipdf/v3/extractor.(*Extractor).ExtractTextWithStats(0x14000214380?)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:508 +0x20
    github.com/unidoc/unipdf/v3/extractor.(*Extractor).ExtractText(...)
            /Projects/go/workspace/pkg/mod/github.com/unidoc/unipdf/v3@v3.34.0/extractor/extractor.go:526
Currently, the obfuscated code of `CMapEncoder.CharcodeToRune` looks like this:

    func (_agg CMapEncoder) CharcodeToRune(code CharCode) (rune, bool) {
        _egf, _ceg := _agg.charcodeToString(code)
        // Panics when _egf is "": []rune("") has length 0.
        return ([]rune(_egf))[0], _ceg
    }
The error happens because, for these files, `charcodeToString` sometimes returns an empty string. `[]rune("")` has length 0, so indexing its element `[0]` panics.
So a potential fix would be:

    func (_agg CMapEncoder) CharcodeToRune(code CharCode) (rune, bool) {
        _egf, _ceg := _agg.charcodeToString(code)
        // Guard against an empty mapping before indexing the rune slice.
        if _egf == "" {
            return MissingCodeRune, false
        }
        return ([]rune(_egf))[0], _ceg
    }
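The proposed guard can be exercised outside unipdf with a small stand-in. In this sketch, `toyEncoder`, its map-based `charcodeToString`, and the local `missingCodeRune` constant are hypothetical substitutes for the obfuscated internals. The real `MissingCodeRune` is assumed here to be U+FFFD. The sketch also swaps `[]rune(s)[0]` for `utf8.DecodeRuneInString`, which yields the same first rune without allocating a slice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// missingCodeRune stands in for unipdf's MissingCodeRune; assumed to be
// the Unicode replacement character U+FFFD.
const missingCodeRune = '\ufffd'

// toyEncoder is a hypothetical stand-in for CMapEncoder: it maps character
// codes to strings, and some codes may map to "".
type toyEncoder map[uint16]string

// charcodeToString mimics the lookup that can yield an empty string.
func (e toyEncoder) charcodeToString(code uint16) (string, bool) {
	s, ok := e[code]
	return s, ok
}

// CharcodeToRune applies the proposed guard before taking the first rune.
func (e toyEncoder) CharcodeToRune(code uint16) (rune, bool) {
	s, ok := e.charcodeToString(code)
	if s == "" {
		return missingCodeRune, false
	}
	r, _ := utf8.DecodeRuneInString(s) // first rune, no []rune allocation
	return r, ok
}

func main() {
	enc := toyEncoder{0x41: "A", 0x42: ""}
	fmt.Println(enc.CharcodeToRune(0x41)) // 65 true
	fmt.Println(enc.CharcodeToRune(0x42)) // 65533 false (maps to "")
	fmt.Println(enc.CharcodeToRune(0x43)) // 65533 false (unmapped)
}
```

`utf8.DecodeRuneInString("")` returns `utf8.RuneError` instead of panicking. With that variant, the explicit `s == ""` check mainly ensures that codes mapping to nothing are reported as `false`.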
Expected Behavior
No panics when extracting text.
Actual Behavior
In certain cases, `ExtractText()` triggers `panic: runtime error: index out of range [0] with length 0`.
Attachments
Unfortunately, I can't share a file for GDPR reasons.
Hi @becoded,
Thank you for reporting this issue and for the potential fix. We have released a new version, v3.35.0: https://github.com/unidoc/unipdf-src/releases/tag/v3.35.0