unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] Crash parseBfrange index out of range #430

Closed Kleissner closed 2 years ago

Kleissner commented 3 years ago

Description

The current version (prep-rc-v3.16.0-take2) panics when extracting text from the attached PDF.

panic: runtime error: index out of range [-1]

goroutine 61 [running]:
github.com/unidoc/unipdf/internal/cmap.(*CMap).parseBfrange(0xc016190f70, 0x1ec9380, 0xc009b57f70)
    C:/Temp/Go/src/github.com/unidoc/unipdf/internal/cmap/cmap.go:12 +0x99e
github.com/unidoc/unipdf/internal/cmap.(*CMap).parse(0xc016190f70, 0x67c, 0xe00)
    C:/Temp/Go/src/github.com/unidoc/unipdf/internal/cmap/cmap.go:12 +0x5b4
github.com/unidoc/unipdf/internal/cmap.LoadCmapFromData(0xc009b68000, 0x67c, 0xe00, 0x1, 0x0, 0x0, 0x2265320)
    C:/Temp/Go/src/github.com/unidoc/unipdf/internal/cmap/cmap.go:12 +0x1a8
github.com/unidoc/unipdf/model._cfba(0x2265320, 0xc008e94900, 0xc003a03080, 0xc008e94900, 0xc008e98778, 0x3633373838353301)
    C:/Temp/Go/src/github.com/unidoc/unipdf/model/model.go:2128 +0x149
github.com/unidoc/unipdf/model._ecggg(0x22650e0, 0xc008e9ecc0, 0x0, 0x0, 0x0, 0x0)
    C:/Temp/Go/src/github.com/unidoc/unipdf/model/model.go:2238 +0x9ff
github.com/unidoc/unipdf/model._aabe(0x22650e0, 0xc008e9ecc0, 0x1, 0x22650e0, 0xc008e9ecc0, 0x0)
    C:/Temp/Go/src/github.com/unidoc/unipdf/model/model.go:1626 +0x5f
github.com/unidoc/unipdf/model.NewPdfFontFromPdfObject(...)
    C:/Temp/Go/src/github.com/unidoc/unipdf/model/model.go:1175
github.com/unidoc/unipdf/extractor.(*textObject).getFontDirect(0xc00208d760, 0xc0151f3060, 0x4, 0x4, 0x2cd3600, 0xc000101000)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:16 +0x7c
github.com/unidoc/unipdf/extractor.(*textObject).getFont(0xc00208d760, 0xc0151f3060, 0x4, 0x0, 0x0, 0x3ff0000000000000)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:106 +0x7c
github.com/unidoc/unipdf/extractor.(*textObject).setFont(0xc00208d760, 0xc0151f3060, 0x4, 0x3ff0000000000000, 0x0, 0x0)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:210 +0x68
github.com/unidoc/unipdf/extractor.(*Extractor).extractPageText.func1(0xc0099d8540, 0x227bf20, 0x2cd0d78, 0x227bf20, 0x2cd0d78, 0x1ec0e00, 0xc00bd22bb8, 0x1ec0e00, 0xc00bd22bc0, 0x3ff0000000000000, ...)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:141 +0x29ec
github.com/unidoc/unipdf/contentstream.(*ContentStreamProcessor).Process(0xc006dc15b8, 0xc0098d1170, 0x0, 0x0)
    C:/Temp/Go/src/github.com/unidoc/unipdf/contentstream/contentstream.go:321 +0x54b
github.com/unidoc/unipdf/extractor.(*Extractor).extractPageText(0xc003a02fc0, 0xc009a02000, 0x1c43, 0xc0098d1170, 0x3ff0000000000000, 0x0, 0x0, 0x0, 0x3ff0000000000000, 0x0, ...)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:141 +0xa73
github.com/unidoc/unipdf/extractor.(*Extractor).ExtractPageText(0xc003a02fc0, 0x11cefbf, 0x30, 0x1f77f00, 0xc006dc1701, 0xc0099d8060)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:90 +0xf0
github.com/unidoc/unipdf/extractor.(*Extractor).ExtractTextWithStats(0xc003a02fc0, 0xc003a02fc0, 0x0, 0x0, 0x0, 0xc0086261a0, 0x0)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:220 +0x47
github.com/unidoc/unipdf/extractor.(*Extractor).ExtractText(...)
    C:/Temp/Go/src/github.com/unidoc/unipdf/extractor/extractor.go:131
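
For context on the panic message itself: `index out of range [-1]` is what the Go runtime reports when a slice is indexed with a computed negative index, most commonly `s[len(s)-1]` on an empty slice. Whether that is exactly what happens inside `parseBfrange` is an assumption, since the trace does not show the source. The sketch below is not unipdf code. `incrementLast` and `errEmptyTarget` are hypothetical names that mimic a bfrange-style step (bumping the last code unit of a destination string) and show a length guard that returns an error instead of panicking.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyTarget is a hypothetical error used only in this sketch.
var errEmptyTarget = errors.New("bfrange: empty destination string")

// incrementLast returns a copy of target with its last element incremented.
// Without the length check, target[len(target)-1] on an empty slice would
// panic with "index out of range [-1]".
func incrementLast(target []rune) ([]rune, error) {
	if len(target) == 0 {
		return nil, errEmptyTarget
	}
	next := make([]rune, len(target))
	copy(next, target)
	next[len(next)-1]++
	return next, nil
}

func main() {
	// A malformed entry with an empty destination is rejected, not indexed.
	if _, err := incrementLast(nil); err != nil {
		fmt.Println("rejected:", err)
	}

	// A well-formed entry advances the last code unit.
	next, _ := incrementLast([]rune("AB"))
	fmt.Println(string(next))
}
```

A guard of this kind lets the parser treat a malformed range as a parse error for that one font rather than taking down the whole process.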

Expected Behavior

No crash. The PDF may or may not be corrupted, but the library shouldn't panic either way.
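
Until a fix is available, one possible caller-side workaround is to convert the panic into an ordinary error with `recover`. The following is a minimal sketch under assumptions: `safeExtract` is a hypothetical helper, and the closure passed to it stands in for whatever call performs the extraction. It simulates the failure with an empty slice instead of calling unipdf.

```go
package main

import "fmt"

// safeExtract runs extract and turns any panic it raises into an error.
// extract is a stand-in for the real text-extraction call.
func safeExtract(extract func() (string, error)) (text string, err error) {
	defer func() {
		if r := recover(); r != nil {
			text = ""
			err = fmt.Errorf("text extraction panicked: %v", r)
		}
	}()
	return extract()
}

func main() {
	// Simulated failure: indexing an empty slice at len-1 panics at run time.
	_, err := safeExtract(func() (string, error) {
		var parts []string
		return parts[len(parts)-1], nil
	})
	fmt.Println(err)
}
```

Note that `recover` only catches panics raised in the same goroutine, so the wrapper has to run in the goroutine that performs the extraction.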

Attachments

This is the file causing the crash when extracting text: e45f8ebb-bb7d-415e-8ae5-ab8c0ea56552.pdf

gunnsth commented 2 years ago

This seems to have been fixed already. I cannot reproduce it with the latest version.