unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] Panic on extracting images #566

Open crashingfish opened 1 month ago

crashingfish commented 1 month ago

Description

A panic is triggered when ExtractPageImages is called. It is intermittent: most calls succeed, and the panic occurs only rarely.

Expected Behavior

ExtractPageImages should not panic. Should either return error or page images.
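Until the library returns an error here, a caller could convert the panic into an error with `recover`. The following is only a minimal sketch using the standard library: `safeCall` is a hypothetical helper (not part of unipdf), and the closure in `main` is a stand-in that reproduces the same kind of out-of-range index instead of calling the real extractor.

```go
package main

import "fmt"

// safeCall is a hypothetical helper: it runs fn and turns any panic
// raised inside fn into an ordinary error value.
func safeCall(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	// Stand-in for the extraction call (assumption: the real call would be
	// pextract.ExtractPageImages wrapped in this closure).
	err := safeCall(func() error {
		buf := make([]byte, 1032)
		i := 1256
		_ = buf[i] // panics: index out of range
		return nil
	})
	fmt.Println(err)
}
```

This only keeps the calling process alive. It does not fix the decoder, and the page's images are still lost when the panic fires.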

Actual Behavior

Calling ExtractPageImages with an empty ImageExtractOptions panics with an index-out-of-range runtime error instead of returning an error.

Attachments

I cannot share the PDF images due to compliance restrictions. Code to reproduce the issue:

page, _ := pdfReader.GetPage(1)
pextract, _ := extractor.New(page)
pimages, err := pextract.ExtractPageImages(&extractor.ImageExtractOptions{})

Stack trace

panic: runtime error: index out of range [1256] with length 1032

goroutine 8 [running]:
github.com/unidoc/unipdf/v3/internal/jbig2/document/segments.(TextRegion).decodeIb(0x14002156000, 0x0?, 0x4e8)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/document/segments/segments.go:320 +0x734
github.com/unidoc/unipdf/v3/internal/jbig2/document/segments.(TextRegion).decodeSymbolInstances(0x14002156000)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/document/segments/segments.go:77 +0x1dc
github.com/unidoc/unipdf/v3/internal/jbig2/document/segments.(TextRegion).GetRegionBitmap(0x14002156000)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/document/segments/segments.go:283 +0x13c
github.com/unidoc/unipdf/v3/internal/jbig2/document.(Page).createNormalPage(0x140003b29a0, 0x140003623c0)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/document/document.go:79 +0x220
github.com/unidoc/unipdf/v3/internal/jbig2/document.(Page).createPage(0x14000430280?, 0x1?)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/document/document.go:70 +0x38
github.com/unidoc/unipdf/v3/internal/jbig2/document.(Page).composePageBitmap(0x140003b29a0)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/document/document.go:115 +0x4c
github.com/unidoc/unipdf/v3/internal/jbig2/document.(Page).GetBitmap(0x140003b29a0)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/document/document.go:112 +0xe8
github.com/unidoc/unipdf/v3/internal/jbig2/decoder.(Decoder).decodePage(0x140007a10e0, 0x30e9?)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/decoder/decoder.go:21 +0x1d0
github.com/unidoc/unipdf/v3/internal/jbig2/decoder.(Decoder).DecodeNextPage(...)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/decoder/decoder.go:19
github.com/unidoc/unipdf/v3/internal/jbig2.DecodeBytes({0x140003d5500, 0x30e9, 0x30e9}, {0x0?, 0x14000018c18?}, {0x14000018c10?, 0x10?, 0x103462140?})
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/internal/jbig2/jbig2.go:16 +0x98
github.com/unidoc/unipdf/v3/core.(JBIG2Encoder).DecodeBytes(...)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/core/core.go:1345
github.com/unidoc/unipdf/v3/core.(JBIG2Encoder).DecodeStream(0x140000be370?, 0xf?)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/core/core.go:1732 +0x50
github.com/unidoc/unipdf/v3/core.DecodeStream(0x140000b0000?)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/core/core.go:91 +0x198
github.com/unidoc/unipdf/v3/model.(XObjectImage).ToImage(0x14000244580)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/model/model.go:4153 +0x130
github.com/unidoc/unipdf/v3/extractor._aegba({0x10357f018?, 0x140000be370?}, {0x10357ba68, 0x14002154c74})
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/extractor/extractor.go:508 +0x3c
github.com/unidoc/unipdf/v3/extractor.(imageExtractContext).extractXObjectImage(0x1400007ee60, 0x1400040a7e0, {{0x103586360, 0x103dcd060}, {0x103586360, 0x103dcd060}, {0x1034ae220, 0x1400086a1d8}, {0x1034ae220, 0x1400086a1e0}, ...}, ...)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/extractor/extractor.go:788 +0x188
github.com/unidoc/unipdf/v3/extractor.(imageExtractContext).processOperand(0x1400007ee60, 0x14000019268?, {{0x103586360, 0x103dcd060}, {0x103586360, 0x103dcd060}, {0x1034ae220, 0x1400086a1d8}, {0x1034ae220, 0x1400086a1e0}, ...}, ...)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/extractor/extractor.go:125 +0x20c
github.com/unidoc/unipdf/v3/contentstream.(ContentStreamProcessor).Process(0x14000019670, 0x3c?)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/contentstream/contentstream.go:623 +0xa84
github.com/unidoc/unipdf/v3/extractor.(imageExtractContext).extractContentStreamImages(0x1400007ee60, {0x14000414100?, 0x1?}, 0x1034aed20?)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/extractor/extractor.go:912 +0x208
github.com/unidoc/unipdf/v3/extractor.(*Extractor).ExtractPageImages(0x14000430000, 0x1400086a038)
    /path/go/pkg/mod/github.com/unidoc/unipdf/v3@v3.61.0/extractor/extractor.go:223 +0x68

3ace commented 1 month ago

@crashingfish thanks for reporting this. Could you attach a sample PDF document in which this issue occurs?

crashingfish commented 1 month ago

@3ace Sorry, I would have attached it already if that were allowed. However, I'm happy to provide any other information about the PDF that may help you reconstruct a similar file on your end.

3ace commented 1 month ago

@crashingfish based on the log you provided, what we can gather so far is that the PDF file contains an image encoded with JBIG2, but we have not been able to replicate the issue yet.

Is it possible for you to extract an image that uses the same encoding and send it to us, so that we can try to reconstruct a similar file?

crashingfish commented 1 month ago

@3ace I will need to check what that image contains and if it is allowed to be shared. Will get back.

crashingfish commented 1 month ago

@3ace I am still checking on this. Meanwhile, could you share your official email address?

3ace commented 1 month ago

@crashingfish you can reach us through support@unidoc.io

3ace commented 1 month ago

@crashingfish Hi, we haven’t received an email from you. Could you please confirm if you’ve sent it to us?