jrmuizel / pdf-extract

A Rust library for extracting content from PDFs

Panic on specific cases of "Separation"-type ColorSpace #38

Closed: badicsalex closed this issue 1 year ago

badicsalex commented 2 years ago

I have a document that fails to parse: http://www.kozlonyok.hu/nkonline/MKPDF/hiteles/MK10200.pdf

It panics at the expect call on this line: https://github.com/jrmuizel/pdf-extract/blob/4dbdc35f4ca3c6ac88e9cfcfbc59d897854adce7/src/lib.rs#L1294

According to the PDF specification, the second parameter (alternateSpace) is either a name or an array describing a complete color space:

A Separation colour space is defined as follows: [/Separation name alternateSpace tintTransform] ... The alternateSpace parameter shall be an array or name object that identifies the alternate colour space, which may be any device or CIE-based colour space but may not be another special colour space (Pattern, Indexed, Separation, or DeviceN).
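The rule quoted above can be encoded directly. The following is a minimal sketch that assumes a hypothetical, simplified Object enum rather than the object type pdf-extract actually uses; parse_alternate and AlternateSpace are illustrative names, not part of the crate. It accepts either a bare name or an array that starts with a family name, and rejects the special families the spec excludes.

```rust
// Hypothetical, simplified stand-in for a parsed PDF object.
// This is NOT pdf-extract's (or lopdf's) real object type.
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    Name(String),
    Array(Vec<Object>),
    Integer(i64),
}

/// The two shapes the spec allows for alternateSpace.
#[derive(Debug, PartialEq)]
pub enum AlternateSpace<'a> {
    /// A bare family name, e.g. /DeviceCMYK.
    Name(&'a str),
    /// An array such as [/ICCBased stream]: family name plus its parameters.
    Array { family: &'a str, params: &'a [Object] },
}

/// Classify the third element of [/Separation name alternateSpace tintTransform].
/// Returns Err instead of panicking when the operand has an unexpected shape.
pub fn parse_alternate(obj: &Object) -> Result<AlternateSpace<'_>, String> {
    let (family, alt) = match obj {
        Object::Name(n) => (n.as_str(), AlternateSpace::Name(n.as_str())),
        Object::Array(items) => match items.split_first() {
            // The first array element must be the family name.
            Some((Object::Name(n), rest)) => (
                n.as_str(),
                AlternateSpace::Array { family: n.as_str(), params: rest },
            ),
            _ => return Err("alternateSpace array must start with a name".to_string()),
        },
        other => {
            return Err(format!("alternateSpace must be a name or an array, got {:?}", other))
        }
    };
    // The spec forbids special colour spaces as the alternate; other names pass through.
    if matches!(family, "Pattern" | "Indexed" | "Separation" | "DeviceN") {
        return Err(format!("{} is a special colour space and cannot be an alternate", family));
    }
    Ok(alt)
}
```

Returning a Result here, instead of calling expect, lets a caller skip an unsupported alternate space rather than abort the whole extraction.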

In my fork I simply replaced the expect with unwrap_or, since I don't render anything. To handle this properly, I think the make_colorspace function would need to be refactored so that it can run the color space handling "twice": once for the Separation space itself and once, recursively, for its alternate space.
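One way to do the handling "twice" is plain recursion: when the Separation branch reaches alternateSpace, it calls the same function again, which already knows how to handle both names and arrays. The sketch below assumes hypothetical, simplified Object and ColorSpace enums and a make_colorspace signature that returns Result; none of this is pdf-extract's real code, and only a handful of families are recognized.

```rust
// Hypothetical, simplified object model; not pdf-extract's or lopdf's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    Name(String),
    Array(Vec<Object>),
}

#[derive(Debug, PartialEq)]
pub enum ColorSpace {
    DeviceGray,
    DeviceRGB,
    DeviceCMYK,
    /// An array-based (e.g. CIE/ICC) family, kept opaque in this sketch.
    Other(String),
    Separation { name: String, alternate: Box<ColorSpace> },
}

/// Resolve a colour space object. For Separation, the alternate space is
/// resolved by calling this same function, so names and arrays both work.
pub fn make_colorspace(obj: &Object) -> Result<ColorSpace, String> {
    match obj {
        // Bare names such as /DeviceRGB.
        Object::Name(n) => match n.as_str() {
            "DeviceGray" => Ok(ColorSpace::DeviceGray),
            "DeviceRGB" => Ok(ColorSpace::DeviceRGB),
            "DeviceCMYK" => Ok(ColorSpace::DeviceCMYK),
            other => Err(format!("unsupported colour space name {}", other)),
        },
        Object::Array(items) => {
            let family = match items.first() {
                Some(Object::Name(f)) => f.as_str(),
                _ => return Err("colour space array must start with a name".into()),
            };
            match family {
                "Separation" => {
                    // [/Separation name alternateSpace tintTransform];
                    // tintTransform (items[3]) is ignored in this sketch.
                    let name = match items.get(1) {
                        Some(Object::Name(n)) => n.clone(),
                        _ => return Err("Separation: missing colorant name".into()),
                    };
                    let alt_obj = items.get(2).ok_or("Separation: missing alternateSpace")?;
                    // Second pass: resolve the alternate with the same logic.
                    let alternate = make_colorspace(alt_obj)?;
                    // Reject a nested Separation (a special colour space).
                    if matches!(alternate, ColorSpace::Separation { .. }) {
                        return Err("alternateSpace may not be a special colour space".into());
                    }
                    Ok(ColorSpace::Separation { name, alternate: Box::new(alternate) })
                }
                "ICCBased" | "CalRGB" | "CalGray" | "Lab" => Ok(ColorSpace::Other(family.to_string())),
                other => Err(format!("unsupported colour space family {}", other)),
            }
        }
    }
}
```

Because errors propagate through `?`, a caller that only extracts text could fall back to a default color space on Err, which is roughly what the unwrap_or workaround does, without hiding the problem inside the parser.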