jrmuizel / pdf-extract

A Rust library for extracting content from PDFs

fix missing unicode map entries #22

Open llogiq opened 4 years ago

llogiq commented 4 years ago

We get a panic from the current version on files that appear to be missing some character values for German umlauts (e.g. 252 = ü). This should fix the issue, but frankly I'm unsure whether it's the right thing to do.

This might relate to #9, btw.
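
For illustration only, here is a minimal sketch of the kind of fallback being discussed. It is not the code from this change and not pdf-extract's API: `decode_code` and the `HashMap`-based table are hypothetical. It also assumes that a code missing from the map can be read as a Latin-1 value (so 252 becomes ü), which may not hold for every font encoding.

```rust
use std::collections::HashMap;

/// Hypothetical helper: look up a character code in a possibly incomplete
/// code-to-Unicode map without panicking on a missing entry.
fn decode_code(map: &HashMap<u32, String>, code: u32) -> String {
    // Prefer the explicit mapping when the file provides one.
    if let Some(s) = map.get(&code) {
        return s.clone();
    }
    // Assumption: an unmapped code is interpreted as a Unicode scalar value,
    // which coincides with Latin-1 for codes 0..=255 (252 -> 'ü').
    match char::from_u32(code) {
        Some(c) => c.to_string(),
        // Not a valid scalar value: emit the replacement character
        // instead of panicking.
        None => '\u{FFFD}'.to_string(),
    }
}
```

With a map that only contains an entry for 65, `decode_code(&map, 252)` takes the fallback path rather than failing on the missing key. Whether guessing Latin-1 is preferable to emitting U+FFFD is exactly the judgment call raised above.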

jrmuizel commented 3 years ago

It looks like there's a bunch of other stuff in this change. Is that intentional?

menteb commented 1 year ago

I'm getting Unicode mismatch errors when reading in a PDF. I've tried to use encode_rs to get around these, but to no avail.

jrmuizel commented 1 year ago

Can you share a link to the pdf?