jrmuizel / pdf-extract

A rust library for extracting content from pdfs

Fix panic by setting default_width to Some(1.0) #83

Closed: prscoelho closed this 7 months ago

prscoelho commented 7 months ago

default_width is never changed from its initial value of None, so any pdf whose processing reaches the code that unwraps this value always panics. This change therefore assumes a default_width of Some(1.0).

This change yields correct results for my pdf that was previously panicking.
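
The failure mode described above can be sketched in isolation. This is a minimal illustration, not pdf-extract's actual code: the FontWidths type and the width_unchecked and width methods are hypothetical names, and only the default_width field name comes from this discussion. It shows why unwrapping a default that is never set panics, how Some(1.0) sidesteps the panic, and one possible alternative that reports the missing width to the caller instead of guessing a value.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a font's width table (not pdf-extract's real type).
struct FontWidths {
    /// Explicit widths keyed by character code.
    widths: HashMap<u32, f64>,
    /// Fallback width; stays `None` if nothing ever sets it.
    default_width: Option<f64>,
}

impl FontWidths {
    /// Mirrors the reported problem: panics when the code has no explicit
    /// width and `default_width` is still `None`.
    fn width_unchecked(&self, code: u32) -> f64 {
        match self.widths.get(&code) {
            Some(w) => *w,
            None => self.default_width.unwrap(),
        }
    }

    /// Non-panicking variant: returns `None` when no width is known,
    /// leaving the decision about a fallback to the caller.
    fn width(&self, code: u32) -> Option<f64> {
        self.widths.get(&code).copied().or(self.default_width)
    }
}

fn main() {
    let mut widths = HashMap::new();
    widths.insert(65, 0.6);

    // No default set: the checked lookup reports the gap instead of panicking.
    let font = FontWidths { widths, default_width: None };
    assert_eq!(font.width(65), Some(0.6));
    assert_eq!(font.width(66), None);
    // font.width_unchecked(66) would panic here, as in the reported bug.

    // With the proposed default of Some(1.0), the unchecked lookup succeeds,
    // although 1.0 is an assumed value and may not be the right width.
    let patched = FontWidths { widths: HashMap::new(), default_width: Some(1.0) };
    assert_eq!(patched.width_unchecked(66), 1.0);
}
```

Returning an Option keeps the choice of fallback explicit at the call site, which makes it easier to see where an assumed width such as 1.0 enters the calculation.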

jrmuizel commented 7 months ago

Do you have some reference for the choice of 1.0 as the default? Can you link to an example PDF that shows this problem?

prscoelho commented 7 months ago

I have no reference for the value of 1.0 as the default; I chose it only because the alternative was unwrapping a None value. It's entirely possible that the problem should be solved somewhere else in the code.

The pdf that I have is private, and opening it also relies on https://github.com/jrmuizel/pdf-extract/pull/82 (decrypting with an empty password).

But I will see if I can edit this to remove the encryption and remove some private information. Is there some place I can send you this pdf privately?

jrmuizel commented 7 months ago

You can send it to jrmuizel@gmail.com

prscoelho commented 7 months ago

I sent you the pdf. On closer inspection, I think this character is just broken :( Some(1.0) avoids a panic, but it's likely not the correct value.

jrmuizel commented 7 months ago

I pushed 73313393a194faebc4a9bf025dd4e9063db9ea04 which should fix this.

prscoelho commented 7 months ago

It does indeed. Thank you!