jrmuizel / pdf-extract

A Rust library for extracting content from PDFs

fixed crashing debug output when font has no name #29

Closed Grollicus closed 2 years ago

Grollicus commented 2 years ago

I had a PDF document that made pdf-extract panic like this:


thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: DictKey', $home/.cargo/registry/src/github.com-1ecc6299db9ec823/pdf-extract-0.6.3/src/lib.rs:470:62
stack backtrace:
   0: rust_begin_unwind
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:101:14
   2: core::result::unwrap_failed
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/result.rs:1617:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/result.rs:1299:23
   4: pdf_extract::PdfSimpleFont::new
             at $home/.cargo/registry/src/github.com-1ecc6299db9ec823/pdf-extract-0.6.3/src/lib.rs:470:40
   5: pdf_extract::make_font
             at $home/.cargo/registry/src/github.com-1ecc6299db9ec823/pdf-extract-0.6.3/src/lib.rs:323:17
   6: pdf_extract::Processor::process_stream::{{closure}}
             at $home/.cargo/registry/src/github.com-1ecc6299db9ec823/pdf-extract-0.6.3/src/lib.rs:1478:84
   7: std::collections::hash::map::Entry<K,V>::or_insert_with
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/collections/hash/map.rs:2345:43
   8: pdf_extract::Processor::process_stream
             at $home/.cargo/registry/src/github.com-1ecc6299db9ec823/pdf-extract-0.6.3/src/lib.rs:1478:32
   9: pdf_extract::output_doc
             at $home/.cargo/registry/src/github.com-1ecc6299db9ec823/pdf-extract-0.6.3/src/lib.rs:2029:9

The panic comes from this line:

let name = pdf_to_utf8(encoding.get(b"Type").unwrap().as_name().unwrap());

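More specifically, it comes from encoding.get(b"Type").unwrap(): the encoding dictionary in this document has no Type entry, so the lookup returns the DictKey error and unwrap() panics. As name is only used for debug output, I think it can be safely removed?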
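If the debug output is worth keeping, an alternative to removing name is to treat the missing key as optional instead of unwrapping it. The following is only a minimal sketch of that idea and not the code from this PR: Dict is a hypothetical stand-in (a plain HashMap) for the library's dictionary type, and debug_name and describe_encoding are made-up helper names used for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a PDF dictionary: raw key bytes -> raw name bytes.
/// The real library uses its own dictionary type; this is only for illustration.
type Dict = HashMap<Vec<u8>, Vec<u8>>;

/// Looks up `key` and decodes the value lossily as UTF-8.
/// Returns `None` instead of panicking when the key is absent.
fn debug_name(dict: &Dict, key: &[u8]) -> Option<String> {
    dict.get(key)
        .map(|raw| String::from_utf8_lossy(raw).into_owned())
}

/// Builds a debug label that tolerates a dictionary without a `Type` entry.
fn describe_encoding(dict: &Dict) -> String {
    match debug_name(dict, b"Type") {
        Some(name) => format!("encoding type: {}", name),
        None => String::from("encoding type: <missing>"),
    }
}
```

With this shape, a dictionary lacking Type yields the label "encoding type: <missing>" rather than aborting the whole extraction, so the debug output survives for documents that do carry the entry.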