jrmuizel / pdf-extract

A Rust library for extracting content from PDFs

panic while parsing PDF #76

Closed: dilawar closed this issue 6 months ago

dilawar commented 6 months ago

PDF résumé (uploaded with the owner's permission):
inam_ul_haq_cv.pdf

warning: `pdf-extract` (example "extract") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 29.72s
     Running `target\debug\examples\extract.exe 'C:/Users/dilaw/Downloads/inam_ul_haq_cv.pdf'`
C:/Users/dilaw/Downloads/inam_ul_haq_cv.pdf
Unicode mismatch true fi "fi" Ok("fi") [64257]
Unicode mismatch true fl "fl" Ok("fl") [64258]
Unicode mismatch true ffi "ffi" Ok("ffi") [64259]
thread 'main' panicked at src\lib.rs:469:69:
no entry found for key
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\panicking.rs:597
   1: core::panicking::panic_fmt
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\core\src\panicking.rs:72
   2: core::panicking::panic_display
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\core\src\panicking.rs:168
   3: core::panicking::panic_str
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\core\src\panicking.rs:152
   4: core::option::expect_failed
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\core\src\option.rs:1988
   5: std::collections::hash::map::impl$9::index<u32,u32,alloc::string::String,std::collections::hash::map::RandomState>
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\std\src\collections\hash\map.rs:1341
   6: pdf_extract::PdfSimpleFont::new
             at .\src\lib.rs:469
   7: pdf_extract::make_font
             at .\src\lib.rs:329
   8: pdf_extract::impl$31::process_stream::closure$2
             at .\src\lib.rs:1594
   9: enum2$<std::collections::hash::map::Entry<alloc::vec::Vec<u8,alloc::alloc::Global>,alloc::rc::Rc<dyn$<pdf_extract::PdfFont>,alloc::alloc::Global> > >::or_insert_with<alloc::vec::Vec<u8,alloc::alloc::Global>,alloc::rc::Rc<dyn$<pdf_extract::PdfFont>,alloc::
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\std\src\collections\hash\map.rs:2560
  10: pdf_extract::Processor::process_stream
             at .\src\lib.rs:1594
  11: pdf_extract::output_doc
             at .\src\lib.rs:2158
  12: extract::main
             at .\examples\extract.rs:36
  13: core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\core\src\ops\function.rs:250
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: process didn't exit successfully: `target\debug\examples\extract.exe 'C:/Users/dilaw/Downloads/inam_ul_haq_cv.pdf'` (exit code: 101)
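Frame 5 of the backtrace is `HashMap::index` on a `HashMap<u32, String>` (indexed with a `u32`), called from `PdfSimpleFont::new` at src\lib.rs:469. The standard library's `Index` implementation for `HashMap` panics with exactly "no entry found for key" when the key is absent. The standalone sketch below reproduces that message with made-up map contents. It is not pdf-extract's code, and `lookup_by_index`, `lookup_checked`, and `panic_message` are hypothetical helpers. It contrasts `[]` with `get`, which returns an `Option` the caller can handle.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::panic;

/// Looks up `code` with `[]`, which goes through `Index::index` and
/// panics with "no entry found for key" if `code` is absent.
fn lookup_by_index(map: &HashMap<u32, String>, code: u32) -> String {
    map[&code].clone()
}

/// Looks up `code` with `get`, which returns `None` if `code` is absent.
fn lookup_checked(map: &HashMap<u32, String>, code: u32) -> Option<String> {
    map.get(&code).cloned()
}

/// Extracts the text of a panic payload, which may be a `&str` or a `String`.
fn panic_message(payload: &(dyn Any + Send)) -> Option<String> {
    payload
        .downcast_ref::<&str>()
        .map(|s| s.to_string())
        .or_else(|| payload.downcast_ref::<String>().cloned())
}

fn main() {
    // Made-up contents; only the key and value types match the backtrace.
    let mut map: HashMap<u32, String> = HashMap::new();
    map.insert(1, "fi".to_string());

    // The checked lookup lets the caller decide what a missing code means.
    assert_eq!(lookup_checked(&map, 2), None);

    // Silence the default hook so the expected panic is not printed.
    panic::set_hook(Box::new(|_| {}));
    let result = panic::catch_unwind(|| lookup_by_index(&map, 2));
    let _ = panic::take_hook();

    let payload = result.expect_err("indexing a missing key should panic");
    assert_eq!(
        panic_message(&*payload).as_deref(),
        Some("no entry found for key")
    );
    println!("panic message: no entry found for key");
}
```

Using `get` (or `entry`) at a lookup like this would turn an unknown code into a recoverable case rather than aborting the whole extraction.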
jrmuizel commented 6 months ago

It looks like this is caused by the PDF using the glyph name 'envelope', which comes from Font Awesome.
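If that diagnosis is right, the font names a glyph ('envelope') that has no entry in the glyph-name-to-Unicode table pdf-extract consults. The failing lookup is then presumably what panics at src\lib.rs:469. The sketch below shows, under that assumption, one way a decoder could degrade gracefully instead. `resolve_glyph`, `decode_glyphs`, and the tiny table are hypothetical illustrations, not pdf-extract's API. The `uniXXXX` branch is a simplified subset of the Adobe Glyph List naming rules. The `fi` entry maps to U+FB01 (decimal 64257), the same value printed in the "Unicode mismatch" line of the log.

```rust
use std::collections::HashMap;

/// Resolves a glyph name to text using `table`, which stands in for a
/// glyph-name list such as the Adobe Glyph List (here a tiny made-up subset).
fn resolve_glyph(name: &str, table: &HashMap<&str, &str>) -> Option<String> {
    if let Some(text) = table.get(name) {
        return Some(text.to_string());
    }
    // Simplified "uniXXXX" rule: exactly four uppercase hex digits.
    let hex = name.strip_prefix("uni")?;
    let is_upper_hex = |c: char| c.is_ascii_digit() || ('A'..='F').contains(&c);
    if hex.len() != 4 || !hex.chars().all(is_upper_hex) {
        return None;
    }
    let code = u32::from_str_radix(hex, 16).ok()?;
    // `from_u32` rejects surrogates, which are not valid scalar values.
    char::from_u32(code).map(|c| c.to_string())
}

/// Decodes glyph names, substituting U+FFFD REPLACEMENT CHARACTER for
/// names that cannot be resolved instead of panicking.
fn decode_glyphs(names: &[&str], table: &HashMap<&str, &str>) -> String {
    names
        .iter()
        .map(|name| resolve_glyph(name, table).unwrap_or_else(|| "\u{FFFD}".to_string()))
        .collect()
}

fn main() {
    let table: HashMap<&str, &str> = HashMap::from([("A", "A"), ("fi", "\u{FB01}")]);
    // "envelope" is missing from the table, as with the icon glyph in the report.
    let text = decode_glyphs(&["A", "fi", "envelope", "uni0042"], &table);
    assert_eq!(text, "A\u{FB01}\u{FFFD}B");
    println!("{text}");
}
```

Substituting U+FFFD keeps the surrounding text extractable. Skipping the glyph or returning an error from font construction would be stricter alternatives.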

dilawar commented 6 months ago

Yes. I've been told that this resume was prepared using LaTeX.