jrmuizel / pdf-extract

A Rust library for extracting content from PDFs

panicked at 'attempt to add with overflow' #46

Open AndyJado opened 1 year ago

AndyJado commented 1 year ago

attempt to add with overflow

thread 'main' panicked at 'attempt to add with overflow', ~/.cargo/registry/src/github.com-1ecc6299db9ec823/adobe-cmap-parser-0.3.3/src/lib.rs:202:41

backtrace

stack backtrace:
   0: rust_begin_unwind
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:111:5
   3: adobe_cmap_parser::get_unicode_map
             at /Users/wangli/.cargo/registry/src/github.com-1ecc6299db9ec823/adobe-cmap-parser-0.3.3/src/lib.rs:202:41
   4: pdf_extract::get_unicode_map
             at /Users/wangli/Repos/pdf-extract/src/lib.rs:815:24
   5: pdf_extract::PdfCIDFont::new
             at /Users/wangli/Repos/pdf-extract/src/lib.rs:881:27
   6: pdf_extract::make_font
             at /Users/wangli/Repos/pdf-extract/src/lib.rs:319:17
   7: pdf_extract::Processor::process_stream::{{closure}}
             at /Users/wangli/Repos/pdf-extract/src/lib.rs:1483:84
   8: std::collections::hash::map::Entry<K,V>::or_insert_with
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/collections/hash/map.rs:2559:43
   9: pdf_extract::Processor::process_stream
             at /Users/wangli/Repos/pdf-extract/src/lib.rs:1483:32
  10: pdf_extract::output_doc
             at /Users/wangli/Repos/pdf-extract/src/lib.rs:2044:9
  11: extract::main
             at ./extract.rs:39:5
  12: core::ops::function::FnOnce::call_once
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ops/function.rs:507:5
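
For context, "attempt to add with overflow" is the message Rust emits when a plain `+` on integers overflows while overflow checks are enabled (the default for debug builds). The sketch below is only an illustration of that failure class and of the checked alternative. `next_code` is a hypothetical helper, not the code at adobe-cmap-parser `src/lib.rs:202`, which is not shown in this thread.

```rust
// Hypothetical stand-in for a range computation inside a CMap parser.
// This is NOT the actual code at adobe-cmap-parser src/lib.rs:202.
fn next_code(start: u32, offset: u32) -> Option<u32> {
    // checked_add returns None on overflow instead of panicking
    // (overflow checks on) or silently wrapping (overflow checks off).
    start.checked_add(offset)
}

fn main() {
    // A well-formed value stays in bounds.
    assert_eq!(next_code(0x0041, 1), Some(0x0042));
    // A value that would overflow is reported as None rather than panicking.
    assert_eq!(next_code(u32::MAX, 1), None);
    println!("ok");
}
```

Whether a checked add, a saturating add, or rejecting the input is the right fix depends on what the parser should do with a malformed map, which this sketch does not try to answer.
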
jrmuizel commented 1 year ago

Can you share a pdf that causes this?

haoyuan80s commented 2 months ago

I hit a similar issue with this PDF: https://arxiv.org/pdf/2305.03653. Could you take a look? Thank you!

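// Requires the `reqwest` crate with its `blocking` feature enabled.
fn main() {
    // Download the PDF into memory.
    let client = reqwest::blocking::Client::new();
    let bytes = client
        .get("https://arxiv.org/pdf/2305.03653")
        .send()
        .unwrap()
        .bytes()
        .unwrap();
    // The panic is triggered inside this call.
    let out = pdf_extract::extract_text_from_mem(&bytes).unwrap();
    println!("{}", out);
}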
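
Until the overflow is fixed, a caller might keep one bad PDF from taking down the whole process by catching the panic. The following is a minimal sketch under stated assumptions: `extract_stub` is a hypothetical stand-in for `pdf_extract::extract_text_from_mem` so the example runs without the crate, and `extract_or_err` is an illustrative wrapper name, not part of any library. It only works when the binary is built with the default `panic = "unwind"` setting.

```rust
use std::panic;

// Hypothetical stand-in for pdf_extract::extract_text_from_mem.
// It panics on empty input to imitate the overflow panic in this issue.
fn extract_stub(bytes: &[u8]) -> String {
    if bytes.is_empty() {
        panic!("attempt to add with overflow");
    }
    String::from_utf8_lossy(bytes).into_owned()
}

// Runs the extractor and converts a panic into an Err value.
// The default panic hook still prints the panic message to stderr.
fn extract_or_err(bytes: &[u8]) -> Result<String, String> {
    panic::catch_unwind(|| extract_stub(bytes))
        .map_err(|_| "extraction panicked".to_string())
}

fn main() {
    // Normal input passes through unchanged.
    assert_eq!(extract_or_err(b"hello"), Ok("hello".to_string()));
    // Input that triggers the panic is reported as an error instead.
    assert!(extract_or_err(b"").is_err());
}
```

This is a stopgap for callers, not a fix: the underlying arithmetic in the CMap parsing path would still need to handle the offending input.
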