manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

Crash when saving PDF with Japanese text #682

Open williamahartman opened 2 months ago

williamahartman commented 2 months ago

Thanks for working on this application! The OCR process has been flawless, and the output is much easier to validate than with the tesseract CLI. Unfortunately, I'm seeing crashes when I try to save a PDF with Japanese text. The OCR itself works great, and I can save as .html or .odt with no problems. I can also export PDFs of English text using the default export settings without any problems.

If I try to export a PDF with Japanese text using the default export settings, the program exits silently, but this error is printed when I run it from the console:

** (gimagereader-gtk:328452): ERROR **: 12:29:43.119: 
unhandled exception (type std::exception) in signal handler:
what: PdfErrorCode::InvalidFontData, The font data is invalid.
Callstack:
	#0 Error Source: main/PdfEncoding.cpp(82), Information: The provided string can't be converted to CID encoding
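
My reading of that message, which is an assumption on my part and not something I have confirmed in the PDF library, is that the default font has no mapping for some of the Japanese characters. A minimal Python sketch of that kind of coverage check follows; `unmapped_characters` and `ASCII_ONLY` are hypothetical names, and the coverage set is a stand-in for a real font's character map, not data read from any font:

```python
def unmapped_characters(text, covered):
    """Return the distinct non-space characters of `text` whose code points
    are not in `covered`, in the order they first appear."""
    seen = set()
    missing = []
    for ch in text:
        if ch.isspace() or ord(ch) in covered or ch in seen:
            continue
        seen.add(ch)
        missing.append(ch)
    return missing


# Hypothetical coverage: a font that only maps printable ASCII.
ASCII_ONLY = set(range(0x20, 0x7F))

if __name__ == "__main__":
    # The three kanji are outside the ASCII-only coverage set.
    print(unmapped_characters("OCR 日本語", ASCII_ONLY))
```

A non-empty result would mean the text cannot be fully encoded with that font, which is roughly the condition the error text describes.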

If I instead set the font in the pdf export settings to a CJK one ("Noto Sans Mono CJK JP" in my case), I get an error window with some debugging information, which I've included below.

Error Message

```
gImageReader 3.4.2 (rev )

#1 0x00007fad34b356be in gtk_css_selector_tree_match_foreach.lto_priv ()
#2 0x00007fad34b2b906 in gtk_css_style_provider_lookup ()
#3 0x00007fad34ca03b0 in gtk_style_cascade_lookup () at /lib64/libgtk-3.so.0
#4 0x00007fad34b4995d in gtk_css_static_style_new_compute ()
#5 0x00007fad34b287e6 in gtk_css_node_real_update_style ()
#6 0x00007fad34b2538b in gtk_css_node_ensure_style.part ()
#7 0x00007fad34b25638 in gtk_css_node_validate_internal.part ()
#8 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#9 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#10 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#11 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#12 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#13 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#14 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#15 0x00007fad34b2568b in gtk_css_node_validate_internal.part ()
#16 0x00007fad34b0813b in gtk_container_idle_sizer () at /lib64/libgtk-3.so.0
#17 0x00007fad3521e254 in signal_emit_valist_unlocked ()
#18 0x00007fad3521e361 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#19 0x00007fad3521e423 in g_signal_emit () at /lib64/libgobject-2.0.so.0
#20 0x00007fad35b42154 in gdk_frame_clock_paint_idle () at /lib64/libgdk-3.so.0
#21 0x00007fad35b2e00d in gdk_threads_dispatch () at /lib64/libgdk-3.so.0
#22 0x00007fad349150d9 in g_timeout_dispatch () at /lib64/libglib-2.0.so.0
#23 0x00007fad3490ee8c in g_main_context_dispatch_unlocked.lto_priv ()
#24 0x00007fad34970c98 in g_main_context_iterate_unlocked.isra ()
#25 0x00007fad34910383 in g_main_context_iteration ()
#26 0x00007fad34befc5d in gtk_main_iteration_do () at /lib64/libgtk-3.so.0
#27 0x00007fad358a9111 in Gtk::Main::iteration(bool) ()
#28 0x00005651e384ec04 in Utils::busyTask(std::function const&, Glib::ustring const&) ()
#29 0x00005651e386e1f8 in HOCRPdfExporter::run(HOCRDocument const*, std::__cxx11::basic_string, std::allocator > const&, HOCRExporter::ExporterSettings const*) ()
#30 0x00005651e38b1b30 in OutputEditorHOCR::exportToPDF() [clone .isra.0] ()
#31 0x00007fad35ada8e4 in Glib::SignalProxyNormal::slot0_void_callback(_GObject*, void*) () at /lib64/libglibmm-2.4.so.1
#32 0x00007fad351fd64a in g_closure_invoke () at /lib64/libgobject-2.0.so.0
#33 0x00007fad3522d94d in signal_emit_unlocked_R.isra.0 ()
#34 0x00007fad3521e104 in signal_emit_valist_unlocked ()
#35 0x00007fad3521e361 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#36 0x00007fad3521e423 in g_signal_emit () at /lib64/libgobject-2.0.so.0
#37 0x00007fad34d4ae6c in gtk_widget_activate () at /lib64/libgtk-3.so.0
#38 0x00007fad34c0bb56 in gtk_menu_shell_activate_item ()
#39 0x00007fad34c0bf94 in gtk_menu_shell_button_release ()
#40 0x00007fad3592bf09 in Gtk::Widget::on_button_release_event(_GdkEventButton*) () at /lib64/libgtkmm-3.0.so.1
#41 0x00007fad35924d81 in Gtk::Widget_Class::button_release_event_callback(_GtkWidget*, _GdkEventButton*) () at /lib64/libgtkmm-3.0.so.1
#42 0x00007fad34a888d9 in _gtk_marshal_BOOLEAN__BOXEDv ()
#43 0x00007fad3521e254 in signal_emit_valist_unlocked ()
#44 0x00007fad3521e361 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#45 0x00007fad3521e423 in g_signal_emit () at /lib64/libgobject-2.0.so.0
#46 0x00007fad34d5fe3c in gtk_widget_event_internal.part.0.lto_priv ()
#47 0x00007fad34bf4328 in propagate_event.lto_priv () at /lib64/libgtk-3.so.0
#48 0x00007fad34bf50aa in gtk_main_do_event () at /lib64/libgtk-3.so.0
#49 0x00007fad35b34807 in _gdk_event_emit () at /lib64/libgdk-3.so.0
#50 0x00007fad35b6e3ce in gdk_event_source_dispatch () at /lib64/libgdk-3.so.0
#51 0x00007fad3490ee8c in g_main_context_dispatch_unlocked.lto_priv ()
#52 0x00007fad34970c98 in g_main_context_iterate_unlocked.isra ()
#53 0x00007fad34910383 in g_main_context_iteration ()
#54 0x00007fad3532d0fd in g_application_run () at /lib64/libgio-2.0.so.0
#55 0x00005651e37d76fa in main ()

Thread 12 (Thread 0x7fad0d2006c0 (LWP 332579) "gimagereader-gt"):
#0 0x00007fad33d12e13 in wait4 () at /lib64/libc.so.6
#1 0x00005651e381aadc in MainWindow::signalHandlerExec(int, bool) ()
#2 0x00007fad33c4fd00 in () at /lib64/libc.so.6
#3 0x00005651e386c1c8 in HOCRPdfExporter::run(HOCRDocument const*, std::__cxx11::basic_string, std::allocator > const&, HOCRExporter::ExporterSettings const*)::{lambda()#1}::operator()() const [clone .lto_priv.0] ()
#4 0x00005651e384a073 in sigc::internal::slot_call0 const&, Glib::ustring const&)::{lambda()#1}, void>::call_it(sigc::internal::slot_rep*) ()
#5 0x00007fad35ad0e42 in call_thread_entry_slot.lto_priv () at /lib64/libglibmm-2.4.so.1
#6 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#7 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#8 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 11 (Thread 0x7facf54006c0 (LWP 332573) "pool-gimageread"):
#0 0x00007fad33d283dd in syscall () at /lib64/libc.so.6
#1 0x00007fad3496deb0 in g_cond_wait_until () at /lib64/libglib-2.0.so.0
#2 0x00007fad348d95e3 in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3 0x00007fad3494159a in g_thread_pool_thread_proxy.lto_priv () at /lib64/libglib-2.0.so.0
#4 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#6 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 10 (Thread 0x7fad0f0006c0 (LWP 332505) "gimagereader-gt"):
#0 0x00007fad33d1c87d in poll () at /lib64/libc.so.6
#1 0x00007fad0fe1fa9a in eloop_poll_func.lto_priv () at /usr/lib64/sane/libsane-airscan.so.1
#2 0x00007fad18052bc3 in avahi_simple_poll_run () at /lib64/libavahi-common.so.3
#3 0x00007fad18052db8 in avahi_simple_poll_iterate () at /lib64/libavahi-common.so.3
#4 0x00007fad0fe1fb8f in eloop_thread_func.lto_priv () at /usr/lib64/sane/libsane-airscan.so.1
#5 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#6 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 9 (Thread 0x7fad0fe006c0 (LWP 332501) "gimagereader-gt"):
#0 0x00007fad33d283dd in syscall () at /lib64/libc.so.6
#1 0x00007fad3496dccd in g_cond_wait () at /lib64/libglib-2.0.so.0
#2 0x00007fad348d961b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3 0x00007fad348d967c in g_async_queue_pop () at /lib64/libglib-2.0.so.0
#4 0x00007fad333980d9 in fc_thread_func () at /lib64/libpangoft2-1.0.so.0
#5 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#6 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#7 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 8 (Thread 0x7fad1a6006c0 (LWP 332500) "libusb_event"):
#0 0x00007fad33d1c87d in poll () at /lib64/libc.so.6
#1 0x00007fad32892f93 in linux_udev_event_thread_main () at /lib64/libusb-1.0.so.0
#2 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#3 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 7 (Thread 0x7fad1b0006c0 (LWP 332499) "gimagereader-gt"):
#0 0x00007fad33ca2da9 in __futex_abstimed_wait_common () at /lib64/libc.so.6
#1 0x00007fad33ca57f9 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libc.so.6
#2 0x00007fad33edd700 in std::condition_variable::wait(std::unique_lock&) () at /lib64/libstdc++.so.6
#3 0x00005651e389e6ff in ScannerSane::run() ()
#4 0x00007fad33ee7564 in execute_native_thread_routine () at /lib64/libstdc++.so.6
#5 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#6 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 6 (Thread 0x7fad1be006c0 (LWP 332490) "gimagereader-gt"):
#0 0x00007fad33d283dd in syscall () at /lib64/libc.so.6
#1 0x00007fad3496dccd in g_cond_wait () at /lib64/libglib-2.0.so.0
#2 0x00007fad348d961b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3 0x00007fad348d967c in g_async_queue_pop () at /lib64/libglib-2.0.so.0
#4 0x00007fad333980d9 in fc_thread_func () at /lib64/libpangoft2-1.0.so.0
#5 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#6 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#7 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 5 (Thread 0x7fad20c006c0 (LWP 332483) "dconf worker"):
#0 0x00007fad33d1c87d in poll () at /lib64/libc.so.6
#1 0x00007fad34970c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0
#2 0x00007fad34910383 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3 0x00007fad36064705 in dconf_gdbus_worker_thread () at /usr/lib64/gio/modules/libdconfsettings.so
#4 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#6 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 4 (Thread 0x7fad216006c0 (LWP 332482) "gdbus"):
#0 0x00007fad33d1c87d in poll () at /lib64/libc.so.6
#1 0x00007fad34970c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0
#2 0x00007fad34914f37 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3 0x00007fad35360682 in gdbus_shared_thread_func.lto_priv () at /lib64/libgio-2.0.so.0
#4 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#6 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 3 (Thread 0x7fad22a006c0 (LWP 332480) "gmain"):
#0 0x00007fad33d1c87d in poll () at /lib64/libc.so.6
#1 0x00007fad34970c34 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0
#2 0x00007fad34910383 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3 0x00007fad349103e1 in glib_worker_main () at /lib64/libglib-2.0.so.0
#4 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#6 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 2 (Thread 0x7fad234006c0 (LWP 332479) "pool-spawner"):
#0 0x00007fad33d283dd in syscall () at /lib64/libc.so.6
#1 0x00007fad3496dccd in g_cond_wait () at /lib64/libglib-2.0.so.0
#2 0x00007fad348d961b in g_async_queue_pop_intern_unlocked () at /lib64/libglib-2.0.so.0
#3 0x00007fad34940a03 in g_thread_pool_spawn_thread () at /lib64/libglib-2.0.so.0
#4 0x00007fad3493f813 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x00007fad33ca66d7 in start_thread () at /lib64/libc.so.6
#6 0x00007fad33d2a60c in clone3 () at /lib64/libc.so.6

Thread 1 (Thread 0x7fad31313bc0 (LWP 332475) "gimagereader-gt"):
#0 0x00007fad34b20430 in gtk_css_matcher_node_has_id () at /lib64/libgtk-3.so.0
#1 0x00007fad34b356be in gtk_css_selector_tree_match_foreach.lto_priv () at /lib64/libgtk-3.so.0
#2 0x00007fad34b2b906 in gtk_css_style_provider_lookup () at /lib64/libgtk-3.so.0
#3 0x00007fad34ca03b0 in gtk_style_cascade_lookup () at /lib64/libgtk-3.so.0
#4 0x00007fad34b4995d in gtk_css_static_style_new_compute () at /lib64/libgtk-3.so.0
#5 0x00007fad34b287e6 in gtk_css_node_real_update_style () at /lib64/libgtk-3.so.0
#6 0x00007fad34b2538b in gtk_css_node_ensure_style.part () at /lib64/libgtk-3.so.0
#7 0x00007fad34b25638 in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#8 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#9 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#10 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#11 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#12 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#13 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#14 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#15 0x00007fad34b2568b in gtk_css_node_validate_internal.part () at /lib64/libgtk-3.so.0
#16 0x00007fad34b0813b in gtk_container_idle_sizer () at /lib64/libgtk-3.so.0
#17 0x00007fad3521e254 in signal_emit_valist_unlocked () at /lib64/libgobject-2.0.so.0
#18 0x00007fad3521e361 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#19 0x00007fad3521e423 in g_signal_emit () at /lib64/libgobject-2.0.so.0
#20 0x00007fad35b42154 in gdk_frame_clock_paint_idle () at /lib64/libgdk-3.so.0
#21 0x00007fad35b2e00d in gdk_threads_dispatch () at /lib64/libgdk-3.so.0
#22 0x00007fad349150d9 in g_timeout_dispatch () at /lib64/libglib-2.0.so.0
#23 0x00007fad3490ee8c in g_main_context_dispatch_unlocked.lto_priv () at /lib64/libglib-2.0.so.0
#24 0x00007fad34970c98 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0
#25 0x00007fad34910383 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#26 0x00007fad34befc5d in gtk_main_iteration_do () at /lib64/libgtk-3.so.0
#27 0x00007fad358a9111 in Gtk::Main::iteration(bool) () at /lib64/libgtkmm-3.0.so.1
#28 0x00005651e384ec04 in Utils::busyTask(std::function const&, Glib::ustring const&) ()
#29 0x00005651e386e1f8 in HOCRPdfExporter::run(HOCRDocument const*, std::__cxx11::basic_string, std::allocator > const&, HOCRExporter::ExporterSettings const*) ()
#30 0x00005651e38b1b30 in OutputEditorHOCR::exportToPDF() [clone .isra.0] ()
#31 0x00007fad35ada8e4 in Glib::SignalProxyNormal::slot0_void_callback(_GObject*, void*) () at /lib64/libglibmm-2.4.so.1
#32 0x00007fad351fd64a in g_closure_invoke () at /lib64/libgobject-2.0.so.0
#33 0x00007fad3522d94d in signal_emit_unlocked_R.isra.0 () at /lib64/libgobject-2.0.so.0
#34 0x00007fad3521e104 in signal_emit_valist_unlocked () at /lib64/libgobject-2.0.so.0
#35 0x00007fad3521e361 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#36 0x00007fad3521e423 in g_signal_emit () at /lib64/libgobject-2.0.so.0
#37 0x00007fad34d4ae6c in gtk_widget_activate () at /lib64/libgtk-3.so.0
#38 0x00007fad34c0bb56 in gtk_menu_shell_activate_item () at /lib64/libgtk-3.so.0
#39 0x00007fad34c0bf94 in gtk_menu_shell_button_release () at /lib64/libgtk-3.so.0
#40 0x00007fad3592bf09 in Gtk::Widget::on_button_release_event(_GdkEventButton*) () at /lib64/libgtkmm-3.0.so.1
#41 0x00007fad35924d81 in Gtk::Widget_Class::button_release_event_callback(_GtkWidget*, _GdkEventButton*) () at /lib64/libgtkmm-3.0.so.1
#42 0x00007fad34a888d9 in _gtk_marshal_BOOLEAN__BOXEDv () at /lib64/libgtk-3.so.0
#43 0x00007fad3521e254 in signal_emit_valist_unlocked () at /lib64/libgobject-2.0.so.0
#44 0x00007fad3521e361 in g_signal_emit_valist () at /lib64/libgobject-2.0.so.0
#45 0x00007fad3521e423 in g_signal_emit () at /lib64/libgobject-2.0.so.0
#46 0x00007fad34d5fe3c in gtk_widget_event_internal.part.0.lto_priv () at /lib64/libgtk-3.so.0
#47 0x00007fad34bf4328 in propagate_event.lto_priv () at /lib64/libgtk-3.so.0
#48 0x00007fad34bf50aa in gtk_main_do_event () at /lib64/libgtk-3.so.0
#49 0x00007fad35b34807 in _gdk_event_emit () at /lib64/libgdk-3.so.0
#50 0x00007fad35b6e3ce in gdk_event_source_dispatch () at /lib64/libgdk-3.so.0
#51 0x00007fad3490ee8c in g_main_context_dispatch_unlocked.lto_priv () at /lib64/libglib-2.0.so.0
#52 0x00007fad34970c98 in g_main_context_iterate_unlocked.isra () at /lib64/libglib-2.0.so.0
#53 0x00007fad34910383 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#54 0x00007fad3532d0fd in g_application_run () at /lib64/libgio-2.0.so.0
#55 0x00005651e37d76fa in main ()
```

I'm using version 3.4.2 on Fedora 40, installed with the package manager (not Flatpak).