manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.
GNU General Public License v3.0

hOCR Editor improvements #438

Open manisandro opened 4 years ago

manisandro commented 4 years ago
SantosSi commented 4 years ago

Just curious: what's the advantage of having the other issues closed and collecting them in one?

manisandro commented 4 years ago

Discussions in the other issues at times diverged from the original title. This issue just summarizes the open items at a glance, so nobody has to re-read all the issues to pick out the various things.

zohozer commented 4 years ago

"Better side-by-side proofreading"

I am very interested in what this means. I need a tool for quick corrections, and a way to see both the original picture and the OCRed text side by side, with a marker in the picture showing where the cursor is in the editable text. Otherwise it is very difficult to correct the generated text.

manisandro commented 4 years ago

@zohozer The idea is to have an extra area below the canvas where the current line, the one before and the one after are displayed. You can edit lines word-wise (and skip to the next/prev word with Tab/Shift+Tab), and the corresponding word is highlighted in the canvas.
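
A minimal sketch of the word-wise navigation described above, written in Python for illustration and not taken from gImageReader's code. The names `Word` and `ProofreadCursor` are hypothetical, and letting Tab cross from the last word of a line to the first word of the next line is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    bbox: tuple  # (x0, y0, x1, y1) in image pixels, like an hOCR "bbox"


class ProofreadCursor:
    """Tracks the active (line, word) position in a page of OCR lines."""

    def __init__(self, lines):
        self.lines = lines  # list of lines, each a non-empty list of Word
        self.line = 0
        self.word = 0

    def current(self):
        """The word that would be highlighted in the canvas."""
        return self.lines[self.line][self.word]

    def context(self):
        """Return (previous, current, next) lines; None where there is no neighbour."""
        prev_line = self.lines[self.line - 1] if self.line > 0 else None
        has_next = self.line + 1 < len(self.lines)
        next_line = self.lines[self.line + 1] if has_next else None
        return prev_line, self.lines[self.line], next_line

    def tab(self, backwards=False):
        """Tab moves to the next word, Shift+Tab (backwards=True) to the previous one."""
        if backwards:
            if self.word > 0:
                self.word -= 1
            elif self.line > 0:
                # Assumption: wrap to the last word of the previous line.
                self.line -= 1
                self.word = len(self.lines[self.line]) - 1
        else:
            if self.word + 1 < len(self.lines[self.line]):
                self.word += 1
            elif self.line + 1 < len(self.lines):
                # Assumption: wrap to the first word of the next line.
                self.line += 1
                self.word = 0
        return self.current()
```

The bounding box of `current()` is what a front-end would use to highlight the matching word in the image, while `context()` supplies the three lines shown in the editing area.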

zohozer commented 4 years ago

I think that the fastest way to make a correction is to use the left-right arrow keys to navigate to every single letter and to stop and make corrections where needed. And yes, it is very good to have the "corresponding word is highlighted in the canvas" behaviour.

If you can have a look at ABBYY FineReader for Windows, you will find the most intuitive OCR workflow I have ever seen. I want to switch to Linux and I can't find a replacement for FineReader. For this reason I am following this project, hoping it will get a little bit more polished and usable. Thank you for your hard work.

manisandro commented 4 years ago

For the first iteration, I'll keep the workflow word-wise, following the hOCR document structure. Having it character-wise would mean automatically adjusting word boxes or creating/dropping word items depending on mutations to word boundaries as you type, which requires some more thought on how to robustly achieve.
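
To illustrate why character-wise editing touches the word boxes, here is a rough Python sketch, not gImageReader code. The helpers `split_word` and `merge_words` are hypothetical, and dividing the box in proportion to the character count is a simplifying assumption that ignores real glyph widths.

```python
def split_word(text, bbox, index):
    """Split one word item in two, as typing a space at `index` would require.

    Assumption: every character is equally wide, so the box is divided
    in proportion to the number of characters on each side.
    """
    if not 0 < index < len(text):
        raise ValueError("split index must fall strictly inside the word")
    x0, y0, x1, y1 = bbox
    split_x = x0 + round((x1 - x0) * index / len(text))
    left = (text[:index], (x0, y0, split_x, y1))
    right = (text[index:], (split_x, y0, x1, y1))
    return left, right


def merge_words(left, right):
    """Join two adjacent word items, as deleting the space between them would require."""
    (left_text, lb), (right_text, rb) = left, right
    # The merged box is the smallest box enclosing both originals.
    bbox = (min(lb[0], rb[0]), min(lb[1], rb[1]), max(lb[2], rb[2]), max(lb[3], rb[3]))
    return left_text + right_text, bbox
```

Even in this simplified form, every keystroke that adds or removes a space changes the number of word items in the line, which is the bookkeeping the word-wise workflow avoids.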

raindropsfromsky commented 4 years ago

Instead of a panel, the preview should be displayed in the same space that shows the hOCR tree currently. That will allow 50-50 splitting of the screen: The left half shows the original, and the right half shows the OCRed output. (The right-side panel can have two tabs: hOCR tree and Preview).

Whatever operation we do in one half should reflect in the other. For example, both panels should have the same zoom and same panning. If we click on one word in any panel, the corresponding word should be highlighted in the other. If we select a block (region) in one half, it should also be selected in the other.

Sincere apologies if this is already discussed.

manisandro commented 4 years ago

I've landed a new widget for convenient proof reading (currently Qt interface only), just click on a line or a word in the output tree to activate it.

image

zohozer commented 4 years ago

Oh, yeah! Thank you. This is a much better way to correct the OCR-ed text result. I need a build to test how it is working.

zohozer commented 4 years ago

I just downloaded the latest build and I like this new widget a lot. However, may I suggest some further improvements to the current workflow:

  1. An ability to move the cursor continuously between fields using only the Left/Right cursor keys, instead of using the Tab key, similar to how the Up/Down arrow keys move the cursor to the line above/below. So when the cursor reaches the end of a word and I keep the Right cursor key pressed, it should jump automatically to the beginning of the next word, without my having to press Tab.

  2. The possibility to select multiple words at once using the mouse. If I press the left mouse button and try to select multiple words, it does not work right now. It would be useful to select multiple words at once and apply formatting options to all of them in one go, like applying Italic or Bold to multiple words at the same time.

  3. Up/Down arrow keys should go to the beginning of the new word, instead of selecting the entire word as they do right now.

  4. An INSERT shortcut key. I do not have a keyboard with an INSERT key, and I need an Insert shortcut. In SublimeText I can use CMD+Alt+o (I am using macOS) to switch the cursor between the normal mode and the text Insert mode.
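
The first suggestion in the list above (Left/Right keys crossing word boundaries) could look roughly like this in Python. This is only a sketch of the requested behaviour, not how the widget is implemented; `move_cursor` is a hypothetical helper and the words are plain strings standing in for the per-word edit fields.

```python
def move_cursor(words, word_idx, char_idx, step):
    """Move the text cursor one position left (step=-1) or right (step=+1).

    Cursor positions inside a word run from 0 to len(word) inclusive.
    Moving past either end of a word jumps to the neighbouring word,
    so no Tab press is needed; at the ends of the line the cursor stays put.
    """
    char_idx += step
    if char_idx > len(words[word_idx]):
        if word_idx + 1 < len(words):
            return word_idx + 1, 0  # start of the next word
        return word_idx, len(words[word_idx])  # already at end of line
    if char_idx < 0:
        if word_idx > 0:
            return word_idx - 1, len(words[word_idx - 1])  # end of previous word
        return word_idx, 0  # already at start of line
    return word_idx, char_idx
```

For example, with `words = ["quick", "fox"]`, pressing Right at the end of "quick" (`move_cursor(words, 0, 5, +1)`) yields `(1, 0)`, the start of "fox".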

raindropsfromsky commented 4 years ago

This is a nice feature!

I took a screenshot of the email from @zohozer and tried the latest feature on it. image

Bugs:

  1. As the screenshot shows, gImageReader shows the detected text in a very tiny font. I tried CTRL+mouse wheel, but that zooms the entire display, including both the original image and the detected text. There should be some intuitive control to zoom the detected text independently, for example SHIFT+CTRL+mouse wheel. Alternatively, the detected text should have the same font size as the original text.
  2. If I click on any word in the original text, I expect the word to be highlighted. gImageReader does that, but it also moves the entire paragraph by a large distance unpredictably. Sometimes the highlighted word lands outside the frame! (Note: at this time, my zoom was set very high so that I could read the detected text, and only part of the original image was visible at one time. If your zoom is set to show the entire image, you will not see this defect.)

raindropsfromsky commented 4 years ago

I am using the latest version.

Some of the text is not detected properly.

image

The blue highlighted area is not detected. If I click within it, there is no response.

If I click outside it, gImageReader shows me the detected text as labels. image

manisandro commented 4 years ago

@zohozer

  1. An ability to continuously move the cursor between different fields using only the Left/Right cursor keys instead of using the TAB key to move between fields. [...]

This could be done, I'll need to check with the client.

  2. Possibility to select multiple words at once using the mouse cursor. [...]

This is a bit harder.

  3. Up/Down arrow keys to go to the beginning of the new word, instead of selecting the entire word how it is right now.

This could be done, I'll need to check with the client.

  4. An INSERT shortcut key. I do not have a keyboard with an INSERT key, and I need an Insert shortcut.

Can't you define such a key mapping at OS level? Would be a better place to do it rather than implement a shortcut at application level.

@raindropsfromsky

  1. Can you share the image + hOCR file with which you can reproduce the small font issue?
  2. gImageReader will realign the image in such way that the line containing the selected word is approximately in the middle of the screen, to ensure the proof reading widget is visible. Open to suggestions for a better logic.
  3. What does the hOCR tree structure show? In case, here also please share image + hOCR file.
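
The realignment described in the second point of the list above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the actual gImageReader logic: `centered_scroll_y` is a hypothetical helper, only the vertical axis is handled, and clamping to the scrollable range is an assumption.

```python
def centered_scroll_y(line_bbox, zoom, viewport_height, image_height):
    """Vertical scroll offset (in screen pixels) that puts a text line mid-viewport.

    line_bbox is (x0, y0, x1, y1) in image pixels; zoom scales image
    pixels to screen pixels.
    """
    _, y0, _, y1 = line_bbox
    line_center = (y0 + y1) / 2 * zoom
    target = line_center - viewport_height / 2
    # Assumption: the view cannot scroll past the top or bottom of the image.
    max_scroll = max(0.0, image_height * zoom - viewport_height)
    return min(max(target, 0.0), max_scroll)
```

With this kind of rule, clicking a word near the top or bottom edge of the visible area moves the view by up to about half the viewport height, which would feel like a large jump at high zoom.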

zohozer commented 4 years ago

Can't you define such a key mapping at OS level? Would be a better place to do it rather than implement a shortcut at application level.

Unfortunately not. The only software I found on Mac that supports insert (overtype) mode is SublimeText, using the CMD+Alt+o shortcut. As my keyboards are missing the INSERT key, there is no workaround for this limitation. I spent a lot of time searching for other options and none of them work.

raindropsfromsky commented 4 years ago

I am attaching the image that can be used to reproduce the problem. But I think this problem has nothing to do with the image itself...

All the lines from the top to the middle produce tiny detected text.

Here, I selected a word from the second line, which produced tiny text: image

But the very next line produces a normal font of detected text. image

Here's the hOCR file and image to be OCRed: Trial.zip

manisandro commented 4 years ago

It's because the font size detection logic fails miserably with the I, which is not terribly surprising... I'll need to improve that.

manisandro commented 4 years ago

@raindropsfromsky https://github.com/manisandro/gImageReader/commit/dce0871d0de6202246e323b6e95199c27db7849e should address this

raindropsfromsky commented 4 years ago

If the comparacope feature is implemented, all this hard work is not necessary, as the superimposed text can be directly compared with the original. Much faster and more accurate!

zohozer commented 4 years ago

comparacope

What's that?

manisandro commented 4 years ago

If the comparacope feature is implemented, all this hard work is not necessary, as the superimposed text can be directly compared with the original. Much faster and more accurate!

That may be the workflow you feel is best, other people have other requirements.

raindropsfromsky commented 4 years ago

comparacope

What's that?

Issue #449

raindropsfromsky commented 4 years ago

If the comparacope feature is implemented, all this hard work is not necessary, as the superimposed text can be directly compared with the original. Much faster and more accurate!

That may be the workflow you feel is best, other people have other requirements.

Very true.

BTW there is the problem I described here also...

manisandro commented 4 years ago

BTW there is the problem I described here also...

See above:

What does the hOCR tree structure show? In case, here also please share image + hOCR file.

raindropsfromsky commented 4 years ago

@manisandro Nowadays multiple releases are coming up almost on a daily basis. That's great!

I'd like to support this by testing each release rapidly and reporting issues (if any). But there is a problem: I cannot guess what's the change in each release (comparing code is tedious).

Can you please include a release note with each release (a simple text file in the "Assets")?

Thanks in advance!

manisandro commented 4 years ago

@raindropsfromsky Those are automated builds, not stable releases. You should be able to pick up most of the news by looking at the git commit log; the messages are usually indicative enough.