jimmejardine / qiqqa-open-source

The open-sourced version of the award-winning Qiqqa research management tool for Windows
GNU General Public License v3.0

BUG: When the OCR of a file is not complete, Qiqqa does not give control to the user #166

Open raindropsfromsky opened 4 years ago

raindropsfromsky commented 4 years ago

I have a pdf file that has machine-readable (searchable) text (as opposed to scanned images).

I just noticed a subtle sign in the top-left corner of one page. When I hovered my mouse over it, I learned that: [image]

Well, I would NOT like to wait: I would like to know where else this problem exists, and how to solve it as quickly as possible. (The file has large fonts with high contrast, so even if Qiqqa wants to OCR a searchable text, I don't mind.)

But Qiqqa does not give me a chance. I expect:

  1. The status line should show a statistic ("x out of y pages need OCR")
  2. A summary figure of how many pages have the same issue, followed by a clickable page-list.
  3. A control to OCR all the pending pages at once
  4. A control on each "defective" page to re-OCR the page.
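
To make items 1 and 2 concrete, here is a minimal sketch of how the per-file figures could be derived from per-page state. It is not Qiqqa's code: `PageTextStatus`, `OcrSummary` and `summarize` are hypothetical names, and the two page states are an assumption about what the application tracks internally.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class PageTextStatus(Enum):
    # Hypothetical states; Qiqqa's real internal states may differ.
    READY = "ready"
    NEEDS_OCR = "needs_ocr"


@dataclass
class OcrSummary:
    total_pages: int
    pending_pages: List[int]  # 1-based page numbers still waiting for OCR

    def status_line(self) -> str:
        # Item 1 of the wish list: "x out of y pages need OCR".
        return f"{len(self.pending_pages)} out of {self.total_pages} pages need OCR"


def summarize(page_status: Dict[int, PageTextStatus]) -> OcrSummary:
    """Build the summary from a mapping of page number -> status."""
    pending = sorted(
        page for page, status in page_status.items()
        if status is PageTextStatus.NEEDS_OCR
    )
    return OcrSummary(total_pages=len(page_status), pending_pages=pending)
```

The sorted `pending_pages` list is what a clickable page list (item 2) or an "OCR all pending pages" control (item 3) would iterate over.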

The file in which I noticed the issue is attached: SC judgement dtd 17-03-2020, on EIA for PRR, Bangalore.pdf

GerHobbelt commented 4 years ago

Related to #35 et al.

GerHobbelt commented 4 years ago

See also: https://github.com/jimmejardine/qiqqa-open-source/issues/165#issuecomment-603402113-permalink

> But Qiqqa does not give me a chance. I expect-
>
>   1. The status line should show a statistics ("x out of y pages need OCR")
>   2. A summary figure of how many pages have the same issue, followed by a clickable page-list.
>   3. A control to OCR all the pending pages at once
>   4. A control on each "defective" page to re-OCR the page.

My current thought on this is:

Qiqqa SHOULD indeed have a way to better inform its users what's going on under the hood. Given the plethora of places where and ways in which things can go wrong, using an already severely abused status line is not going to cut it. Given https://github.com/jimmejardine/qiqqa-open-source/issues/138, plus my own experience and now this, this starts to sound more like an 'IDE', where one 'develops' his/her complex document collection. (Think: a status/log panel where status info is reported. When needed, detailed info can be provided to the user there.)

Though at first glance I like the wish list, it certainly implies a quite significant development effort, as this app would then turn into a PDF/document OCR processing & management UI. I recall my past struggles with those OmniPage products far too vividly to believe I'd be able to produce something like that, in working order, as a one-man show, which Qiqqa currently is.

Thus there's either funding and a team or some hard choices to make. Oh wait. 🤡 There's always some hard choices there as the funders will want... 😉

Anyway, YMMV.

raindropsfromsky commented 4 years ago

Yes, where several parameters are changing, and all have to be reported simultaneously to the user, it would need a console panel, rather than a single status line, which cannot handle multiple values.

I have seen status lines that show multiple parameters and their values. But that's an ugly design.

raindropsfromsky commented 4 years ago

Rather than shelving this change, why not work on a simpler design?

In my suggested scenario, we don't need a console or status line; and that is because the status is not required to be displayed all the time. Just make it on-demand.

Here's the workflow:

When there is a problem with a file, show the image icon in the GUI (not on the pending pages alone, because the user cannot know which pages are done and which are pending).

So when the user hovers (clicks?) on that icon, let Qiqqa show him a pop-up dialog, with the statistics for the whole file (not just for that page):

Pages to be texified: n
Pages to be OCRed: m
Time to completion: hh:mm

Below that, a single button to let the user decide whether he wants to make the thread a foreground process. That means the process will consume a major part of system resources but will finish much faster.

But let the user also decide to interrupt and make the process a background process with low priority.

So the single button can toggle between Turbo mode and Background mode.
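
The workflow above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FileProgress`, `Mode`, the per-page timing fields and the `turbo_speedup` factor are all hypothetical, and the real speed-up from raising the process priority would depend on the machine.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BACKGROUND = "Background"
    TURBO = "Turbo"


@dataclass
class FileProgress:
    pages_to_texify: int
    pages_to_ocr: int
    seconds_per_texify: float  # assumed average cost per page
    seconds_per_ocr: float     # assumed average cost per page
    mode: Mode = Mode.BACKGROUND

    def toggle_mode(self) -> Mode:
        # The single button flips between the two modes.
        self.mode = Mode.TURBO if self.mode is Mode.BACKGROUND else Mode.BACKGROUND
        return self.mode

    def eta_hh_mm(self, turbo_speedup: float = 4.0) -> str:
        # Remaining work in seconds; Turbo is assumed to be a fixed factor faster.
        seconds = (self.pages_to_texify * self.seconds_per_texify
                   + self.pages_to_ocr * self.seconds_per_ocr)
        if self.mode is Mode.TURBO:
            seconds /= turbo_speedup
        minutes = math.ceil(seconds / 60)  # round up to whole minutes
        return f"{minutes // 60:02d}:{minutes % 60:02d}"

    def popup_text(self) -> str:
        # Statistics for the whole file, computed only when the user asks.
        return (f"Pages to be texified: {self.pages_to_texify}\n"
                f"Pages to be OCRed: {self.pages_to_ocr}\n"
                f"Time to completion: {self.eta_hh_mm()}")
```

Because the text is built on demand, nothing has to be kept up to date in a permanent status line; the pop-up simply calls `popup_text()` when the user hovers over or clicks the icon.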

GerHobbelt commented 4 years ago

In terms of effort, the latter might seem quicker, but there are quite a few places where Qiqqa SHOULD have this kind of augmented user feedback. Designing and coding all those little panels is a hassle for me (I don't particularly like working with XAML; Qiqqa is a dire necessity for me, so I picked up C# programming again). A console-like pane design takes much more effort up front but, in my mind at least, will solve a lot of these feedback issues and make it easier for me to add more as they come along.

Anyway, filing this in the brain. Not doing this overnight anyway. 😉

Atif-Anwer commented 4 years ago

Agreed with the comments above; there should be a clear status bar (with a popup) that gives information about the current PDF as well as the processes that are queued to run (right now it's just a linear status bar overlaying new tasks on top of each other). This can be incorporated into the design easily too. I need to add a UI task board somewhere... or maybe use the existing GitHub Projects/Kanban board for the UI feature list only.