Open mergen3107 opened 10 months ago
7. When I finish the highlight...
OK, weird stuff! If I continue from where I left off:
8) Now that the words Advanced Composition Explorer
are selected, I go to the bottom menu and disable the Forced OCR
.
9) The highlight of the words Advanced Composition Explorer
disappears, and is replaced by a good highlight of the whole abstract, as if it had been made with Forced OCR
turned off.
So enabling Forced OCR
only causes the highlight to be displayed wrongly. Weird!
Here is rest of the log with steps 8 and beyond: crash_OCR_part2.log.txt
I played around with the highlight again; I don't remember the order.
At step 7, when only Advanced Composition Explorer
is shown, going to the Bookmarks
reveals that it is indeed the whole abstract that has been stored as the highlight. So, only the visual aspect of it is wrong.
Let me test the dictionary query...
OK, dictionary queries are affected.
So, when Forced OCR
is disabled, when I select Abstract.
I get the results for the word abstract
.
When Forced OCR
is enabled, the dictionary searches for Advancedcomposition
without a whitespace.
Log: (strangely, crash.log did not update after these actions, so I can't provide it).
OCR files are the same as from here: https://github.com/koreader/koreader/issues/7860#issuecomment-862995754 (There are also the same ones I used to write up this Wiki page: https://github.com/koreader/koreader/wiki/Dictionary-support#dictionary-lookups-in-scanned-pages)
OCR itself seems to work great.
From frontend/document/koptinterface.lua
(around L520):
function KoptInterface:getTextBoxes(doc, pageno)
    local text = doc:getPageTextBoxes(pageno)
    logger.dbg("LOGG text =", text)
    if text and #text > 1 and doc.configurable.forced_ocr ~= 1 then
        return text
    -- if we have no text in original page then we will reuse native word boxes
    -- in reflow mode and find text boxes from scratch in non-reflow mode
    else
        if doc.configurable.text_wrap == 1 then
            return self:getNativeTextBoxes(doc, pageno)
        else
            return self:getNativeTextBoxesFromScratch(doc, pageno)
        end
    end
end
LOGG text =
shows:
...
12/28/23-21:46:35 DEBUG LOGG text = {
{
{
word = "SOLAR",
x0 = 128.75999450683594,
x1 = 166.51777648925781,
y0 = 95.568199157714844,
y1 = 110.65730285644531
} --[[table: 0x7f16dc0fb078]],
{
word = "WIND",
x0 = 168.93115234375,
x1 = 199.876953125,
y0 = 95.568199157714844,
y1 = 110.65730285644531
} --[[table: 0x7f16dc0fb598]],
{
word = "ELECTRON",
x0 = 202.65058898925781,
x1 = 263.80853271484375,
y0 = 95.568199157714844,
y1 = 110.65730285644531
} --[[table: 0x7f16dc0fbd98]],
{
word = "PROTON",
x0 = 266.221923828125,
x1 = 312.03457641601563,
y0 = 95.568199157714844,
y1 = 110.65730285644531
} --[[table: 0x7f16dc0fc430]],
...
So the culprit must be readerhighlight.lua
then.
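For reference, the branch selection in the getTextBoxes snippet quoted above can be sketched in isolation. This is only a simplified stand-in: pick_text_box_source is a hypothetical name, and it returns a label instead of calling the real KOReader functions. It shows that with forced_ocr = 1 the embedded text layer is skipped even when it contains words.

```lua
-- Hypothetical stand-in for the branch logic of KoptInterface:getTextBoxes.
-- It returns a label naming the path taken instead of calling real functions.
local function pick_text_box_source(text, configurable)
    if text and #text > 1 and configurable.forced_ocr ~= 1 then
        return "embedded"            -- text layer taken from the document itself
    elseif configurable.text_wrap == 1 then
        return "native_reflow"       -- corresponds to getNativeTextBoxes
    else
        return "native_from_scratch" -- corresponds to getNativeTextBoxesFromScratch
    end
end

-- Two lines of embedded text, shaped like the log above (coordinates omitted).
local page_text = {
    { { word = "SOLAR" }, { word = "WIND" } },
    { { word = "ELECTRON" }, { word = "PROTON" } },
}

print(pick_text_box_source(page_text, { forced_ocr = 0, text_wrap = 0 }))
print(pick_text_box_source(page_text, { forced_ocr = 1, text_wrap = 0 }))
print(pick_text_box_source(page_text, { forced_ocr = 1, text_wrap = 1 }))
```

So the embedded words logged above are only returned when Forced OCR is off; with it on, whatever the native-box functions produce is used instead.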
I am trying to trace the function involved in highlighting text while forced_ocr
= 1.
1) I found with ripgrep
that forced_ocr
is only used in a meaningful way here:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L520-L533
KoptInterface:getTextBoxes(doc, pageno)
is responsible for getting text boxes of the document. Depending on reflow
, it then goes to function getNativeTextBoxes(doc, pageno)
or getNativeTextBoxesFromScratch(doc, pageno)
.
The latter seems to work OK (tested with reflow off for now), as shown in the comment above.
2) When stuff is highlighted, I guess it starts here, in the onHold
function:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/apps/reader/modules/readerhighlight.lua#L1168-L1176
3) However, I cannot find the reference to KoptInterface:getTextBoxes(doc, pageno)
in that function from readerhighlight.lua
.
I can only see it in
function ReaderHighlight:onTranslateCurrentPage()
and
function ReaderHighlight:getExtendedHighlightPage()
.
Am I looking for the wrong function getTextBoxes()
?
Am I looking for the wrong function
getTextBoxes()
?
Yes, this was wrong. What I need is this: https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/apps/reader/modules/readerhighlight.lua#L1208
Now, onto getWordFromPosition
...
I think I found the problem.
Here are the first two items in the output of getTextBoxes
. Each item is duplicated. Is this how it is supposed to be?
12/29/23-01:00:41 DEBUG LOGG-5 KI.lua getTextBoxes = {
{
{
x0 = 312,
x1 = 402,
y0 = 138,
y1 = 148
} --[[table: 0x7feb9acc23c8]],
x0 = 312,
x1 = 402,
y0 = 138,
y1 = 148
} --[[table: 0x7feb9acc2380]],
{
{
x0 = 194,
x1 = 220,
y0 = 164,
y1 = 174
} --[[table: 0x7feb9ab6f008]],
x0 = 194,
x1 = 220,
y0 = 164,
y1 = 174
} --[[table: 0x7feb93f8c108]],
...
This is the output at L530 here: https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L520-L533
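The apparent duplication is consistent with a layout where each line is one table: its word boxes sit in the array part, and the bounding box of the whole line sits in the named fields x0, y0, x1, y1. When a line contains a single word, the two coincide. A minimal sketch under that assumption (make_line is a hypothetical helper, not a KOReader function):

```lua
-- Hypothetical helper: build a "line" table from a list of word boxes.
-- Word boxes go into the array part; the line's bounding box goes into
-- the named fields of the same table.
local function make_line(word_boxes)
    local line = {
        x0 = math.huge, y0 = math.huge,
        x1 = -math.huge, y1 = -math.huge,
    }
    for i, wb in ipairs(word_boxes) do
        line[i] = wb
        line.x0 = math.min(line.x0, wb.x0)
        line.y0 = math.min(line.y0, wb.y0)
        line.x1 = math.max(line.x1, wb.x1)
        line.y1 = math.max(line.y1, wb.y1)
    end
    return line
end

-- One-word line, using the first item from the log above.
local single = make_line({ { x0 = 312, x1 = 402, y0 = 138, y1 = 148 } })
print(#single, single.x0 == single[1].x0, single.x1 == single[1].x1)

-- Two-word line (made-up coordinates): the line box spans both words.
local double = make_line({
    { x0 = 10, x1 = 40, y0 = 5, y1 = 15 },
    { x0 = 45, x1 = 90, y0 = 4, y1 = 16 },
})
print(#double, double.x0, double.x1, double.y0, double.y1)
```

Under this reading, the inner table is the only word on the line and the outer fields are the line box, so the repeated numbers are not necessarily a bug in themselves.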
Never mind, text
from L521 is the same in both OCR on and OCR off cases, so that's not it.
Oh, wait, no, they are not the same!
Here is the no-OCR log of text
- this is when text layers from PDF itself are taken.
12/29/23-01:00:41 DEBUG LOGG-5 text = {
{
{
word = "564",
x0 = 119.16000366210938,
x1 = 135.43389892578125,
y0 = 72.549667358398438,
y1 = 87.463569641113281
} --[[table: 0x7feb99577d20]],
x0 = 119.16000366210938,
x1 = 135.43389892578125,
y0 = 72.549667358398438,
y1 = 87.463569641113281
} --[[table: 0x7feb9aad8a28]],
{
{
word = "D.",
x0 = 257.39999389648438,
x1 = 265.15200805664063,
y0 = 75.656021118164063,
y1 = 86.552017211914063
} --[[table: 0x7feb9ab65c08]],
{
word = "J.",
x0 = 267.11203002929688,
x1 = 272.20004272460938,
y0 = 75.656021118164063,
y1 = 86.552017211914063
} --[[table: 0x7feb93f78e50]],
{
word = "McCOMAS",
x0 = 274.16006469726563,
x1 = 313.12814331054688,
y0 = 75.656021118164063,
y1 = 86.552017211914063
} --[[table: 0x7feb9a95ed90]],
{
word = "ET",
x0 = 315.2801513671875,
x1 = 325.08816528320313,
y0 = 75.656021118164063,
y1 = 86.552017211914063
} --[[table: 0x7feb9aaf1680]],
...
So, in the OCR mode, word =
is missing completely (*from the tables), because it is nil (?)
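A quick way to confirm this on any dumped table is to count how many word boxes carry a word field. The sketch below is a hypothetical debugging helper (count_words is not part of KOReader); the sample tables are cut down from the two logs above, with coordinates rounded.

```lua
-- Hypothetical debugging helper: count word boxes with and without a
-- `word` field in a table shaped like the getTextBoxes output.
local function count_words(boxes)
    local with_word, without_word = 0, 0
    for _, line in ipairs(boxes) do
        for _, wb in ipairs(line) do
            if wb.word ~= nil then
                with_word = with_word + 1
            else
                without_word = without_word + 1
            end
        end
    end
    return with_word, without_word
end

-- Sample from the text-layer log (Forced OCR off), coordinates rounded.
local text_layer = {
    { { word = "564", x0 = 119.16, x1 = 135.43, y0 = 72.55, y1 = 87.46 },
      x0 = 119.16, x1 = 135.43, y0 = 72.55, y1 = 87.46 },
}

-- Sample from the Forced OCR log: same shape, but no `word` anywhere.
local ocr_path = {
    { { x0 = 312, x1 = 402, y0 = 138, y1 = 148 },
      x0 = 312, x1 = 402, y0 = 138, y1 = 148 },
    { { x0 = 194, x1 = 220, y0 = 164, y1 = 174 },
      x0 = 194, x1 = 220, y0 = 164, y1 = 174 },
}

print(count_words(text_layer))
print(count_words(ocr_path))
```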
I think I traced down where words
stop appearing:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L674
The output of this function is the first one where there are no word =
in the table:
12/29/23-02:13:08 DEBUG LOGG-6 getNativeTextBoxesFromScratch: RETURN boxes = {
{
{
x0 = 312,
x1 = 402,
y0 = 138,
y1 = 148
} --[[table: 0x7f2328f00ed8]],
x0 = 312,
x1 = 402,
y0 = 138,
y1 = 148
} --[[table: 0x7f2328f00e90]],
{
{
x0 = 194,
x1 = 220,
y0 = 164,
y1 = 174
} --[[table: 0x7f2328f01038]],
x0 = 194,
x1 = 220,
y0 = 164,
y1 = 174
} --[[table: 0x7f2328f00ff0]],
...
Then getNativeWordBoxes
is from koreader/base/ffi/koptcontext.lua
:
function KOPTContext_mt.__index:getNativeWordBoxes(bmp, x, y, w, h)
return self:getWordBoxes(bmp, x, y, w, h, 1)
end
which calls this function (defined above it in that file):
function KOPTContext_mt.__index:getWordBoxes(bmp, x, y, w, h, box_type)
local boxa = ffi.new("BOXA[1]")
local nai = ffi.new("NUMA[1]")
local counter_l = ffi.new("int[1]")
local nr_word, current_line
local counter_w, counter_cw
local l_x0, l_y0, l_x1, l_y1
if box_type == 0 then
k2pdfopt.k2pdfopt_get_reflowed_word_boxes(self,
bmp == "src" and self.src or self.dst, x, y, w, h)
boxa = self.rboxa
nai = self.rnai
elseif box_type == 1 then
k2pdfopt.k2pdfopt_get_native_word_boxes(self,
bmp == "src" and self.src or self.dst, x, y, w, h)
boxa = self.nboxa
nai = self.nnai
end
if boxa == nil or nai == nil then return end
-- get number of lines in this area
nr_word = leptonica.boxaGetCount(boxa)
assert(nr_word == leptonica.numaGetCount(nai))
local boxes = {}
counter_w = 0
while counter_w < nr_word do
leptonica.numaGetIValue(nai, counter_w, counter_l)
current_line = counter_l[0]
-- sub-table that contains words in a line
local lbox = {}
boxes[counter_l[0]+1] = lbox
counter_cw = 0
l_x0, l_y0, l_x1, l_y1 = 9999, 9999, 0, 0
while current_line == counter_l[0] and counter_w < nr_word do
local box = leptonica.boxaGetBox(boxa, counter_w, C.L_CLONE)
-- update line box
l_x0 = box.x < l_x0 and box.x or l_x0
l_y0 = box.y < l_y0 and box.y or l_y0
l_x1 = box.x + box.w > l_x1 and box.x + box.w or l_x1
l_y1 = box.y + box.h > l_y1 and box.y + box.h or l_y1
-- box for a single word
lbox[counter_cw+1] = {
x0 = box.x, y0 = box.y,
x1 = box.x + box.w,
y1 = box.y + box.h,
}
counter_w, counter_cw = counter_w + 1, counter_cw + 1
if counter_w < nr_word then
leptonica.numaGetIValue(nai, counter_w, counter_l)
end
end
if current_line ~= counter_l[0] then counter_w = counter_w - 1 end
-- box for a whole line
lbox.x0, lbox.y0, lbox.x1, lbox.y1 = l_x0, l_y0, l_x1, l_y1
counter_w = counter_w + 1
end
return boxes, nr_word
end
I think the culprit is somewhere here :D
Why isn't this OCR function used anywhere? https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L721-L730
I can only see it here, but this is not onHold
function, but lookup
:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/apps/reader/modules/readerhighlight.lua#L1437
Ah, OK. lookup(
is called from function ReaderHighlight:onHoldRelease()
.
However, the coordinates of the boxes from the OP are wrong already at onHold
.
(Not following you on your investigation, all this code is quite obscure to many of us...) Just mentioning #3688 and that it may be related to DPI (joy!).
This function is really weird koptcontext.lua
:
function KOPTContext_mt.__index:getWordBoxes(bmp, x, y, w, h, box_type)
    local boxa = ffi.new("BOXA[1]")
    local nai = ffi.new("NUMA[1]")
    local counter_l = ffi.new("int[1]")
    local nr_word, current_line
    local counter_w, counter_cw
    local l_x0, l_y0, l_x1, l_y1
    if box_type == 0 then
        k2pdfopt.k2pdfopt_get_reflowed_word_boxes(self,
            bmp == "src" and self.src or self.dst, x, y, w, h)
        boxa = self.rboxa
        nai = self.rnai
    elseif box_type == 1 then
        k2pdfopt.k2pdfopt_get_native_word_boxes(self,
            bmp == "src" and self.src or self.dst, x, y, w, h)
        boxa = self.nboxa
        nai = self.nnai
        logger.dbg("LOGGBASE boxa =", boxa)
        logger.dbg("LOGGBASE nai =", nai)
    end
    if boxa == nil or nai == nil then return end
    -- get number of lines in this area
    nr_word = leptonica.boxaGetCount(boxa)
    assert(nr_word == leptonica.numaGetCount(nai))
    local boxes = {}
    counter_w = 0
    while counter_w < nr_word do
        leptonica.numaGetIValue(nai, counter_w, counter_l)
        current_line = counter_l[0]
        -- sub-table that contains words in a line
        local lbox = {}
        boxes[counter_l[0]+1] = lbox
        counter_cw = 0
        l_x0, l_y0, l_x1, l_y1 = 9999, 9999, 0, 0
        while current_line == counter_l[0] and counter_w < nr_word do
            local box = leptonica.boxaGetBox(boxa, counter_w, C.L_CLONE)
            -- update line box
            l_x0 = box.x < l_x0 and box.x or l_x0
            l_y0 = box.y < l_y0 and box.y or l_y0
            l_x1 = box.x + box.w > l_x1 and box.x + box.w or l_x1
            l_y1 = box.y + box.h > l_y1 and box.y + box.h or l_y1
            -- box for a single word
            lbox[counter_cw+1] = {
                x0 = box.x, y0 = box.y,
                x1 = box.x + box.w,
                y1 = box.y + box.h,
            }
            counter_w, counter_cw = counter_w + 1, counter_cw + 1
            if counter_w < nr_word then
                leptonica.numaGetIValue(nai, counter_w, counter_l)
            end
        end
        if current_line ~= counter_l[0] then counter_w = counter_w - 1 end
        -- box for a whole line
        lbox.x0, lbox.y0, lbox.x1, lbox.y1 = l_x0, l_y0, l_x1, l_y1
        counter_w = counter_w + 1
    end
    return boxes, nr_word
end
It seems that boxes
stay empty, because all involved variables are inside the while
loop, which makes those vars empty at each cycle.
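This hypothesis can be checked in isolation. In Lua, a table created through a local inside a loop body stays alive as long as something outside the loop still references it, so assigning it into an outer table should be enough to keep it. A minimal sketch of the same pattern, with made-up values and no FFI involved:

```lua
-- Same shape as the loop in getWordBoxes, reduced to plain Lua:
-- `boxes` lives outside the loop, `lbox` is a fresh local on each pass.
local boxes = {}
local i = 0
while i < 3 do
    local lbox = {}       -- new table on every iteration
    boxes[i + 1] = lbox   -- reference stored in the outer table
    lbox.x0 = i * 10      -- filled in after being stored, as in the original
    lbox[1] = { x0 = i * 10 }
    i = i + 1
end

-- The entries survive the loop even though `lbox` itself is out of scope.
print(#boxes, boxes[1].x0, boxes[3].x0, #boxes[2])
```

If this matches what the real function does, boxes should not end up empty because of scoping alone; logging boxes just before the return would settle it.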
(Not following you on your investigation, all this code is quite obscure to many of us...) Just mentioning #3688 and that it may be related to DPI (joy!).
@poire-z
Thanks!
However, I really think the problem stems from the fact that word
variables are not present when forced_ocr
is on.
Coordinates seem sane. See example log from emulator:
12/29/23-02:43:53 DEBUG hold position in page {
page = 2,
rotation = 0,
x = 183.32065597922661,
y = 362.5031481665971,
zoom = 1.2082653687705533
} --[[table: 0x7f1d7c14fa18]]
12/29/23-02:43:53 DEBUG LOGG-5 text = {
{
{
word = "564",
x0 = 119.16000366210938,
x1 = 135.43389892578125,
y0 = 72.549667358398438,
y1 = 87.463569641113281
} --[[table: 0x7f1da10e2760]],
x0 = 119.16000366210938,
x1 = 135.43389892578125,
y0 = 72.549667358398438,
y1 = 87.463569641113281
} --[[table: 0x7f1da1382ad0]],
{
{
word = "D.",
x0 = 257.39999389648438,
x1 = 265.15200805664063,
y0 = 75.656021118164063,
y1 = 86.552017211914063
} --[[table: 0x7f1da10d56d8]],
This function is really weird
koptcontext.lua
Alright, I had some sleep :D this function is fine:
12/29/23-12:31:26 DEBUG LOGGBASE boxes END = {
{
{
x0 = 120,
x1 = 405,
y0 = 177,
y1 = 187
} --[[table: 0x7f152ef18750]],
x0 = 120,
x1 = 405,
y0 = 177,
y1 = 187
} --[[table: 0x7f152ef17830]],
{
{
x0 = 356,
x1 = 405,
y0 = 229,
y1 = 239
} --[[table: 0x7f152cd3de50]],
x0 = 356,
x1 = 405,
y0 = 229,
y1 = 239
} --[[table: 0x7f152cd3c578]]
} --[[table: 0x7f152ef17780]]
It keeps populating the boxes
table. However, it doesn't have word
values in it, but the other function is waiting for them:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L901-L917
Here boxes
are expected to have word
, but they come from that function without them.
I'll write up the full chain now.
1) readerhighlight.lua
, onHold
: calls document-specific getWordFromPosition
:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/apps/reader/modules/readerhighlight.lua#L1208
Input: document name/path doc
, highlight holding position pos
.
Expected output: table with five values: word
, pos0
, pos1
, sbox
, pbox
. The latter two are box on screen
and box on page
, respectively. sbox
is for reflowed content, so in normal view sbox
is nil.
2) The function getWordFromPosition
from koptinterface.lua
:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L1039-L1049
First, it needs local text_boxes = self:getTextBoxes(doc, pos.page)
from itself:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L520-L533
Here we follow the last return self:getNativeTextBoxesFromScratch(doc, pageno)
because forced_ocr
=1 and there is no reflow.
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L660-L684
Here we need local boxes, nr_word = kc:getNativeWordBoxes("src", 0, 0, page_size.w, page_size.h)
, which is in koreader-base/ffi/koptcontext.lua
:
function KOPTContext_mt.__index:getNativeWordBoxes(bmp, x, y, w, h)
return self:getWordBoxes(bmp, x, y, w, h, 1)
end
which is then
function KOPTContext_mt.__index:getWordBoxes(bmp, x, y, w, h, box_type)
local boxa = ffi.new("BOXA[1]")
local nai = ffi.new("NUMA[1]")
local counter_l = ffi.new("int[1]")
local nr_word, current_line
local counter_w, counter_cw
local l_x0, l_y0, l_x1, l_y1
if box_type == 0 then
k2pdfopt.k2pdfopt_get_reflowed_word_boxes(self,
bmp == "src" and self.src or self.dst, x, y, w, h)
boxa = self.rboxa
nai = self.rnai
elseif box_type == 1 then
k2pdfopt.k2pdfopt_get_native_word_boxes(self,
bmp == "src" and self.src or self.dst, x, y, w, h)
boxa = self.nboxa
nai = self.nnai
end
if boxa == nil or nai == nil then return end
-- get number of lines in this area
nr_word = leptonica.boxaGetCount(boxa)
assert(nr_word == leptonica.numaGetCount(nai))
local boxes = {}
counter_w = 0
while counter_w < nr_word do
leptonica.numaGetIValue(nai, counter_w, counter_l)
current_line = counter_l[0]
-- sub-table that contains words in a line
local lbox = {}
boxes[counter_l[0]+1] = lbox
counter_cw = 0
l_x0, l_y0, l_x1, l_y1 = 9999, 9999, 0, 0
while current_line == counter_l[0] and counter_w < nr_word do
local box = leptonica.boxaGetBox(boxa, counter_w, C.L_CLONE)
-- update line box
l_x0 = box.x < l_x0 and box.x or l_x0
l_y0 = box.y < l_y0 and box.y or l_y0
l_x1 = box.x + box.w > l_x1 and box.x + box.w or l_x1
l_y1 = box.y + box.h > l_y1 and box.y + box.h or l_y1
-- box for a single word
lbox[counter_cw+1] = {
x0 = box.x, y0 = box.y,
x1 = box.x + box.w,
y1 = box.y + box.h,
}
counter_w, counter_cw = counter_w + 1, counter_cw + 1
if counter_w < nr_word then
leptonica.numaGetIValue(nai, counter_w, counter_l)
end
end
if current_line ~= counter_l[0] then counter_w = counter_w - 1 end
-- box for a whole line
lbox.x0, lbox.y0, lbox.x1, lbox.y1 = l_x0, l_y0, l_x1, l_y1
counter_w = counter_w + 1
end
return boxes, nr_word
end
The output of boxes
here is a set of nested tables like:
12/29/23-12:31:26 DEBUG LOGGBASE boxes END = {
{
{
x0 = 120,
x1 = 405,
y0 = 177,
y1 = 187
} --[[table: 0x7f152ef18750]],
x0 = 120,
x1 = 405,
y0 = 177,
y1 = 187
} --[[table: 0x7f152ef17830]],
{
{
x0 = 356,
x1 = 405,
y0 = 229,
y1 = 239
} --[[table: 0x7f152cd3de50]],
x0 = 356,
x1 = 405,
y0 = 229,
y1 = 239
} --[[table: 0x7f152cd3c578]]
...
3) Now that the boxes
are defined, the function getWordFromPosition
from (2) calls getWordFromNativePosition
:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L1089-L1098
This in turn needs getWordFromBoxes
function:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L901-L917
So, here it expects that boxes
have words
as well, but they don't:
local wb = boxes[i][j]
...
return {
word = wb.word,
box = box,
}
Where should words come from in this chain?
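The mismatch can be reproduced without KOReader. The sketch below is a simplified, hypothetical hit test (word_at is not the real getWordFromBoxes; it only mirrors the return shape quoted above). It finds the box under a position either way, but word is nil when the boxes were built without it.

```lua
-- Hypothetical, simplified hit test: return the word box that contains
-- the point (x, y), in the same { word = ..., box = ... } shape as above.
local function word_at(boxes, x, y)
    for _, line in ipairs(boxes) do
        for _, wb in ipairs(line) do
            if x >= wb.x0 and x <= wb.x1 and y >= wb.y0 and y <= wb.y1 then
                return { word = wb.word, box = wb }
            end
        end
    end
    return nil
end

-- Boxes shaped like the Forced OCR output: geometry only.
local ocr_boxes = {
    { { x0 = 120, x1 = 405, y0 = 177, y1 = 187 },
      x0 = 120, x1 = 405, y0 = 177, y1 = 187 },
}

-- Boxes shaped like the embedded text layer: geometry plus word
-- (coordinates rounded from the log).
local text_boxes = {
    { { word = "564", x0 = 119.16, x1 = 135.43, y0 = 72.55, y1 = 87.46 },
      x0 = 119.16, x1 = 135.43, y0 = 72.55, y1 = 87.46 },
}

local hit_ocr = word_at(ocr_boxes, 200, 180)
local hit_text = word_at(text_boxes, 125, 80)
print(hit_ocr.box.x0, hit_ocr.word)
print(hit_text.box.x0, hit_text.word)
```

In other words, a missing word field does not stop the box lookup from succeeding, which fits the observation that the stored highlight and the displayed one can differ.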
(Still not wanting to look at all this code :)). But I remember (crengine only, dunno) that there are legitimate cases where stuff like getWord fails (may be returning no "word"), and we fallback to getText which does the same thing (but differently, usually better) when the start and end pos are identical. So, be sure there's not something after this getWord stuff that makes it work using getText.
OK, but this happens not during onHold
, but at highlightFromHoldPos
:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/apps/reader/modules/readerhighlight.lua#L1672-L1686
and eventually getTextFromBoxes
also assumes that boxes
have words
:
https://github.com/koreader/koreader/blob/63329569eb8afb00a29a723f215c6c5fcc9041aa/frontend/document/koptinterface.lua#L941
From linked "PDF is too dark" issue:
By the way, I tried selecting words in the CBZ version with and without Forced OCR on, and ... it works flawlessly regardless O_O So my old issue with wrong coordinates when Forced OCR is on refers only to PDFs...
Issue
Enabling Forced OCR results in very wrong coordinates when long tapping to highlight text. Supporting my observations in https://github.com/koreader/koreader/issues/8068#issuecomment-1868435205
Steps to reproduce
Attached are an example file and a crash log with these steps:
1) Reopen KOReader with this file as the last document, on page 1.
2) Tap and hold to select the whole Abstract, from the word Abstract. to purposes. at the very end.
3) It selects OK; select Highlight, then tap it again and delete.
4) Go to the bottom menu, enable Forced OCR, and tap outside to close the bottom menu.
5) Try to highlight the same words in the Abstract.
6) Now the highlight starts way lower, at the word Advanced - this is the first sentence in the 1. Introduction section below.
7) When I finish the highlight, it ends up highlighting Advanced Composition, then when I hit Highlight, it ultimately highlights Advanced Composition Explorer.
Highlights in the crash log are done with the Scribe pen, but the results are the same with a finger. I use the pen to be more precise when selecting words.
What other logs can I provide?
crash.log and the file.
exampleFile_OCR_wrong_coordinates.log.zip
crash_OCR_wrong_coordinates.log.txt