Open slbayer opened 1 year ago
E.g., in release 20221105, in `converter.py`, line 947, change
"<div class='ocr_page' id='%s' title='%s'>\n"
to
"<div class='ocr_page' id='page_%s' title='%s'>\n"
and at lines 962-963, change
"<div class='ocr_block' id='%d' title='%s'>\n"
% (item.index, self.bbox_repr(item.bbox))
to
"<div class='ocr_block' id='block_%s_%d' title='%s'>\n"
% (ltpage.pageid, item.index, self.bbox_repr(item.bbox))
Bug report
I'm loving the new hOCR renderer for extracted text output. One problem I'm observing is that the HTML `id` attributes are not unique. The `id`s are unique among `ocr_page`s, and within each page among `ocr_block`s, but that's not how HTML `id`s work: they should be unique within the file. I'd recommend something like `<div class='ocr_page' id='page_2' ...>` and `<div class='ocr_block' id='block_2_1' ...>`, where the first integer is the page number and the second is the number of the block within the page.
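The naming scheme described above can be sketched in isolation as follows. This is only an illustration, not the library's code: `page_open_tag`, `block_open_tag`, `render`, and `find_ids` are hypothetical helpers, the `title` values are placeholder bounding boxes, and numbering blocks from 1 is an assumption. It shows that prefixing each block `id` with its page number keeps every `id` unique across the whole file.

```python
import re


def page_open_tag(pageid, title):
    # Hypothetical helper: prefix the page id so it cannot clash with block ids.
    return "<div class='ocr_page' id='page_%s' title='%s'>\n" % (pageid, title)


def block_open_tag(pageid, index, title):
    # Hypothetical helper: include the page number in every block id.
    return "<div class='ocr_block' id='block_%s_%d' title='%s'>\n" % (
        pageid,
        index,
        title,
    )


def render(pages):
    # pages maps a page number to the number of blocks on that page.
    # Block numbering starting at 1 is an assumption made for this sketch.
    out = []
    for pageid, n_blocks in pages.items():
        out.append(page_open_tag(pageid, "bbox 0 0 100 100"))
        for index in range(1, n_blocks + 1):
            out.append(block_open_tag(pageid, index, "bbox 0 0 10 10"))
            out.append("</div>\n")
        out.append("</div>\n")
    return "".join(out)


def find_ids(html):
    # Collect every id attribute value in document order.
    return re.findall(r"id='([^']+)'", html)


html = render({1: 2, 2: 2})
ids = find_ids(html)
```

With two pages of two blocks each, `ids` contains `page_1`, `block_1_1`, `block_1_2`, `page_2`, `block_2_1`, `block_2_2`, with no duplicates.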