Open zhangtingyun opened 4 months ago
I don't know if my change is correct. Please let me know, or could you fix this bug? Thanks!
page_1.pdf (this is the PDF)
Hi, I'm also facing the same issue while using pdfplumber, which is built on pdfminer.six.
I'm using pdfminer.six 20221105 and pdfplumber 0.10.4.
I also tried repairing the PDF with Ghostscript, as follows:
gswin64c -o repaired.pdf -sDEVICE=pdfwrite input.pdf
The output file is repaired.pdf. Reference: https://github.com/jsvine/pdfplumber/issues/425
However, the text extracted from repaired.pdf is still out of order.
Then I tried removing `* fontsize` from the `descent` calculation, and the result came out correct.
Is this a bug or something? Thanks
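To make the `* fontsize` question concrete, here is a minimal sketch of how a horizontal glyph's bounding box depends on the descent. It is loosely modeled on `LTChar` in pdfminer.six (an assumption, not a copy of the library code); `char_bbox`, `unit_descent`, and `scale_descent` are hypothetical names, and the numbers are invented for illustration.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]
Point = Tuple[float, float]


def apply_matrix_pt(m: Matrix, v: Point) -> Point:
    """Apply a PDF affine matrix (a, b, c, d, e, f) to a point."""
    a, b, c, d, e, f = m
    x, y = v
    return (a * x + c * y + e, b * x + d * y + f)


def char_bbox(matrix, fontsize, rise, adv, unit_descent, scale_descent=True):
    """Bounding box of a horizontal glyph (hypothetical helper).

    unit_descent is the font descent for a font size of 1 (usually negative).
    scale_descent=False mimics removing `* fontsize` from the descent.
    """
    descent = unit_descent * fontsize if scale_descent else unit_descent
    lower_left = (0.0, descent + rise)
    upper_right = (adv, descent + rise + fontsize)
    x0, y0 = apply_matrix_pt(matrix, lower_left)
    x1, y1 = apply_matrix_pt(matrix, upper_right)
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


# Made-up values: a 12 pt glyph placed at (100, 700), advance width 6.
m = (1, 0, 0, 1, 100, 700)
print(char_bbox(m, 12, 0, 6, -0.25))                       # (100.0, 697.0, 106.0, 709.0)
print(char_bbox(m, 12, 0, 6, -0.25, scale_descent=False))  # (100.0, 699.75, 106.0, 711.75)
```

In this sketch, dropping the scaling moves the whole box up by `unit_descent * (1 - fontsize)`. That kind of vertical shift could explain why the change alters the extracted reading order, though it does not show which variant is correct for a given PDF.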
I simply called pdfminer to parse a PDF, but the coordinates in the parsed result differ from what I expected: sometimes they are too high and sometimes too low. PDF.js handles the same file correctly.
I've made some modifications that fix this:
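Tm_mul_CTM = matrix  # text matrix multiplied by the CTM
Th = scaling  # horizontal scaling
Tfs = fontsize  # font size
_render_matrix = (
    Tfs * Th, 0,  # third column: 0
    0, Tfs,  # third column: 0
    0, rise,  # third column: 1
)
Trm = mult_matrix(_render_matrix, Tm_mul_CTM)  # text rendering matrix
(a, b, c, d, e, f) = Trm
w, h = x1 - x0, y1 - y0  # keep the size of the existing box
(x0, y0) = (e, f)  # anchor the box at the Trm origin
(x1, y1) = (x0 + w, y0 + h)
y0, y1 = y0 + descent, y1 + descent  # shift vertically by the descent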
![image](https://github.com/pdfminer/pdfminer.six/assets/76984049/ea4b20b8-69af-4401-bc5d-47eaa8dbe806)
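For anyone who wants to check the arithmetic, the `_render_matrix` product in the snippet follows the text rendering matrix from the PDF specification, Trm = [Tfs·Th 0 0; 0 Tfs 0; 0 Trise 1] × Tm × CTM. Below is a small standalone sketch of that product. `mult_matrix` is re-implemented locally so the example runs without pdfminer.six, `text_rendering_matrix` is a hypothetical helper name, `scaling` is assumed to be a fraction (1.0 means 100%), and the input values are invented.

```python
def mult_matrix(m1, m0):
    """Return m1 x m0 for PDF affine matrices given as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m1
    a0, b0, c0, d0, e0, f0 = m0
    return (
        a0 * a1 + c0 * b1,
        b0 * a1 + d0 * b1,
        a0 * c1 + c0 * d1,
        b0 * c1 + d0 * d1,
        a0 * e1 + c0 * f1 + e0,
        b0 * e1 + d0 * f1 + f0,
    )


def text_rendering_matrix(fontsize, scaling, rise, tm_ctm):
    """Combine the text-state parameters with the Tm x CTM product."""
    params = (fontsize * scaling, 0.0, 0.0, fontsize, 0.0, rise)
    return mult_matrix(params, tm_ctm)


# Made-up values: 12 pt font, no horizontal scaling, rise of 2,
# and a Tm x CTM product that only translates to (100, 700).
trm = text_rendering_matrix(12, 1.0, 2, (1, 0, 0, 1, 100, 700))
print(trm)       # (12.0, 0.0, 0.0, 12.0, 100.0, 702.0)
print(trm[4:])   # glyph origin (e, f): (100.0, 702.0)
```

Note that the rise ends up in the translation part `(e, f)`, which is the point the modified code uses as the lower-left corner before applying the descent.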