jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License
6.48k stars 658 forks

char needs a linewidth attr #1137

Open xuehuiareafred opened 4 months ago

xuehuiareafred commented 4 months ago

Please add this code: attr["linewidth"] = gs.linewidth. Some chars use the same font but are rendered bold, and linewidth may be able to distinguish them.

Thank you!

jsvine commented 4 months ago

It is my understanding that line width applies only to paths, not to text. See, e.g., from the PDF reference:

Screenshot

Do you (or anyone else reading this) have a different understanding?

xuehuiareafred commented 4 months ago

Thanks for your reply! In my opinion, "The line width parameter specifies the thickness of the line used to stroke a path" covers two cases: one is the thickness of a line, the other is the thickness of a char. I say this because a pdfminer.six char carries a line width in its graphic state, and a bold char that does not use a bold font has a larger line width than a regular char.
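For anyone who wants to check this on their own files, here is a minimal sketch of reading that value. It assumes pdfminer.six is installed and that its LTChar objects expose fontname, get_text() and a graphicstate attribute with a linewidth field. The helper names iter_chars and char_linewidths are made up for illustration; they are not part of pdfminer.six or pdfplumber.

```python
from collections.abc import Iterable
from typing import Any, Iterator, Tuple


def iter_chars(obj: Any) -> Iterator[Any]:
    """Recursively yield leaf objects that look like characters.

    A "char" here is anything with get_text() and a graphicstate attribute;
    containers (pages, text boxes, text lines) are walked by iterating them.
    """
    if hasattr(obj, "graphicstate") and hasattr(obj, "get_text"):
        yield obj
    elif isinstance(obj, Iterable) and not isinstance(obj, (str, bytes)):
        for child in obj:
            yield from iter_chars(child)


def char_linewidths(pdf_path: str) -> Iterator[Tuple[str, str, float]]:
    """Yield (text, fontname, linewidth) for every char in the PDF."""
    # Imported lazily so iter_chars can be used without pdfminer.six.
    from pdfminer.high_level import extract_pages

    for page in extract_pages(pdf_path):
        for ch in iter_chars(page):
            yield ch.get_text(), ch.fontname, ch.graphicstate.linewidth
```

Something like list(char_linewidths("line_width.pdf")) could then be compared against the values reported in this thread; that comparison has not been run here.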

xuehuiareafred commented 4 months ago

line_width.pdf

My test result:

<PDFGraphicState: linewidth=0.12, linecap=2, linejoin=2, miterlimit=2, dash=None, intent=None, flatness=None, stroking color=(0, 0, 0), non stroking color=(0, 0, 0)> 第 NERFFC+HYShuSongErKW 10.449999999999932
<PDFGraphicState: linewidth=0.12, linecap=2, linejoin=2, miterlimit=2, dash=None, intent=None, flatness=None, stroking color=(0, 0, 0), non stroking color=(0, 0, 0)> 一 NERFFC+HYShuSongErKW 10.449999999999932
<PDFGraphicState: linewidth=0.12, linecap=2, linejoin=2, miterlimit=2, dash=None, intent=None, flatness=None, stroking color=(0, 0, 0), non stroking color=(0, 0, 0)> 章 NERFFC+HYShuSongErKW 10.449999999999932
<PDFGraphicState: linewidth=0.29887, linecap=2, linejoin=2, miterlimit=2, dash=None, intent=None, flatness=None, stroking color=(0, 0, 0), non stroking color=(0, 0, 0)> 第 NERFFC+HYShuSongErKW 10.449999999999932
<PDFGraphicState: linewidth=0.29887, linecap=2, linejoin=2, miterlimit=2, dash=None, intent=None, flatness=None, stroking color=(0, 0, 0), non stroking color=(0, 0, 0)> 一 NERFFC+HYShuSongErKW 10.449999999999932
<PDFGraphicState: linewidth=0.29887, linecap=2, linejoin=2, miterlimit=2, dash=None, intent=None, flatness=None, stroking color=(0, 0, 0), non stroking color=(0, 0, 0)> 章
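The result above shows the same font name with two different line widths (0.12 and 0.29887). As a rough sketch of how such a value might be used to tell the two groups apart, the snippet below flags chars whose line width is larger than the smallest one seen for the same font. CharInfo and flag_heavier_strokes are hypothetical names, and the heuristic itself is an assumption for illustration, not something pdfplumber or pdfminer.six provides.

```python
from typing import Dict, List, NamedTuple


class CharInfo(NamedTuple):
    text: str
    fontname: str
    linewidth: float


def flag_heavier_strokes(chars: List[CharInfo], tol: float = 1e-6) -> List[bool]:
    """Return one flag per char: True if its linewidth exceeds the smallest
    linewidth seen for the same font name by more than tol."""
    thinnest: Dict[str, float] = {}
    for c in chars:
        thinnest[c.fontname] = min(c.linewidth, thinnest.get(c.fontname, c.linewidth))
    return [c.linewidth > thinnest[c.fontname] + tol for c in chars]


# Values copied from the test result in this thread.
FONT = "NERFFC+HYShuSongErKW"
sample = [
    CharInfo("第", FONT, 0.12),
    CharInfo("一", FONT, 0.12),
    CharInfo("章", FONT, 0.12),
    CharInfo("第", FONT, 0.29887),
    CharInfo("一", FONT, 0.29887),
    CharInfo("章", FONT, 0.29887),
]
flags = flag_heavier_strokes(sample)
```

On this sample the first three chars are left unflagged and the last three are flagged. A larger line width only changes how glyphs look when the text rendering mode strokes them, which this sketch does not check.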

jsvine commented 4 months ago

Thank you for this helpful example. I'll investigate further.