Noctambul opened this issue 4 years ago.
Yes, and this could also help with accessible PDFs.
I believe pdf-reader will provide access to the tagged data, but only at a fairly low level. For example, the higher-level Page#text method ignores tags, but the low-level Page#walk_contents method should generate callbacks for them.
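As a rough illustration, here is a minimal sketch of such a callback receiver. It assumes the receiver-based walker is exposed as `PDF::Reader::Page#walk(*receivers)` and that the `BMC`, `BDC` and `EMC` operators arrive as `begin_marked_content`, `begin_marked_content_with_pl` and `end_marked_content` callbacks; check both against your installed version. The class `MarkedContentRecorder` and the helper `dump_marked_content` are hypothetical names. Strings passed to `show_text` are the raw bytes from the content stream, not decoded through the page's fonts, so treat them as a debugging aid rather than extracted text. Rebuilding table rows and cells would also require reading the document's structure tree (`/StructTreeRoot`) and matching its `TD` elements to the MCIDs recorded here, which this sketch does not attempt.

```ruby
# Hypothetical receiver for pdf-reader's Page#walk that records the
# marked-content sequences (BMC/BDC ... EMC) found in a page's content stream.
class MarkedContentRecorder
  # Each entry: { tag: Symbol, mcid: Integer or nil, depth: Integer, text: [String] }
  attr_reader :sections

  def initialize
    @sections = []
    @stack = []
  end

  # BMC operator: marked content without a property list
  def begin_marked_content(tag)
    open_section(tag, nil)
  end

  # BDC operator: marked content with a property list. Tagged PDFs usually
  # put an inline dictionary with an MCID here; a Symbol means the list is a
  # named /Properties resource, which this sketch does not resolve.
  def begin_marked_content_with_pl(tag, properties)
    mcid = properties.is_a?(Hash) ? properties[:MCID] : nil
    open_section(tag, mcid)
  end

  # EMC operator: closes the innermost open sequence
  def end_marked_content
    @stack.pop
  end

  # Tj operator only: the string is still in the font's encoding (not decoded)
  def show_text(string)
    @stack.last[:text] << string unless @stack.empty?
  end

  # TJ operator: an array mixing strings and kerning numbers
  def show_text_with_positioning(array)
    array.each { |item| show_text(item) if item.is_a?(String) }
  end

  private

  def open_section(tag, mcid)
    section = { tag: tag, mcid: mcid, depth: @stack.size, text: [] }
    @sections << section
    @stack.push(section)
  end
end

# Usage sketch: needs the pdf-reader gem; `path` points at a tagged PDF.
def dump_marked_content(path)
  require "pdf/reader"
  PDF::Reader.new(path).pages.each_with_index do |page, index|
    recorder = MarkedContentRecorder.new
    page.walk(recorder)
    recorder.sections.each do |s|
      puts "page #{index + 1} #{'  ' * s[:depth]}#{s[:tag]} mcid=#{s[:mcid].inspect}"
    end
  end
end
```

Printing the tag nesting and MCIDs page by page is a cheap way to check whether a given PDF's tags are rich enough (for example, whether `TD` sequences carry MCIDs) before investing in structure-tree parsing.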
Unfortunately I haven't worked with tagged PDFs myself, so I'm not super familiar with how to extract the data.
Thank you for your answer and the details. We will look into your suggestion carefully :)
Hi,
I'm working with some tagged PDFs and I need to extract tables from them. The tables are tagged, and I think the tags are the only way to parse them properly: the rows have cells of different sizes, and a single table can span several pages.
So I'm wondering whether the PDF-Reader API can handle tagged PDFs like these.
Thank you for your attention.