Open ddotta opened 4 months ago
I managed to do what I wanted with pdftools::pdf_text(), though not without some complications.
It would be very useful if this could be implemented directly in extract_tables()
Hi @ddotta, thanks for reporting this. How did you manage to do it?
@pachadotdev Here's a solution (not very optimized, but it does what I want): https://gist.github.com/ddotta/8e828145355bb87e78d83191b747b2e0
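For readers who cannot open the gist, here is a rough sketch of the general idea of reading checkbox state from the text returned by pdftools::pdf_text(). It is not the code from the gist. It assumes the PDF's text layer encodes the boxes as Unicode glyphs (U+2610 for an empty box, U+2611 or U+2612 for a checked one), which may not hold for a given document: boxes drawn as vector shapes or stored as form fields will not appear in the text at all. The helper name `checkbox_state` is hypothetical.

```r
# Hypothetical helper: classify each text line by the checkbox glyph it contains.
# Assumption: boxes appear in the text layer as U+2610 (empty) or
# U+2611 / U+2612 (checked). Lines without any of these glyphs give NA.
checkbox_state <- function(lines) {
  checked <- grepl("\u2611", lines, fixed = TRUE) |
    grepl("\u2612", lines, fixed = TRUE)
  empty <- grepl("\u2610", lines, fixed = TRUE)
  ifelse(checked, TRUE, ifelse(empty, FALSE, NA))
}

# Only runs if pdftools is installed and test.pdf is in the working directory.
if (requireNamespace("pdftools", quietly = TRUE) && file.exists("test.pdf")) {
  pages <- pdftools::pdf_text("test.pdf")  # one character string per page
  lines <- unlist(strsplit(pages, "\n", fixed = TRUE))
  print(data.frame(line = lines, checked = checkbox_state(lines)))
}
```

If every line comes back as NA, the checkboxes are probably not stored as text glyphs, and a different approach would be needed.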
Prework
Question
I'm trying to extract data from a pdf document that contains tables with checkboxes (see my reproducible example below).
The extract_tables() function works well and manages to identify the tables in the PDF document, but I only get NA for all the checkboxes.
Is there any way of identifying which boxes are checked? Many thanks for your help! 🙏
Reproducible example
Here's my PDF: test.pdf
And my code:
What I get: