Open terrafrost opened 8 months ago
So I used qpdf's QDF mode (qpdf test.pdf --qdf test.qdf
) to further dig into this and I guess the issue is that when there are dots the dots are treated as parent objects.
%% Object stream: object 7, index 2; original object ID: 24
<<
/DA (/Helv 12 Tf 0 g)
/F 4
/FT /Tx
/MK <<
>>
/P 21 0 R
/Parent 17 0 R
/Rect [
190.784
658.903
340.784
680.903
]
/Subtype /Widget
/T (yyy)
/Type /Annot
>>
So if you look at the /T
tag in isolation you get yyy
. The xxx
is due to the /Parent 17 0 R
bit:
%% Object stream: object 17, index 0; original object ID: 10
<<
/Kids [
7 0 R
]
/T (xxx)
>>
So I guess what pdf2json needs to do is to recursively go back and find each parent until there is no parent and it needs to prepend each parent to the /T
tag with dots separating each part.
So I have a PDF with just one field on it - a field named "xxx.yyy". When I run pdf2json 3.0.5 on the PDF I'm told that the only field on that PDF is "yyy".
test.pdf demonstrates the problem.
Here's what Adobe Acrobat Pro 2020 shows:
pdftk 2.02 also finds "xxx.yyy" when I run
pdftk test.pdf dump_data_fields
:Unfortunately, pdftk doesn't return the coordinates whereas pdf2json does.
According to
qpdf test.pdf --json
the field's alternativename, fullname and mappingname are "xxx.yyy" whereas the partialname is "yyy" so maybe that's the issue?