Closed. Flint-company closed this 2 weeks ago.
Please provide a reproducing example. So far your post leads to nothing actionable.
Hard to do, since it's the resume of a real person and contains personal data. Is there a workaround that would let me provide the example?
Your PDF obviously has a problem which we should intercept and handle in a better way. So, no: we need a reproducer to confirm that we guessed the right cause. But you can use my private email for the submission so it won't be exposed to the public. Otherwise this post will never become a bug report ...
Sent by email, in reply to Flint's comment of Monday, June 10, 2024, on "page.links return all links with same xref, is it something possible ??" (pymupdf/PyMuPDF Issue #3563).
The example PDF shared with me violates the specification for links / annotations:
Instead of giving indirect references, as it should, it provides all the links directly in the /Annots array.
In other words, it should look like /Annots [4711 0 R 4712 0 R ...]. Instead we find:
/Annots [ <<
/Type /Annot
/Subtype /Link
/Rect [ 248.31678 596.1143 279.57893 605.7143 ]
/Border [ 0 0 0 ]
/A <<
/Type /Action
/S /URI
/URI (https://alexialabbe.fr/#projects)
>>
>> <<
/Type /Annot
/Subtype /Link
/Rect [ 238.40349 71.17554 260.10389 80.775539 ]
/Border [ 0 0 0 ]
/A <<
/Type /Action
/S /URI
/URI (https://blog.codein.fr/guide-rgpd-les-pratiques-essentielles-pour-assurer-la-conformite-de-votre-site-web)
>>
>>
... ]
So pymupdf does recognize the links, but cannot assign an xref to them (xref=0 consequently).
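To check whether a file is affected, one can look for link dictionaries whose xref is 0. The following is only a sketch: it assumes the links come from page.get_links(), which returns one dict per link with an "xref" key among others. The names links_without_xref and report are made up for illustration, and older PyMuPDF versions need "import fitz" instead of "import pymupdf".

```python
def links_without_xref(links):
    """Return the link dicts that could not be tied to an xref.

    `links` is expected to be the list returned by page.get_links();
    each entry is a dict with (among others) the key "xref".
    """
    return [link for link in links if link.get("xref", 0) == 0]


def report(path):
    """Hypothetical helper: print, per page, how many links show xref=0."""
    import pymupdf  # imported here so links_without_xref works without it

    doc = pymupdf.open(path)
    for page in doc:
        bad = links_without_xref(page.get_links())
        if bad:
            print(f"page {page.number}: {len(bad)} link(s) with xref=0")
```

If report() prints anything for a page, its links are probably stored inline in /Annots as shown above.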
You cannot update / delete links in PyMuPDF using the normal API (delete_link etc.) in such a situation - no way.
But you can edit the page's object definition source using the low-level API and remove everything: for this, you could delete the whole /Annots array.
Be aware that this removes everything (!!!): links, annotations and fields that may be on the page.
doc.xref_set_key(5, "Annots", "null")  # 5 = page xref; sets /Annots to null
print(doc.xref_object(5))  # show the page object source after the change
<<
/Type /Page
/Parent 1 0 R
/MediaBox [ 0 0 540 780 ]
/Contents 134 0 R
/Resources <<
/ExtGState <<
/Alpha0 10 0 R
/Alpha1 11 0 R
>>
/Font <<
/Font4 14 0 R
/Font11 21 0 R
/Font12 22 0 R
/Font5 15 0 R
>>
>>
/Annots null
/Group <<
/S /Transparency
/CS /DeviceRGB
>>
>>
All links are gone!
BTW the example page looks exactly the same, but all hot areas are gone.
Also, the file size (when saving via ez_save()) goes down to 44KB (was 1 MB before).
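Putting the steps above together, a minimal sketch could look like the following. It assumes a document object with xref_get_key / xref_set_key as in PyMuPDF, where xref_get_key returns ("null", "null") for a missing key. The names drop_annots and clean_file, and the file paths passed to them, are hypothetical. Remember that this discards all links, annotations and fields on the affected pages.

```python
def drop_annots(doc, page_xref):
    """Set the page's /Annots entry to null.

    Returns True if an entry was removed, False if there was nothing to do.
    Warning: links, annotations and form fields of that page are all lost.
    """
    kind, _value = doc.xref_get_key(page_xref, "Annots")
    if kind == "null":  # key is absent or already null
        return False
    doc.xref_set_key(page_xref, "Annots", "null")
    return True


def clean_file(src, dst):
    """Hypothetical driver: strip /Annots from pages whose links show xref=0."""
    import pymupdf  # older versions: import fitz

    doc = pymupdf.open(src)
    # Collect the page xrefs first, then modify the page objects.
    affected = [
        page.xref
        for page in doc
        if any(link["xref"] == 0 for link in page.get_links())
    ]
    for page_xref in affected:
        drop_annots(doc, page_xref)
    doc.ez_save(dst)  # save with garbage collection and compression
    return affected
```

Pages whose links have proper xrefs are left untouched, so the normal API (delete_link etc.) still applies to them.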
Thanks Jorj !!
I'm very surprised: when I analyze a PDF and try to get all the links, I get dicts with the links, but they all have the same "xref". Is there a way to delete these links even though they all have the same xref? Thanks