Open warren-bank opened 6 years ago
PDF-XChange has a free, built-in feature that removes ALL links, without purchasing the Plus version. You can find it under Home -> Links -> Remove all web-links.
@mathikas
7.0.324
.. simply because I haven't bothered to update

file1.pdf with watermarks
(renamed to in.pdf) and outputs out.pdf without watermarks
out.pdf in the (aforementioned) viewer program, and resave it as file2.pdf
file2.pdf is the final output.. and all other files (file1.pdf, in.pdf, out.pdf) can be deleted

Yes, I use this script to remove the watermarks added by the free version of PDF-XChange Editor after using its premium features. The script works pretty well, but it leaves behind invisible links to the PDF-XChange website in the top left and right corners.
I saw your solutions above, but unfortunately, none of them worked for me. However, I discovered that after running the script to remove the visible watermark, I can use the built-in "Remove all web-links" feature (which doesn't require Plus/premium, so it won't add another watermark) to remove all the remaining invisible links. This results in a very clean PDF, tested on the newest version.
The only downside is that it also removes any other links that may be present in the document. To avoid that, I manually remove just the PDF-XChange website links using the "add/edit links" feature (also free), which can be time-consuming.
ohhh, ok.. now I understand your original comment. I wasn't sure how familiar you were with this repo.. what the script does.. etc. You're right then.. for a pdf that doesn't have any other web links, this feature would be a quick and easy solution. Thanks for sharing.
I'm amazed that this script still works on the watermarks added by the current release of the editor. I haven't updated in years, and would've expected that the watermarks had changed in some significant way after all that time that would prevent the script from being able to detect and remove them.
admittedly, using a 2nd tool to cleanup after the 1st tool is janky..
but I have to admit that PyMuPDF is powerful!
personally, I'm not a Python guy..
I have an old version handy, but it's too old to use to test PyMuPDF..
and I don't feel like updating.
that said, PyMuPDF has an online web console that works great..
and can be used without the need to install anything.
here is a script that I wrote:
"""
https://github.com/pymupdf/PyMuPDF
https://pymupdf.io/
https://pymupdf.readthedocs.io/en/latest/document.html
https://pymupdf.readthedocs.io/en/latest/document.html#Document.scrub
https://pymupdf.readthedocs.io/en/latest/document.html#Document.pages
https://pymupdf.readthedocs.io/en/latest/document.html#Document.save
https://pymupdf.readthedocs.io/en/latest/document.html#Document.tobytes
https://pymupdf.readthedocs.io/en/latest/page.html
https://pymupdf.readthedocs.io/en/latest/page.html#Page.get_links
https://pymupdf.readthedocs.io/en/latest/page.html#Page.delete_link
https://pymupdf.readthedocs.io/en/latest/link.html
https://pymupdf.readthedocs.io/en/latest/link.html#Link.uri
https://pyodide.org/en/stable/usage/quickstart.html#accessing-javascript-scope-from-python
"""
# NOTE: `doc` must already be an open PyMuPDF Document; this script does not open one itself.
watermark = "https://www.tracker-software.com/product/pdf-xchange-editor"
is_online_web_console = True
debug_log = True
do_scrub = False

if debug_log:
    print('All links before removal:')
    for page in doc.pages():
        print(f'Page: {page.number}')
        for link in page.get_links():
            if 'uri' in link:
                print(f'Link: {link.get("uri")}')

# delete only the links that point at the watermark URL; all other links are kept
for page in doc.pages():
    for link in page.get_links():
        if 'uri' in link and link.get('uri') == watermark:
            page.delete_link(link)

if debug_log:
    print('All links after removal:')
    for page in doc.pages():
        print(f'Page: {page.number}')
        for link in page.get_links():
            if 'uri' in link:
                print(f'Link: {link.get("uri")}')

if do_scrub:
    doc.scrub(attached_files=False, clean_pages=False, embedded_files=False, hidden_text=True, javascript=True, metadata=True, redactions=False, redact_images=0, remove_links=False, reset_fields=False, reset_responses=False, thumbnails=True, xml_metadata=True)

if not is_online_web_console:
    doc.save('out.pdf', garbage=3, deflate=True)
else:
    # in the browser there is no file system to save to,
    # so offer the modified PDF as a base64 data-URI download link
    import base64
    import js
    pdf_bytes = doc.tobytes(garbage=3, deflate=True)
    data_uri = 'data:application/octet-stream;base64,' + base64.b64encode(pdf_bytes).decode('ascii')
    div = js.document.createElement('div')
    div.innerHTML = '<a href="' + data_uri + '" download="out.pdf">Download modified PDF file</a>'
    js.document.body.prepend(div)
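
For reuse outside the web console, the removal loop could be wrapped in a small function. This is only a sketch: `remove_links_matching` is a hypothetical name, and matching by URL prefix instead of exact equality is an assumption, in case the watermark URL ever carries a trailing slash or query string. It relies only on `doc.pages()`, `page.get_links()` and `page.delete_link()`, the same calls the script already uses.

```python
def remove_links_matching(doc, uri_prefix):
    """Delete every link whose URI starts with uri_prefix.

    `doc` is expected to behave like a PyMuPDF Document
    (pages() -> pages with get_links() / delete_link()).
    Returns the number of links deleted.
    """
    removed = 0
    for page in doc.pages():
        # get_links() hands back a list of dicts, so deleting
        # from the page while looping over that list is safe
        for link in page.get_links():
            uri = link.get('uri')
            if uri is not None and uri.startswith(uri_prefix):
                page.delete_link(link)
                removed += 1
    return removed
```

Calling `remove_links_matching(doc, "https://www.tracker-software.com/")` would leave every link to other sites untouched, which is the part the built-in "Remove all web-links" feature cannot do.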
I went down a bit of a rabbit hole..
I was curious if PyMuPDF could also be used to remove the watermark images.
The short answer is: no.
The longer answer is...
for page in doc.pages():
    print(f'Page: {page.number}')
    for drawing in page.get_drawings():
        print(f'Drawing: {drawing}')
"""
Observations:
* 8 drawings per page
  - 2 watermarks per page
  - 4 drawings per watermark
* every page has the same 8 drawings,
  and all share the same sets of Rect(x, y, x, y) coordinates
* each of the 8 Rect() coordinates seems to always be paired to the same unique sequence number
"""
# expected Rect coordinates of each watermark drawing, keyed by its sequence number
watermark_coords = {
    "0": [-0.0010000000474974513, -0.0009918212890625, 73.87300109863281, 73.87300872802734],
    "1": [6.480000019073486, 6.480010986328125, 67.39199829101562, 67.39200592041016],
    "2": [13.607999801635742, 13.608009338378906, 60.263999938964844, 60.264007568359375],
    "3": [17.82699966430664, 17.827011108398438, 56.04499816894531, 56.04500961303711],
    "6": [521.402587890625, -0.0009918212890625, 595.2765502929688, 73.87300872802734],
    "7": [527.883544921875, 6.480010986328125, 588.7955322265625, 67.39200592041016],
    "8": [535.0115356445312, 13.608009338378906, 581.6675415039062, 60.264007568359375],
    "9": [539.2305908203125, 17.827011108398438, 577.4485473632812, 56.04500961303711]
}

found = 0
for page in doc.pages():
    print(f'Page: {page.number}')
    for drawing in page.get_drawings():
        if 'seqno' in drawing and 'rect' in drawing and str(drawing.get('seqno')) in watermark_coords:
            rect = drawing.get('rect')
            mark = watermark_coords.get(str(drawing.get('seqno')))
            if [rect.x0, rect.y0, rect.x1, rect.y1] == mark:
                print(f'found watermark drawing: {drawing}')
                found += 1

print('')
print(f'found {found} watermark drawings')
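
The comparison above relies on exact float equality, which worked for this file but could miss a match if another file's coordinates differ in the last few digits. A minimal sketch of a tolerance-based check follows; `rect_matches` and the default tolerance of 0.001 pt are hypothetical choices, not something taken from PyMuPDF.

```python
import math

def rect_matches(rect, expected, tol=1e-3):
    """Return True when all four coordinates agree within `tol` points.

    `rect` and `expected` are both plain sequences of four floats,
    e.g. [rect.x0, rect.y0, rect.x1, rect.y1] and a watermark_coords entry.
    """
    return len(rect) == len(expected) == 4 and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(rect, expected)
    )
```

In the loop above, the `==` test would then become `rect_matches([rect.x0, rect.y0, rect.x1, rect.y1], mark)`.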
"""
https://github.com/pymupdf/PyMuPDF/issues/847
issue: there is no way to remove the found watermark drawings
"""
notes:
FlateDecode objects

workaround 1:
--unzip
./filter.sh --unzip
call filter.bat --unzip
out.pdf in a PDF Viewer

workaround 2:
%PDF-1.4
in.pdf
filter.pl
out.pdf in a PDF Viewer
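
The notes above mention FlateDecode objects and an --unzip option. Presumably the concern is that a text-based filter cannot see content inside compressed streams. The sketch below only illustrates that general point with Python's `zlib` (FlateDecode streams use zlib/deflate compression); the sample stream and the `MARKER` value are made up for the illustration and are not taken from a real PDF-XChange file.

```python
import zlib

# hypothetical text that a filter would search for
MARKER = b"tracker-software"

# hypothetical, repetitive content stream containing the marker
raw_stream = b"BT (www.tracker-software.com) Tj ET\n" * 20

compressed = zlib.compress(raw_stream)   # roughly what a /FlateDecode stream stores
restored = zlib.decompress(compressed)   # what an "unzipped" PDF exposes again

def contains_marker(data: bytes) -> bool:
    """Plain byte search, like a text-based filter would do."""
    return MARKER in data

print('marker visible in compressed stream:  ', contains_marker(compressed))
print('marker visible in decompressed stream:', contains_marker(restored))
```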