grayleonard / booxtream-epub-drm-remover

Removes all "Social DRM" from booXtream ePub files

Script fails to remove covert image watermarking in newer epubs #6

Open lostfictions opened 6 years ago

lostfictions commented 6 years ago

While looking into #5, I tried downloading a free ebook from Verso under two different accounts and running the script on them, and then diffing the results using the diff.sh script included in this repo to verify that the output was identical.

It turns out it isn't:

[screenshot: diff output]

This is from a run of the most recent version of the script, where #5 is fixed. Ignoring the spurious error from calibre inserting a bookmark file into one of the epubs, the PNGs in the Images/ directory are not the same.

A quick binary diff of the images suggests significant chunks of them are changed, so it appears to be some kind of covert watermarking not handled by the script.

grayleonard commented 6 years ago

Ah, thanks for investigating further. The script already wipes exif/metadata from images, so it must be something new/different.

Any chance you could send me the two non-cleaned epubs so I could test as well? My email is in my profile.

Artoria2e5 commented 6 years ago

The PNGs differ due to timestamps built into PNGs. The ImageMagick thingy saves the ctime and mtime to the PNGs:

$ diff -Naur <(identify -verbose fooub2/OEBPS/images/3e8f510f36f2f8f255dd.png)  <(identify -verbose fooub3/OEBPS/images/3e8f510f36f2f8f255dd.png)
--- /dev/fd/63  2018-03-14 21:30:55.775635900 -0400
+++ /dev/fd/62  2018-03-14 21:30:55.780978800 -0400
@@ -1,4 +1,4 @@
-Image: fooub2/OEBPS/images/3e8f510f36f2f8f255dd.png
+Image: fooub3/OEBPS/images/3e8f510f36f2f8f255dd.png
   Format: PNG (Portable Network Graphics)
   Mime type: image/png
   Class: DirectClass
@@ -456,8 +456,8 @@
   Compression: Zip
   Orientation: Undefined
   Properties:
-    date:create: 2018-03-14T21:07:13-04:00
-    date:modify: 2018-03-14T21:07:04-04:00
+    date:create: 2018-03-14T21:06:19-04:00
+    date:modify: 2018-03-14T21:06:12-04:00
     png:bKGD: chunk was found (see Background color, above)
     png:cHRM: chunk was found (see Chromaticity, above)
     png:IHDR.bit-depth-orig: 8
@@ -467,10 +467,10 @@
     png:IHDR.interlace_method: 0 (Not interlaced)
     png:IHDR.width,height: 82, 52
     png:sRGB: intent=0 (Perceptual Intent)
-    png:tIME: 2018-03-15T01:07:04Z
+    png:tIME: 2018-03-15T01:06:13Z
     signature: 1b5af409341c6453c741c589eb7dc9ef4142db5ca9527db68ebabcb83e661981
   Artifacts:
-    filename: fooub2/OEBPS/images/3e8f510f36f2f8f255dd.png
+    filename: fooub3/OEBPS/images/3e8f510f36f2f8f255dd.png
     verbose: true
   Tainted: False
   Filesize: 3.6KB

Like the zip timestamps after processing, those times correlate to file creation.
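
One way to check whether two PNGs differ only in such timestamps is to drop the time and text chunks before comparing. Below is a minimal Python sketch, assuming the timestamps live in the tIME and text (tEXt/zTXt/iTXt) chunks, as the identify output above suggests. The function names are hypothetical, and this is not how the repo's script works.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumption: timestamps and similar metadata sit in these ancillary chunks.
VOLATILE_CHUNKS = {b"tIME", b"tEXt", b"zTXt", b"iTXt"}


def iter_chunks(data):
    """Yield (chunk_type, chunk_body) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC


def strip_volatile_chunks(data):
    """Return the PNG bytes with time and text chunks removed."""
    out = [PNG_SIGNATURE]
    for ctype, body in iter_chunks(data):
        if ctype in VOLATILE_CHUNKS:
            continue
        crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
        out.append(struct.pack(">I", len(body)) + ctype + body
                   + struct.pack(">I", crc))
    return b"".join(out)
```

If `strip_volatile_chunks(a) == strip_volatile_chunks(b)` for the bytes of two files, the difference was confined to those chunks. If not, something else (possibly the pixel data) differs.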

ghost commented 5 years ago

@Artoria2e5

This could have changed recently, or it could be an optional feature of BooXtream, but I bought two copies of the same book and there are definitely actual differences in the content of the images, not just metadata:

[~/book]$ diff -Naur <(identify -verbose cleaned/a/OEBPS/images/5316eab139b427a6a5ff.png)  <(identify -verbose cleaned/b/OEBPS/images/5316eab139b427a6a5ff.png) | grep srgb | wc -l
70
[~/book]$ diff a.ppm b.ppm | wc -l
32
[~/book]$ diff a.ppm b.ppm | head -n 6
xxx0,xxx1cxxx0,xxx1
< 2x
< 2x
---
> 2y
> 2y

"a.ppm" and "b.ppm" are PPM files generated by GIMP from cleaned/a/OEBPS/images/5316eab139b427a6a5ff.png and cleaned/b/OEBPS/images/5316eab139b427a6a5ff.png. Some diff output has been omitted for paranoia reasons.

Locating the affected pixels in the original, unedited PNG files and using the dropper tool in GIMP confirms that the images differ slightly.
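
The same check can be done without GIMP by decoding both PNGs and listing the coordinates where the pixels differ. This is a rough Python sketch using only the standard library. It assumes 8-bit, non-interlaced PNGs, and for palette images it compares palette indices only, not the palette itself. The function names `decode_png` and `differing_pixels` are made up for illustration.

```python
import struct
import zlib

CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}  # PNG color type -> samples per pixel


def decode_png(data):
    """Return (width, height, bytes_per_pixel, rows) of unfiltered scanlines."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    pos, idat, header = 8, [], None
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            header = struct.unpack(">IIBBBBB", body)
        elif ctype == b"IDAT":
            idat.append(body)
        pos += 12 + length
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color, _, _, interlace = header
    if depth != 8 or interlace != 0:
        raise ValueError("sketch only handles 8-bit, non-interlaced PNGs")
    bpp = CHANNELS[color]
    stride = width * bpp
    raw = zlib.decompress(b"".join(idat))
    rows, prev = [], bytearray(stride)
    for y in range(height):
        start = y * (stride + 1)  # each scanline is prefixed by a filter byte
        ftype = raw[start]
        cur = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            a = cur[i - bpp] if i >= bpp else 0   # left
            b = prev[i]                           # above
            c = prev[i - bpp] if i >= bpp else 0  # upper left
            if ftype == 0:
                pred = 0
            elif ftype == 1:
                pred = a
            elif ftype == 2:
                pred = b
            elif ftype == 3:
                pred = (a + b) // 2
            elif ftype == 4:  # Paeth predictor
                p = a + b - c
                pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
                pred = a if pa <= pb and pa <= pc else b if pb <= pc else c
            else:
                raise ValueError("unknown filter type %d" % ftype)
            cur[i] = (cur[i] + pred) & 0xFF
        rows.append(cur)
        prev = cur
    return width, height, bpp, rows


def differing_pixels(png_a, png_b):
    """Return a list of (x, y) positions where the two images differ."""
    wa, ha, ba, rows_a = decode_png(png_a)
    wb, hb, bb, rows_b = decode_png(png_b)
    if (wa, ha, ba) != (wb, hb, bb):
        raise ValueError("images differ in size or pixel format")
    return [(x, y) for y in range(ha) for x in range(wa)
            if rows_a[y][x * ba:(x + 1) * ba] != rows_b[y][x * ba:(x + 1) * ba]]
```

Because the scanline filters are undone before comparing, two files that encode identical pixels differently come out as equal. A non-empty result means the decoded image content itself differs.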