mounaiban / captdriver

Driver for Canon CAPT printers
GNU General Public License v3.0

Printer MF3228 encoded data [CCITT G4/CARPS] #23

Closed redjoe closed 2 years ago

redjoe commented 2 years ago

Please help me understand the encoded data sent to the MF3228 printer.

White pages, encoded:

ff ff ff … 00 08 80 (A4 300dpi, length 426 bytes)
ff ff ff … 00 08 80 (A4 600dpi, length 846 bytes)
ff ff ff … 00 08 80 (A5 300dpi, length 298 bytes)

Repeated \xff bytes are omitted.

Black fill pages:

64 05 74 a0 f8 ff … ff 01 10 00 01 (A4 300dpi, length 854 bytes)
64 05 7c 40 01 93 ff … ff 1f 00 01 10 (A4 600dpi, length 1701 bytes)
64 05 da 60 F5 ff … ff 01 10 00 01 (A5 300dpi, length 598 bytes)

The next test is a vertical black line at the left edge of the page, with the origin at the top left corner. All samples are A4 at 300 dpi, 2360px wide and 3384px high. The printable area starts 5.165mm from the left side (the origin). The pixel width is approximately 0.085mm. I printed the samples from Inkscape.
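The pixel pitch and the widths used for the line samples can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `line_width_mm` is hypothetical, and the formula assumes that a width of 5.165mm yields exactly one printed pixel and that each further 0.085mm adds one more.

```python
DPI = 300
MM_PER_INCH = 25.4

# Exact pixel pitch at 300 dpi; the post rounds this to 0.085 mm.
pixel_mm = MM_PER_INCH / DPI


def line_width_mm(pixels, offset_mm=5.165, pitch_mm=0.085):
    """Drawn width (measured from the page edge) that prints `pixels` columns.

    Assumption: the first printed pixel appears at offset_mm and every
    additional pitch_mm adds one more pixel.
    """
    return round(offset_mm + (pixels - 1) * pitch_mm, 3)


for n in (1, 2, 8, 12):
    print(n, "px ->", line_width_mm(n), "mm")
```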

Example data for a width of 5.165mm (1px): 64 d5 ff … ff 0f 80 00 08, data length 1274 bytes.

| hex | binary | pixels | width, mm | end bytes |
|---|---|---|---|---|
| 64 d5 | 01100100 11010101 | 1px | 5.165 | 0f 80 00 08 |
| 64 fd | 01100100 11111101 | 2px | 5.25 | 07 40 00 04 |
| 64 ed | 01100100 11101101 | 3px | 5.335 | 07 40 00 04 |
| 64 f5 | 01100100 11110101 | 4px | 5.42 | 0f 80 00 08 |
| 64 e5 | 01100100 11100101 | 5px | 5.505 | 1f 00 01 10 |
| 64 a5 | 01100100 10100101 | 6px | 5.59 | 1f 00 01 10 |
| 64 c5 | 01100100 11000101 | 7px | 5.675 | 3f 00 02 20 |
| 64 45 | 01100100 01000101 | 8px | 5.76 | 7f 00 04 40 |
| 64 45 ef | 01100100 01000101 11101111 | 9px | 5.843 | 7f 00 04 40 |
| 64 85 fc | 01100100 10000101 11111100 | 10px | 5.93 | 00 08 80 |
| 64 85 fe | 01100100 10000101 11111110 | 11px | 6.015 | 00 08 80 |
| 64 85 ff | 01100100 10000101 11111111 | 12px | 6.1 | 00 08 80 |
| 64 05 66 b0 fe | 01100100 00000101 01100110 10110000 11111110 | - | 35 | 01 10 00 01 |
| 64 05 16 20 f9 | 01100100 00000101 00010110 00100000 11111001 | - | 42 | 01 10 00 01 |
| 64 05 36 60 f2 | 01100100 00000101 00110110 01100000 11110010 | - | 52.2 | 02 20 00 02 |
| 64 05 6e 30 f3 | 01100100 00000101 01101110 00110000 11110011 | - | 105 | 03 20 00 02 |
| 64 05 04 fa | 01100100 00000101 00000100 11111010 | - | 157.5 | 03 20 00 02 |

[image: left_vertical_black_line]

mounaiban commented 2 years ago

MF3228 reportedly uses CARPS, not CAPT.

The carps-cups driver would likely be the place to start. The carps.txt file documents the compression format on known devices; hopefully the MF3228 won't be too different.

UPDATE: I just realised there is an existing issue on the carps-cups repo that concerns MF3228 support. I'll link it here so that others can find the really useful info you posted: https://github.com/ondrej-zary/carps-cups/issues/15

I'm afraid I won't be able to help you much at this point, as I am not yet familiar with some of the techniques used in the compression. This issue has been closed as support for this device is beyond the scope of captdriver.

mounaiban commented 1 year ago

Maybe I understand what's going on with the white pages. I'm going to assume that the uncompressed image is encoded like Netpbm P4, based on the assumption that the MF3228 only does mono printing: one bit per pixel, eight pixels per byte.

The white pages appear to be RLE-compressed:

A4 300 dpi: 2360 * 3384px (120px W, 123px H crop)

The uncompressed image is encoded in 295B * 3384 == 998280B.

Assuming that 00 is some indicator for RLE mode and 08 80 (2176) is a repeat count, the compressed version is encoded in 423B * 2176 == 920448B, which is somewhat close. There might be something else going on... The 3384 line count might be the result of rounding to the next lower multiple of 8 or 4.
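The sizes quoted for the A4 300 dpi white page can be checked with a short Python sketch. It only reproduces the arithmetic under the stated 1-bit-per-pixel assumption; it does not confirm the RLE hypothesis, and the helper name `raw_size_bytes` is made up for illustration.

```python
import math


def raw_size_bytes(width_px, height_px):
    """Return (bytes per row, total bytes) for a 1 bit/pixel bitmap.

    Assumption: rows are padded to whole bytes, as in Netpbm P4.
    """
    bytes_per_row = math.ceil(width_px / 8)
    return bytes_per_row, bytes_per_row * height_px


print(raw_size_bytes(2360, 3384))                    # (295, 998280)

# 08 80 read as a big-endian 16-bit number, as hypothesised in the comment.
repeat_count = int.from_bytes(bytes.fromhex("0880"), "big")
print(repeat_count)                                  # 2176
print(423 * repeat_count)                            # 920448
```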

A4 600 dpi (120px crop in both sides)

The data size is twice as large as the compressed A4 300dpi white page :grin:

A5 300 dpi

A5 is 5.8x8.3 in == 1470x2490px raw
1350 * 2370 with 120px crop
1344 * 2370 with rounding to byte size == 168B * 2370 == 398160B
295B * 2370 == 641920B, close to 320960B * 2 (did you mean A5 600 dpi?)

Black pages and Black Stripe

Black pages seem to have very different starting bytes for different image sizes. Both black and white pixels are referenced with \xff. Could there be some kind of dictionary in use? Could it be LZ77 (which can look like RLE in some cases)?

The black stripe pages seem to suggest that some kind of dictionary encoding is in use, which LZ family encoders are based on.

Earlier this year, I wrote a Python script sample_blots.py that generates a bunch of patterns for studying RLE compression. I hope it helps here too...

Just be careful to avoid accidentally overwriting files with the script, as its overwrite detection is a little lacking :warning:

redjoe commented 1 year ago

The data is compressed with CCITT Group 4.

mounaiban commented 1 year ago

I repeated your black page and white page experiments with a hand-coded SVG and an rsvg-convert-GhostScript pipeline, and got similar but different results:

Black Page SVG :black_circle:

<?xml version='1.0' encoding='UTF-8' standalone='no' ?>
<svg width='210mm' height='297mm' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'>
<desc>Just a blank, black A4 page</desc>
<rect x='0' y='0' width='210mm' height='297mm' stroke='black' fill='black' />
</svg>

White Page SVG :white_circle:

<?xml version='1.0' encoding='UTF-8' standalone='no' ?>
<svg width='210mm' height='297mm' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'>
<desc>Just a blank, white A4 page</desc>
<rect x='0' y='0' width='210mm' height='297mm' stroke='white' fill='white' />
</svg>

My pipeline:

rsvg-convert -f pdf -o $PDF_FILE $SVG_FILE
gs -dSAFER -dNOPAUSE -dNOPROMPT -r600 -SDEVICE=faxg4 -o $IMAGE_FILE $PDF_FILE

Try rsvg-convert -x 0.801 -y 0.801 if the resulting PDF has a larger page size than expected (version 2.40.2 needs this fix)

Results:

A4 600dpi Black Page: 26 a0 3e 03 81 af ff ... ff fe 0a (1762 bytes)
A4 600dpi White Page: ff .. ff 80 0a (879 bytes)

redjoe commented 1 year ago

I repeated your steps and added the page size option -g4720x6768:

gs -dSAFER -dNOPAUSE -dNOPROMPT -r600 -SDEVICE=faxg4 -g4720x6768 -o $IMAGE_FILE $PDF_FILE

For the 600dpi black page I got 26 a0 3e 02 80 c9 ff ... ff f8. Compare this with the output of the Canon driver, 64 05 7c 40 01 93 ff ...: the difference is the bit order within each byte.

26 a0 3e 02 80 c9 -> 00100110 10100000 00111110 00000010 10000000 11001001
64 05 7c 40 01 93 -> 01100100 00000101 01111100 01000000 00000001 10010011
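The relationship between the two byte sequences can be checked by mirroring the bits of every byte. A minimal Python sketch (the function name `reverse_bits` is just an illustration, not part of any driver):

```python
def reverse_bits(data: bytes) -> bytes:
    """Mirror the bit order inside every byte (MSB-first <-> LSB-first)."""
    return bytes(int(f"{b:08b}"[::-1], 2) for b in data)


ghostscript = bytes.fromhex("26a03e0280c9")  # first bytes from GhostScript faxg4
canon = bytes.fromhex("64057c400193")        # first bytes from the Canon driver

print(reverse_bits(ghostscript).hex(" "))    # 64 05 7c 40 01 93
```

Applying the function twice returns the original bytes, so the same helper converts in both directions.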

Let's decompose the GhostScript result 26 a0 3e 02 80 c9, reading the bits in MSB-to-LSB order:

|Horizontal Mode Coding
|--
|  |a0a1, distance = 0 (White codes)
|  |--------
|  |        |a1a2, coding length 2560 (Black codes)
|  |        |------------
|  |        |            |a1a2, coding length 2112 (Black codes)
|  |        |            |-------------
|  |        |            |             |a1a2, coding length 48 (Black codes)
|  |        |            |             |------------
00100110 10100000 00111110 00000010 10000000 11001001

2560 + 2112 + 48 = 4720px. The code words and their run lengths can be looked up in https://www.itu.int/rec/T-REC-T.6-198811-I/en or in libtiff/t4.h.

For your example, the A4 600dpi Black Page starting with 26 a0 3e 03 81 af, I got a width of 2560 + 2368 + 39 = 4967px:

|Horizontal Mode Coding
|--
|  |a0a1, distance = 0 (White codes)
|  |--------
|  |        |a1a2, coding length 2560 (Black codes)
|  |        |------------
|  |        |            |a1a2, coding length 2368 (Black codes)
|  |        |            |-------------
|  |        |            |             |a1a2, coding length 39 (Black codes)
|  |        |            |             |------------
00100110 10100000 00111110 00000011 10000001 10101111
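Both decompositions can be reproduced with a small Python sketch that decodes only the first horizontal-mode pair of a Group 4 stream. It is not a full decoder: the tables below contain just the handful of code words that occur in these two samples (taken from the standard tables, which libtiff/t4.h also lists), and the function names are hypothetical.

```python
# Only the code words needed for the two samples; a real decoder needs
# the complete white/black terminating and make-up tables.
HORIZONTAL = "001"
WHITE_TERMINATING = {"00110101": 0}
BLACK_MAKEUP = {
    "000000010100": 2112,
    "000000011100": 2368,
    "000000011111": 2560,
}
BLACK_TERMINATING = {"000011010111": 39, "000001100100": 48}


def to_bits(data: bytes) -> str:
    """Bytes as a bit string, most significant bit first."""
    return "".join(f"{b:08b}" for b in data)


def read_code(bits, pos, *tables):
    """Match one code word at pos; return (table index, run length, new pos)."""
    for length in range(2, 14):
        word = bits[pos:pos + length]
        for index, table in enumerate(tables):
            if word in table:
                return index, table[word], pos + length
    raise ValueError(f"no known code word at bit {pos}")


def first_horizontal_pair(data: bytes):
    """Return (white run, black run) of the first horizontal-mode pair."""
    bits = to_bits(data)
    if not bits.startswith(HORIZONTAL):
        raise ValueError("stream does not start in horizontal mode")
    pos = len(HORIZONTAL)
    _, white, pos = read_code(bits, pos, WHITE_TERMINATING)
    black = 0
    while True:
        index, run, pos = read_code(bits, pos, BLACK_MAKEUP, BLACK_TERMINATING)
        black += run
        if index == 1:  # a terminating code ends the run
            return white, black


print(first_horizontal_pair(bytes.fromhex("26a03e0280c9")))  # (0, 4720)
print(first_horizontal_pair(bytes.fromhex("26a03e0381af")))  # (0, 4967)
```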

The end of the file from GhostScript is different too: I didn't see an end-of-facsimile block (EOFB), which has the format 0000 0000 0001 0000 0000 0001. Maybe that is because the GhostScript documentation describes faxg4 as:

Group 4 fax, with EOLs but no header or EOD.

A4 600dpi Black Page: 26 a0 3e 03 81 af ff ... ff f0 (GhostScript, width 4967px)
A4 600dpi Black Page: 26 a0 3e 02 80 c9 ff ... ff f8 (GhostScript, width 4720px)
A4 600dpi Black Page: 64 05 7c 40 01 93 ff … ff 1f 00 01 10 (Canon driver, LSB2MSB bit order)
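One way to test whether the Canon driver output ends with an end-of-facsimile block is to read its trailing bytes LSB first and search for the 24-bit pattern quoted above. The following Python sketch does that for some of the trailing bytes listed earlier in this thread; `eofb_offset` is a hypothetical helper, and the interpretation of the bits before the match (the tail of the last coded line) is an assumption.

```python
# Two consecutive EOL code words, as quoted in the comment above.
EOFB = "000000000001" * 2


def eofb_offset(tail: bytes) -> int:
    """Bit offset of the pattern in `tail` read LSB first, or -1 if absent."""
    bits = "".join(f"{b:08b}"[::-1] for b in tail)
    return bits.find(EOFB)


# Trailing bytes from the white page, line samples and black pages.
for sample in ("00 08 80", "0f 80 00 08", "07 40 00 04",
               "1f 00 01 10", "01 10 00 01"):
    print(sample, "->", eofb_offset(bytes.fromhex(sample)))
```

A non-negative offset means the pattern is present once the bytes are read LSB first, preceded by that many bits of other data.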

mounaiban commented 1 year ago

Just in case it matters, I was using GhostScript 9.26 from late 2018, but I doubt that makes much of a difference, unlike JPEG or other lossy compression codecs.

Looks like using the GS encoder as-is won't work, but it looks to me that the changes required won't be too difficult to implement. Or maybe I might have missed some option that enables LSB-first mode? (it would make things so easy if there was such a thing!)

redjoe commented 1 year ago

I found the parameter -dFillOrder=2, which stores the pixels starting from the lower-order bits of each byte.
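Putting the options from this thread together, the GhostScript call could be assembled like this in Python. This is an untested sketch: `build_gs_command` and the file names are hypothetical, and whether the resulting stream matches the Canon driver output byte for byte has not been verified here.

```python
import shlex


def build_gs_command(pdf_file, image_file, dpi=600,
                     size_px=(4720, 6768), fill_order=2):
    """Assemble the gs argument list used in this thread, plus FillOrder."""
    width, height = size_px
    return [
        "gs", "-dSAFER", "-dNOPAUSE", "-dNOPROMPT",
        f"-r{dpi}",                   # resolution in dpi
        "-SDEVICE=faxg4",             # CCITT Group 4 output device
        f"-g{width}x{height}",        # page size in pixels
        f"-dFillOrder={fill_order}",  # 2: pixels start at the low-order bit
        "-o", image_file,
        pdf_file,
    ]


command = build_gs_command("page.pdf", "page.g4")
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where GhostScript is installed.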