impresso / federal-gazette


converting tif into jp2 #5

Closed aflueckiger closed 4 years ago

aflueckiger commented 5 years ago

@simon-clematide As suggested here, I tried to convert the extracted TIF files of Feuille Federale into 8-bit greyscale TIFs with:

convert data_tif/FedGazDe/1998/02/03/a/FedGazDe-1998-02-03-a-0001.tif -depth 8 -colorspace Gray tiff_test/conv_FedGazDe-1998-02-03-a-0001.tif

Unfortunately, the conversion doesn't produce the expected output. The metadata stays the same in the relevant fields (bit depth, compression scheme), as the tiffinfo output for the original and the converted file shows:

tiffinfo data_tif/FedGazDe/1998/02/03/a/FedGazDe-1998-02-03-a-0001.tif
TIFF Directory at offset 0xae4 (2788)
  Image Width: 1130 Image Length: 1646
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  Samples/Pixel: 1
  Rows/Strip: 1646
  Planar Configuration: single image plane
tiffinfo tiff_test/conv_FedGazDe-1998-02-03-a-0001.tif

TIFF Directory at offset 0xae4 (2788)
  Image Width: 1130 Image Length: 1646
  Resolution: 200, 200 pixels/inch
  Bits/Sample: 1
  Compression Scheme: CCITT Group 4
  Photometric Interpretation: min-is-white
  FillOrder: msb-to-lsb
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 1646
  Planar Configuration: single image plane
  Page Number: 0-1

Using opj_compress directly on the original TIFs is not an option since they are B/W (1 bit per sample) instead of RGB or greyscale. As a quick sample suggests, the specs of the TIFs are fairly homogeneous prior to the change in 1999 (afterwards non-OCR). Only the resolution varies between 200 and 300 dpi.
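To repeat the quick sample on more files, a small helper can pull a single field out of the tiffinfo output. This is only a sketch: `tiff_field` is a hypothetical name, and it assumes the field sits alone on its line (so it works for `Bits/Sample` and `Compression Scheme`, but not for the combined `Image Width: ... Image Length: ...` line).

```shell
# Print the value of one field from tiffinfo output read on stdin.
# Hypothetical helper; only meant for fields that occupy a whole line.
tiff_field() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop the leading indentation
      prefix = key ": "
      if (index(line, prefix) == 1) {   # line starts with "<key>: "
        print substr(line, length(prefix) + 1)
        exit
      }
    }'
}

# Example use (needs tiffinfo from libtiff and the data on disk):
# for f in data_tif/FedGazDe/1998/02/03/a/*.tif; do
#   printf '%s\t%s\t%s\n' "$f" \
#     "$(tiffinfo "$f" | tiff_field "Bits/Sample")" \
#     "$(tiffinfo "$f" | tiff_field "Compression Scheme")"
# done
```

Piping the result through `sort | uniq -c` on the last two columns would show how homogeneous the specs really are.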

Does anyone have a suggestion?

simon-clematide commented 5 years ago

@aflueckiger This command worked. Can you check whether opj_compress can deal with it?

$ convert -compress lzw FedGazDe-1849-02-24-a-0001.tif -depth 8 -colorspace Gray /tmp/FedGazDe-1849-02-24-a-0001.tif
$ tiffinfo /tmp/FedGazDe-1849-02-24-a-0001.tif
TIFF Directory at offset 0x16722 (91938)
  Image Width: 1415 Image Length: 2374
  Resolution: 300, 300 pixels/inch
  Bits/Sample: 8
  Compression Scheme: LZW
  Photometric Interpretation: min-is-black
  FillOrder: msb-to-lsb
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 1
  Rows/Strip: 2374
  Planar Configuration: single image plane
  Page Number: 0-1
  Predictor: horizontal differencing 2 (0x2)

aflueckiger commented 5 years ago

@simon-clematide

Thanks, the command works as expected, and the output TIF can be converted further into JPEG 2000. We can do the conversion as follows:

convert -compress lzw FedGazDe-1851-01-04-a-0002.tif -depth 8 -colorspace Gray FedGazDe-1851-01-04-a-0002_conv.tif
opj_compress -r 10 -i FedGazDe-1851-01-04-a-0002_conv.tif -o FedGazDe-1851-01-04-a-0002_conv.jp2
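For more than one file, the two steps could be generated per input path. The following is a dry-run sketch: `print_jp2_commands` is a hypothetical helper that only prints the commands, using the same flags as above, and it assumes paths without whitespace (which holds for the FedGaz naming scheme shown here).

```shell
# Print the two conversion commands for one TIF; nothing is executed.
# $1: input TIF path
# $2: opj_compress ratio (defaults to 10, the value tried in this thread)
print_jp2_commands() {
  src=$1
  ratio=${2:-10}
  base=${src%.tif}   # strip the .tif extension
  printf 'convert -compress lzw %s -depth 8 -colorspace Gray %s_conv.tif\n' "$src" "$base"
  printf 'opj_compress -r %s -i %s_conv.tif -o %s_conv.jp2\n' "$ratio" "$base" "$base"
}

# Example use: write the commands to a script, inspect it, then run it.
# find data_tif -name '*.tif' ! -name '*_conv.tif' | sort |
#   while read -r f; do print_jp2_commands "$f" 10; done > convert_all.sh
```

Printing instead of executing makes it easy to check the generated paths before touching the whole collection.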

The only remaining question is which compression ratio to apply (with ratio 10, the JP2 is around 320 KB).
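As a rough aid for choosing the ratio: the `-r` value of opj_compress is a compression factor relative to the uncompressed image, so for 8-bit greyscale the target size is roughly width × height / ratio bytes. The sketch below is only an estimate under that assumption; `estimate_jp2_bytes` is a hypothetical helper, and the page dimensions are taken from the 1849 sample above, not from the 1851 page that was actually converted.

```shell
# Estimate the JP2 size in bytes for an 8-bit, single-channel image.
# $1: width in pixels, $2: height in pixels, $3: opj_compress -r ratio
# Assumes 1 byte per pixel before compression; container overhead is ignored.
estimate_jp2_bytes() {
  echo $(( $1 * $2 / $3 ))
}

# 300 dpi page from the tiffinfo output above (1415 x 2374 pixels):
estimate_jp2_bytes 1415 2374 10   # prints 335921
estimate_jp2_bytes 1415 2374 20   # prints 167960
```

About 336,000 bytes at ratio 10 is in line with the observed size of around 320 KB, so doubling the ratio should roughly halve the file size.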