tjko / jpeginfo

jpeginfo - prints information and tests integrity of JPEG/JFIF files
http://www.iki.fi/tjko/projects.html
GNU General Public License v3.0

Add unittests #29

Closed schwehr closed 10 months ago

schwehr commented 10 months ago

It would be good to have some unittests that cover the basics of jpeginfo. Doing more cleanup without tests is likely to accidentally introduce bugs.

There are lots of ways to do this. Here is one possibility, similar to what I added to MB-System here. It uses Python without any additional requirements to run the binary with a variety of inputs. I am sure the test files here don't cover the entire range of JPEGs, but they should hit most of the key points. The files come from GDAL when marked as such. Otherwise, they are just crops of a picture I took, which I then modified with Photoshop and ImageMagick. For error handling, I only covered the cases of no input file and an input file that is just a text file, so there is lots of room for tests covering bad JPEGs.

https://gist.github.com/schwehr/163029772e423848fa1c4769b0a8eecf

A couple of examples of the tests:

  def test_simple(self):
    command = [self.command, 'testdata/test1.jpg']
    output = subprocess.check_output(command).decode("utf-8").rstrip()
    self.assertRegex(
      output, r'testdata/test1\.jpg +100 x +44 24bit P ICC,Adobe +6095')

  # https://github.com/OSGeo/gdal/tree/2cfeb1a9b03716c8348d6f1f4c611f2589d53e76/autotest/gdrivers/data/jpeg/albania.jpg
  def test_24bit_exif(self):
    command = [self.command, '-j', 'testdata/albania.jpg']
    output = subprocess.check_output(command).decode("utf-8")
    result = json.loads(output)
    expect = [{
        'filename': 'testdata/albania.jpg',
        'size': 12574,
        'hash': '',
        'width': 361,
        'height': 260,
        'color_depth': '24bit',
        'type': 'JFIF,Exif',
        'mode': 'Normal',
        'info': 'Huffman',
        'comments': '',
        'status': '',
        'status_detail': ''
    }]
    self.assertEqual(expect, result)
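The JSON test above parses the output before comparing it, which makes the check independent of key order and whitespace in the raw text. Here is a minimal sketch of that idea. The `raw_output` string is a made-up sample (the exact formatting jpeginfo emits is an assumption; only the values are taken from the test above), and `subset_matches` is a hypothetical helper for a looser check that would not break if new keys were added later.

```python
import json

# Hypothetical raw output: keys deliberately shuffled and oddly spaced.
raw_output = """
[
 { "height": 260, "width": 361,
   "filename": "testdata/albania.jpg", "size": 12574,
   "color_depth": "24bit", "type": "JFIF,Exif" }
]
"""

expect = [{
    'filename': 'testdata/albania.jpg',
    'size': 12574,
    'width': 361,
    'height': 260,
    'color_depth': '24bit',
    'type': 'JFIF,Exif',
}]

result = json.loads(raw_output)

# Lists and dicts compare by value, so key order and whitespace in the
# raw text do not affect the outcome.
assert result == expect


def subset_matches(expected, actual):
    """Return True if every key in `expected` has the same value in `actual`."""
    return all(key in actual and actual[key] == value
               for key, value in expected.items())
```

Comparing the full dict, as the test above does, is stricter and catches unexpected changes in any field; the subset check trades that strictness for tolerance of added fields.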

testdata.zip

tjko commented 10 months ago

I think that is a good idea. The Python (3.x) unittest package looks nice and lightweight, and it should be trivial to get the unit tests running in GitHub Actions.

Image copyrights, on the other hand, can be a problem when including test images in the source package, given the varying laws and regulations around the globe.

I wonder if the best approach would be to generate some test "patterns" as PNG/TIFF and convert them to JPEG test images of various sizes...
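As a rough illustration of the generated-pattern idea, the sketch below builds a synthetic image using only the Python standard library. It swaps in binary PPM (P6) for PNG/TIFF, purely because PPM can be written without any third-party library; the function names, the gradient, and the checkerboard are all arbitrary choices made for this example. Converting the result to JPEG would still need an external tool such as ImageMagick or libjpeg's cjpeg, which is not shown.

```python
def make_ppm(width, height):
    """Return a binary PPM (P6) image containing a simple test pattern."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            red = (x * 255) // max(width - 1, 1)        # left-to-right ramp
            green = (y * 255) // max(height - 1, 1)     # top-to-bottom ramp
            blue = 255 if (x // 8 + y // 8) % 2 else 0  # 8x8 checkerboard
            pixels += bytes((red, green, blue))
    return header + bytes(pixels)


def write_pattern(path, width, height):
    """Write the pattern to `path` (hypothetical helper for a test setup step)."""
    with open(path, "wb") as handle:
        handle.write(make_ppm(width, height))
```

Because the pixels are computed rather than photographed, images produced this way carry no third-party copyright and contain no logos or flags.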

In that testdata.zip, are the images starting with "test" cropped from an image you took yourself? If you agree to release those under some permissive Creative Commons license or similar, then they should be "safe" to include. I'd rather not include images from any third parties (or images with any logos, flags, etc.), to keep the source package as neutral as possible.

tjko commented 10 months ago

I added initial (unit) tests. I used the unittest module in Python as you suggested, but took a slightly different approach to make each test slightly shorter (less repetition of similar statements in each test case).

    def test_md5(self):
        """test image MD5 checksum"""
        output, _ = self.run_test(['--md5', 'jpeginfo_test1.jpg'])
        self.assertIn('536c217b027d44cc2e4a0ad8e6e531fe', output)
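For readers wondering how a helper like `run_test` can shorten each test, here is one possible shape for it. This is a sketch under assumptions, not the implementation in the repository: the class names, the `check` parameter, and the returned `(output, exit code)` pair are guesses based only on the call shown above. The Python interpreter is used as a stand-in program so the example runs without a jpeginfo binary.

```python
import subprocess
import sys
import unittest


class ProgramTestCase(unittest.TestCase):
    """Base class holding the shared "run the program" logic."""

    # Stand-in for the binary under test so this sketch runs anywhere;
    # a real suite would set this to the path of the built jpeginfo.
    program = [sys.executable, '-c',
               'import sys; print(" ".join(sys.argv[1:]))']

    def run_test(self, args, check=True):
        """Run the program with `args`; return (output text, exit code)."""
        res = subprocess.run(self.program + list(args),
                             stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,  # keep errors in the output
                             check=False)
        output = res.stdout.decode('utf-8', errors='replace')
        if check:
            # Fail with the captured output as the message on a nonzero exit.
            self.assertEqual(0, res.returncode, output)
        return output, res.returncode


class ExampleTests(ProgramTestCase):
    def test_echo(self):
        """each test is reduced to two lines: run, then assert"""
        output, _ = self.run_test(['--md5', 'jpeginfo_test1.jpg'])
        self.assertIn('jpeginfo_test1.jpg', output)
```

Saved as a module, this can be run with `python -m unittest`. Passing `check=False` would let a test for bad input inspect a nonzero exit code instead of failing on it.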

This module seems to run tests really fast (doesn't really add noticeable overhead if running tests as part of the build):

...............
----------------------------------------------------------------------
Ran 15 tests in 0.038s

OK

schwehr commented 10 months ago

Cool. What you did is totally fine. Performance doesn't really matter at this scale. The pictures I took can be under whatever license the project would like. Any one of CC0, CC-BY-4.0, or GPLv3 is fine. I see you picked different pictures... it totally doesn't matter what the content of the pictures is as long as they are small.

schwehr commented 10 months ago

Closing as there is now https://github.com/tjko/jpeginfo/blob/306db650a8a2f185e114a74bcaafcc3dc4ea4348/test/test.py