Approximate image comparison for render tests

Currently, the render tests compute a SHA-1 checksum of the renderer's output and compare it to a known value to determine whether the test passes. This is fragile, because minor floating-point differences (even spec-compliant ones) can change the output slightly, which changes the checksum and breaks the test. For example, the emulator and Verilator need different checksums.

Instead, make the test harness load the reference image and perform a pixel-wise comparison, computing the mean squared error between the reference and the rendered output. If the error is below a threshold, the test should pass.

The comparison is currently handled by run_render_test in tests/test_harness.py. Since the project now uses the Python Imaging Library (PIL), it could be used to load the reference image and possibly to help with the comparison.
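
A minimal sketch of what the comparison could look like, not the actual harness code. The names `mean_squared_error`, `images_match`, and `DEFAULT_THRESHOLD` are hypothetical, the threshold value is a placeholder that would need tuning against real renderer output, and it assumes both images are available as files that PIL can open:

```python
# Placeholder value (assumption): tune against real emulator/Verilator output.
DEFAULT_THRESHOLD = 1.0


def mean_squared_error(pixels_a, pixels_b):
    """Return the mean squared error between two equal-length byte buffers.

    Each buffer holds raw channel values (0-255), e.g. from Image.tobytes().
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError('pixel buffers differ in length')
    if not pixels_a:
        raise ValueError('pixel buffers are empty')

    # Iterating over bytes yields ints, so the subtraction cannot wrap around.
    total = sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b))
    return total / len(pixels_a)


def images_match(reference_path, result_path, threshold=DEFAULT_THRESHOLD):
    """Return True if the two image files differ by less than the threshold."""
    # Imported here so mean_squared_error stays usable without PIL installed.
    from PIL import Image

    with Image.open(reference_path) as ref, Image.open(result_path) as res:
        if ref.size != res.size:
            return False

        # Convert to a common mode so both buffers have the same layout.
        ref_pixels = ref.convert('RGB').tobytes()
        res_pixels = res.convert('RGB').tobytes()

    return mean_squared_error(ref_pixels, res_pixels) < threshold
```

Identical images give an error of exactly 0, so any positive threshold still passes bit-exact output. The pure-Python loop is slow for large frames; `ImageChops.difference` together with `ImageStat.Stat` would be one way to push the arithmetic into PIL if that matters.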

jbush001 / NyuziProcessor

Approximate image comparison for render tests #182