[Discussion] Support storing reference images as hashes

JoseAlcerreca commented 1 year ago

Not really a feature request yet, but I think this would be a good place to discuss this idea by Alex Vanyo:

If you use a diff threshold of 0, instead of storing the reference images as PNG, store a hash of the file.

The way it works is:

The record task takes new screenshots and stores their hashes in their corresponding files. For example in the same folder with an md5 extension: screenshots/ForYouScreenPopulatedAndLoading_foldable.png.md5
The verification task takes new screenshots, hashes them and compares with the existing files
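
The record/verify flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: `md5Hex`, `hashFileFor`, `recordHash` and `verifyHash` are hypothetical helpers, not Roborazzi API, and the hash is taken over the raw bytes of the PNG file.

```kotlin
import java.io.File
import java.security.MessageDigest

// Hypothetical helpers for illustration only; none of these are Roborazzi API.

/** Hex-encoded MD5 of a byte array. */
fun md5Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("MD5")
        .digest(bytes)
        .joinToString("") { "%02x".format(it) }

/** Sidecar file next to the screenshot, e.g. Foo.png -> Foo.png.md5. */
fun hashFileFor(screenshot: File): File = File(screenshot.path + ".md5")

/** Record: keep only the hash of the freshly captured screenshot. */
fun recordHash(screenshot: File) {
    hashFileFor(screenshot).writeText(md5Hex(screenshot.readBytes()))
}

/** Verify: hash the new screenshot and compare it with the stored hash. */
fun verifyHash(screenshot: File): Boolean {
    val golden = hashFileFor(screenshot)
    if (!golden.exists()) return false // no reference recorded yet
    return golden.readText().trim() == md5Hex(screenshot.readBytes())
}
```

With this shape only the small `.md5` files would need to be committed, and the PNG itself could be discarded after recording.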

:+1: It eliminates the problem with large files and how to store them (git LFS, cloud buckets...)
:-1: It makes development less intuitive because screenshots no longer live alongside the code (but this is true with cloud buckets, different branches and arguably git LFS)
:-1: Reports are tricky because Roborazzi can't know what the base branch is (the commit that was used to generate the existing reference images).

This is doable on CI (see my prototype and example PR), but:

:-1: there's no easy way to run screenshot tests locally
:-1: :-1: the PR doesn't show any screenshots, so you would have to do something like https://github.com/takahirom/roborazzi-compare-on-github-comment-sample/pull/1 which complicates the workflow even more.

Some very crude ideas:

Store the commit ID that was used to generate each screenshot. Roborazzi could run a command to check out the commit to generate the reports. This is not great.
Create a GitHub action (and Bitrise, Bitbucket...?) that takes care of everything so that, at least on CI, the dev experience is decent.

takahirom commented 1 year ago

As you mentioned in this issue, in general app development, it's challenging to manage changes if you can't see what's different between images. Therefore, storing images as hashes might not be practical. However, for UI library development where not a single pixel changes, hashing could be useful. Even so, having some way to view the diff when changes do occur would be beneficial. 👀 Hashing could be offered as an option through gradle.properties and record options. I'm open to adding this feature, although I can't quite envision how it would work in practice.

alexvanyo commented 1 year ago

It might be interesting if the hash output can be done by Roborazzi in addition to storing the image itself, as opposed to it being an exclusive choice.

I could see an approach where Roborazzi records both the image and the hash (or just one or the other, depending on configuration). Then during verification for a test with a diff threshold of 0, it can use all available golden information:

if a golden image is present, then it is used directly to verify, and to build the visual report if the test fails. If a golden hash is also present, maybe sanity-check that the golden hash matches the hash of the golden image?
if the golden image isn't present, but a golden hash is, then run the test based on the resulting hash value. If the check fails, the test will fail, but there's no "before" image to nicely use in a report
if neither the golden hash nor the golden image is present, then the test fails due to missing goldens
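
The three cases above can be written down as a small decision function. This is a sketch under assumptions, not a proposal for the actual implementation: `VerifyOutcome` and `verifyWithGoldens` are hypothetical names, and with a diff threshold of 0 the golden image is assumed to be compared through a hash computed the same way as the stored one.

```kotlin
// Hypothetical types for illustration; none of these are Roborazzi API.
sealed interface VerifyOutcome {
    object Passed : VerifyOutcome
    data class Failed(val canShowBeforeImage: Boolean) : VerifyOutcome
    object MissingGoldens : VerifyOutcome
    object InconsistentGoldens : VerifyOutcome
}

fun verifyWithGoldens(
    actualHash: String,
    goldenImageHash: String?, // hash computed from the golden image, null if the image is absent
    goldenHash: String?,      // content of the stored hash file, null if the file is absent
): VerifyOutcome = when {
    goldenImageHash != null -> when {
        // Sanity check: a stored hash must describe the stored image.
        goldenHash != null && goldenHash != goldenImageHash -> VerifyOutcome.InconsistentGoldens
        actualHash == goldenImageHash -> VerifyOutcome.Passed
        // The golden image exists, so a report can show "before" and "after".
        else -> VerifyOutcome.Failed(canShowBeforeImage = true)
    }
    goldenHash != null ->
        if (actualHash == goldenHash) VerifyOutcome.Passed
        // Only the hash is known, so there is no "before" image for a report.
        else VerifyOutcome.Failed(canShowBeforeImage = false)
    else -> VerifyOutcome.MissingGoldens
}
```

Treating a mismatch between the stored hash and the stored image as its own outcome is one possible reading of the "sanity check" idea; it could just as well be a warning.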

What happens to the generated golden images in the version control system could be left up to the project: maybe they choose to check in the raw images directly anyway, maybe they use Git LFS, maybe they prevent the images from being committed using .gitignore or a similar mechanism (and then undertake a more complicated CI setup to regenerate the old images using a base branch, and additional work to make it visible in the pull request).

takahirom commented 1 year ago

Thank you. I'll think about it while making a prototype. I'm wondering whether to base the MD5 calculation on the image's pixels or the file's binary. I'm unsure about what to use as the seed. Do you have any recommendations?
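
To make the two options concrete, here is a minimal sketch of both, assuming the JVM `BufferedImage` and `MessageDigest` classes. `fileHash`, `pixelHash` and `toHex` are hypothetical helper names. Hashing the file depends on how the PNG was encoded (compression settings, metadata), while hashing the decoded pixels depends only on the dimensions and ARGB values.

```kotlin
import java.awt.image.BufferedImage
import java.io.File
import java.nio.ByteBuffer
import java.security.MessageDigest

// Hypothetical helpers for illustration only.
fun ByteArray.toHex(): String = joinToString("") { "%02x".format(it) }

/** Option A: hash the encoded file as stored on disk. */
fun fileHash(file: File): String =
    MessageDigest.getInstance("MD5").digest(file.readBytes()).toHex()

/** Option B: hash the decoded pixels, prefixed with width and height. */
fun pixelHash(image: BufferedImage): String {
    val digest = MessageDigest.getInstance("MD5")
    // Include the dimensions so that e.g. 2x2 and 4x1 images cannot collide trivially.
    digest.update(ByteBuffer.allocate(8).putInt(image.width).putInt(image.height).array())
    val row = IntArray(image.width)
    val buffer = ByteBuffer.allocate(image.width * 4)
    for (y in 0 until image.height) {
        // Read one row of ARGB values and feed it to the digest.
        image.getRGB(0, y, image.width, 1, row, 0, image.width)
        buffer.clear()
        row.forEach { buffer.putInt(it) }
        digest.update(buffer.array())
    }
    return digest.digest().toHex()
}
```

Under this sketch, re-encoding the same screenshot with a different PNG encoder would change `fileHash` but not `pixelHash`.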

takahirom commented 1 year ago

I've noticed that the environment can greatly affect image pixels. Thus, it would be ideal if we could customize the hashing in a manner similar to Dropbox's ImageComparator, especially since we already use the maxDistance from Dropbox's SimpleImageComparator.

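```kotlin
data class Color(val r: Float, val g: Float, val b: Float, val a: Float = 1.0f)

interface Image {
  val width: Int
  val height: Int
  fun getPixel(x: Int, y: Int): Color
}

// Placeholder so the snippet compiles; the original sketch leaves Mask undefined.
interface Mask

interface ImageHashComperator {

  data class HashResult(
    val hashString: String
  )

  // Computes a hash for the image; the optional mask is presumably for regions to ignore.
  fun hash(image: Image, mask: Mask? = null): HashResult

  // Default comparison: two hashes are similar only if they are identical.
  fun areSimilar(hashResultA: HashResult, hashResultB: HashResult): Boolean = hashResultA == hashResultB
}
```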

takahirom commented 1 year ago

I'm not sure if it's feasible, but I've heard that by using pHash (perceptual hash), you can determine if images are similar based on how close their hash values are. Additionally, with the advancements in generative AI and compression algorithms, there might be other possibilities.
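
As a rough illustration of "similar images give close hashes", here is a sketch of an average hash (aHash), which is a simpler relative of pHash and not pHash itself (pHash is based on a DCT). `averageHash` and `hammingDistance` are hypothetical helper names, and any threshold on the distance would be an assumption that needs tuning.

```kotlin
import java.awt.RenderingHints
import java.awt.image.BufferedImage

// Hypothetical helpers for illustration only.

/** 64-bit average hash: shrink to 8x8 grayscale, then set a bit for each pixel above the mean. */
fun averageHash(image: BufferedImage): Long {
    val small = BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY)
    val g = small.createGraphics()
    g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR)
    g.drawImage(image, 0, 0, 8, 8, null)
    g.dispose()
    val gray = IntArray(64) { i -> small.raster.getSample(i % 8, i / 8, 0) }
    val mean = gray.average()
    var bits = 0L
    for (i in 0 until 64) {
        if (gray[i] > mean) bits = bits or (1L shl i)
    }
    return bits
}

/** Number of differing bits between two hashes; 0 means identical hashes. */
fun hammingDistance(a: Long, b: Long): Int = java.lang.Long.bitCount(a xor b)
```

Two screenshots could then be treated as similar when the distance stays below some small limit. A hash this coarse can miss small visual changes entirely, which is a different trade-off from the exact comparison implied by a diff threshold of 0.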

takahirom commented 1 year ago

I've created a prototype of this feature. Some tests still need to be added, but I believe the implementation will resemble the current one. The logic is somewhat complex; however, I'm open to introducing it if there's a team interested. Without users for this feature, I'm hesitant to integrate it. If you're considering it, please leave a reaction. https://github.com/takahirom/roborazzi/pull/204/files#diff-7ba98c05ac15006c23589beeee900e7836c1b28e167b6cd85e55effe788c2736R6

takahirom / roborazzi

[Discussion] Support storing reference images as hashes #177