takahirom / roborazzi

Make JVM Android integration test visible 🤖📸
https://takahirom.github.io/roborazzi/
Apache License 2.0
742 stars · 35 forks

[Discussion] Support storing reference images as hashes #177

Open · JoseAlcerreca opened this issue 1 year ago

JoseAlcerreca commented 1 year ago

Not really a feature request yet, but I think this would be a good place to discuss this idea by Alex Vanyo:

If you use a diff threshold of 0, instead of storing the reference images as PNG, store a hash of the file.

The way it works is:

👍 It eliminates the problem with large files and how to store them (Git LFS, cloud buckets...).
👎 It makes development less intuitive because screenshots no longer live alongside the code (but this is true with cloud buckets, different branches, and arguably Git LFS).
👎 Reports are tricky because Roborazzi can't know what the base branch is (the commit that was used to generate the existing reference images).
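A minimal Kotlin sketch of the record/verify cycle described above, assuming the hash is SHA-256 over the PNG file's bytes and is stored in a small text file next to the test. `sha256Hex`, `recordHash`, and `verifyHash` are hypothetical names for illustration, not part of Roborazzi's API:

```kotlin
import java.io.File
import java.security.MessageDigest

// Hex-encoded SHA-256 of the screenshot's PNG file bytes.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

// Record: store only the hash (64 hex characters) instead of the PNG itself.
fun recordHash(screenshotPng: ByteArray, hashFile: File) {
    hashFile.writeText(sha256Hex(screenshotPng))
}

// Verify with a diff threshold of 0: pass only if the new screenshot hashes to exactly
// the recorded value. A mismatch tells you that something changed, not what changed.
fun verifyHash(screenshotPng: ByteArray, hashFile: File): Boolean =
    hashFile.exists() && hashFile.readText().trim() == sha256Hex(screenshotPng)
```

Only a few dozen bytes per screenshot end up in version control, which is what removes the large-file problem; the cost is that there is no image to inspect when `verifyHash` returns false.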

This is doable on CI (see my prototype and example PR), but:

👎 There's no easy way to run screenshot tests locally.
👎👎 The PR doesn't show any screenshots, so you would have to do something like https://github.com/takahirom/roborazzi-compare-on-github-comment-sample/pull/1, which complicates the workflow even more.

Some very crude ideas:

takahirom commented 1 year ago

As you mentioned in this issue, in general app development it's challenging to manage changes if you can't see what's different between images, so storing only hashes instead of images might not be practical. However, for UI library development, where not a single pixel should change, hashing could be useful. Even so, having some way to view the diff when changes do occur would be beneficial. 👀 Hashing could be offered as an option through gradle.properties and the record options. I'm open to adding this feature, although I can't quite envision how it would work in practice.

alexvanyo commented 1 year ago

It might be interesting if the hash output can be done by Roborazzi in addition to storing the image itself, as opposed to it being an exclusive choice.

I could see an approach where Roborazzi records both the image and the hash (or just one or the other, depending on configuration). Then during verification for a test with a diff threshold of 0, it can use all available golden information:

What happens to the generated golden images in version control could be left up to the project: maybe they check in the raw images directly anyway, maybe they use Git LFS, or maybe they prevent the images from being committed using .gitconfig or a similar mechanism (and then take on a more complicated CI setup that regenerates the old images from a base branch, plus additional work to make them visible in the pull request).
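One possible shape for "use all available golden information", sketched under assumptions: each screenshot may have a golden PNG, a golden hash, or both; the image is preferred when present and the hash is the fallback. `Golden`, `Verdict`, and `verifyExact` are hypothetical, and the byte-for-byte comparison stands in for the zero-threshold check:

```kotlin
import java.security.MessageDigest

// Hypothetical golden information for one screenshot. Either part may be missing,
// e.g. when PNGs are kept out of version control but the hash is committed.
class Golden(val pngBytes: ByteArray?, val hash: String?)

sealed interface Verdict {
    object Match : Verdict
    object Mismatch : Verdict
    object NoGolden : Verdict // nothing recorded yet, so nothing to compare against
}

fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

// Zero-threshold verification: prefer the golden image when it exists (a real
// implementation could then render a visual diff on mismatch), else fall back to the hash.
fun verifyExact(actualPng: ByteArray, golden: Golden): Verdict = when {
    golden.pngBytes != null ->
        if (golden.pngBytes.contentEquals(actualPng)) Verdict.Match else Verdict.Mismatch
    golden.hash != null ->
        if (golden.hash == sha256Hex(actualPng)) Verdict.Match else Verdict.Mismatch
    else -> Verdict.NoGolden
}
```

This keeps the storage decision separate from verification: a project that commits images gets image comparisons, and one that commits only hashes still gets a pass/fail signal.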

takahirom commented 1 year ago

Thank you. I'll think about it while making a prototype. I'm wondering whether to base the MD5 calculation on the image's pixels or the file's binary. I'm unsure about what to use as the seed. Do you have any recommendations?
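For reference, MD5 (like SHA-256) takes no seed; the real decision is which bytes to feed it. The sketch below contrasts the two options, assuming the screenshot is available as a `java.awt.image.BufferedImage` (Roborazzi runs on the JVM). `fileMd5`, `pixelMd5`, and `encodePng` are hypothetical helpers:

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.security.MessageDigest
import javax.imageio.ImageIO

fun md5Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("MD5").digest(bytes).joinToString("") { "%02x".format(it) }

// Option A: hash the encoded file. Sensitive to anything the encoder writes
// (color type, compression settings, metadata chunks), not just to what is on screen.
fun fileMd5(pngBytes: ByteArray): String = md5Hex(pngBytes)

// Option B: hash the decoded pixels. Width and height are included so that images
// with the same pixel sequence but different shapes do not collide.
fun pixelMd5(image: BufferedImage): String {
    val buffer = ByteBuffer.allocate(8 + 4 * image.width * image.height)
    buffer.putInt(image.width).putInt(image.height)
    for (y in 0 until image.height) {
        for (x in 0 until image.width) {
            buffer.putInt(image.getRGB(x, y)) // pixel as packed ARGB in the default sRGB model
        }
    }
    return md5Hex(buffer.array())
}

fun encodePng(image: BufferedImage): ByteArray =
    ByteArrayOutputStream().also { ImageIO.write(image, "png", it) }.toByteArray()
```

For example, a `TYPE_INT_ARGB` and a `TYPE_INT_RGB` image with the same opaque pixels get the same `pixelMd5`, but ImageIO encodes them as different PNG color types, so their `fileMd5` values differ.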

takahirom commented 1 year ago

I've noticed that the environment can greatly affect image pixels. Thus, it would be ideal if the hashing could be customized in a manner similar to Dropbox's ImageComparator, especially since we use the maxDistance from Dropbox's SimpleImageComparator.

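```kotlin
// Proposed hashing counterpart to Dropbox differ's ImageComparator.
data class Color(val r: Float, val g: Float, val b: Float, val a: Float = 1.0f)

interface Image {
  val width: Int
  val height: Int
  fun getPixel(x: Int, y: Int): Color
}

// Placeholder so the snippet compiles on its own; differ has its own Mask type.
class Mask

interface ImageHashComparator {

  data class HashResult(
    val hashString: String
  )

  // Computes a hash for the image, optionally taking a mask into account.
  fun hash(image: Image, mask: Mask? = null): HashResult

  // Exact match by default; implementations can override this to accept
  // "close enough" hashes.
  fun areSimilar(hashResultA: HashResult, hashResultB: HashResult): Boolean =
    hashResultA == hashResultB
}
```
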
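As an illustration of how an implementation of this interface might absorb small environment differences, here is a hypothetical `QuantizedSha256Comparator` that rounds each channel to a coarse level before hashing. The types are restated so the block compiles on its own; the quantization approach and the default of 64 levels are assumptions, not something the prototype confirms:

```kotlin
import java.nio.ByteBuffer
import java.security.MessageDigest

data class Color(val r: Float, val g: Float, val b: Float, val a: Float = 1.0f)

interface Image {
    val width: Int
    val height: Int
    fun getPixel(x: Int, y: Int): Color
}

class Mask // placeholder; ignored by the sketch below

interface ImageHashComparator {
    data class HashResult(val hashString: String)
    fun hash(image: Image, mask: Mask? = null): HashResult
    fun areSimilar(hashResultA: HashResult, hashResultB: HashResult): Boolean =
        hashResultA == hashResultB
}

// Hypothetical implementation: each channel (0..1) is rounded to one of `levels`
// values before hashing, so differences much smaller than one step usually vanish.
class QuantizedSha256Comparator(private val levels: Int = 64) : ImageHashComparator {
    init {
        require(levels in 2..256) { "levels must fit in one byte" }
    }

    private fun quantize(channel: Float): Int =
        Math.round(channel.coerceIn(0f, 1f) * (levels - 1))

    override fun hash(image: Image, mask: Mask?): ImageHashComparator.HashResult {
        val digest = MessageDigest.getInstance("SHA-256")
        // Fixed-length size prefix so differently shaped images cannot collide.
        digest.update(ByteBuffer.allocate(8).putInt(image.width).putInt(image.height).array())
        for (y in 0 until image.height) {
            for (x in 0 until image.width) {
                val c = image.getPixel(x, y)
                for (channel in floatArrayOf(c.r, c.g, c.b, c.a)) {
                    digest.update(quantize(channel).toByte())
                }
            }
        }
        val hex = digest.digest().joinToString("") { "%02x".format(it) }
        return ImageHashComparator.HashResult(hex)
    }
}
```

Quantization only helps partway: a channel value sitting right at a bucket boundary can still flip the hash, which is one reason to consider overriding `areSimilar` with a distance-based check instead of exact equality.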
takahirom commented 1 year ago

I'm not sure if it's feasible, but I've heard that by using pHash (perceptual hash), you can determine if images are similar based on how close their hash values are. Additionally, with the advancements in generative AI and compression algorithms, there might be other possibilities.
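To make the "close hash values" idea concrete, here is a sketch of a difference hash (dHash), a simpler relative of pHash; pHash proper applies a discrete cosine transform, which this sketch does not. Similarity is the Hamming distance between two 64-bit hashes. `dHash`, `hammingDistance`, `isSimilar`, and the `maxDistance = 5` cutoff are illustrative assumptions, not tuned or taken from Roborazzi:

```kotlin
import java.awt.RenderingHints
import java.awt.image.BufferedImage

// dHash: shrink to 9x8 grayscale, then emit one bit per pair of horizontally adjacent
// pixels (1 if the left one is brighter). 8 rows x 8 pairs = 64 bits.
// The bilinear downscale is crude; averaging over areas would be more robust.
fun dHash(image: BufferedImage): Long {
    val small = BufferedImage(9, 8, BufferedImage.TYPE_BYTE_GRAY)
    val g = small.createGraphics()
    g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR)
    g.drawImage(image, 0, 0, 9, 8, null)
    g.dispose()
    var hash = 0L
    for (y in 0 until 8) {
        for (x in 0 until 8) {
            val left = small.raster.getSample(x, y, 0)
            val right = small.raster.getSample(x + 1, y, 0)
            hash = (hash shl 1) or (if (left > right) 1L else 0L)
        }
    }
    return hash
}

// Number of differing bits; 0 means identical at this coarse resolution.
fun hammingDistance(a: Long, b: Long): Int = java.lang.Long.bitCount(a xor b)

fun isSimilar(a: BufferedImage, b: BufferedImage, maxDistance: Int = 5): Boolean =
    hammingDistance(dHash(a), dHash(b)) <= maxDistance
```

Unlike MD5 or SHA-256, where any change scrambles the whole digest, a small rendering change flips only a few of the 64 bits here, so the distance itself says roughly how different two screenshots are.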

takahirom commented 1 year ago

I've created a prototype of this feature. Some tests still need to be added, but I believe the implementation will resemble the current one. The logic is somewhat complex; however, I'm open to introducing it if a team is interested. Without users for this feature, I'm hesitant to integrate it. If you're considering using it, please leave a reaction. https://github.com/takahirom/roborazzi/pull/204/files#diff-7ba98c05ac15006c23589beeee900e7836c1b28e167b6cd85e55effe788c2736R6