oliyh / kamera

UI testing via image comparison and devcards

tests incorrectly pass when 'actual' is bigger than 'expected' but the bits inside match. #32

Closed jleonard-r7 closed 3 years ago

jleonard-r7 commented 3 years ago

After resolution of #20, this was exposed as an issue. The "actual" images are much taller/longer than the "expected" ones, but somehow the tests still pass (presumably because the comparison only looks at the "overlapping" bits, which do indeed match). I surmise that the fix is as simple as checking for a size match before looking at the actual bits.

jleonard-r7 commented 3 years ago

Turns out that this issue is a bit more complicated than it appears at first glance. Firstly, the :crop normalisation makes the input images the same size. But even once that's removed, ImageMagick only returns a metric of 0.00671002 for vastly differently sized images (and thus contents). Given that this is under the default threshold of 0.01, the test still passes. (See the attached "diff" image for what looks to the naked eye like quite a lot of "difference" for such a small "metric" value.)

[attached diff image: timerange_picker]
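For reference, this is roughly what the mae check amounts to - a simplified sketch of shelling out to ImageMagick's compare, not kamera's actual code. The normalised value in parentheses on stderr is what ends up under the 0.01 threshold:

```clojure
(require '[clojure.java.shell :as shell])

(defn mae-distance [expected actual diff]
  ;; `compare` exits non-zero when the images differ, so read stderr
  ;; rather than relying on the exit code.
  (let [{:keys [err]} (shell/sh "compare" "-metric" "mae"
                                expected actual diff)]
    ;; stderr looks like "352.759 (0.00671002)" - take the normalised value.
    (Double/parseDouble (second (re-find #"\(([\d.eE+-]+)\)" err)))))

(comment
  ;; a metric of 0.00671002 is below the default 0.01 threshold, so the test passes
  (< (mae-distance "expected.png" "actual.png" "diff.png") 0.01))
```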

oliyh commented 3 years ago

The trim normalisation might also be relevant here, with the aim of removing surrounding whitespace (which is normally not interesting).

ImageMagick will not compare images of different sizes, but returns an error (at least this was my understanding), so the point of the normalisations is to get to a point where the images can be compared even if they are different sizes. Information can be lost in the process (the larger one is cropped down to the size of the smaller one) and kamera doesn't assert on this. It would be quite easy to add an optional assertion that the images are the same size.
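Something along these lines would do it (a sketch only, reading dimensions with javax.imageio rather than anything kamera currently has):

```clojure
(require '[clojure.java.io :as io])
(import '[javax.imageio ImageIO])

(defn dimensions [file]
  ;; returns [width height] in pixels
  (let [img (ImageIO/read (io/file file))]
    [(.getWidth img) (.getHeight img)]))

(defn assert-same-size! [expected actual]
  ;; an optional pre-comparison check, run before any normalisation
  (let [de (dimensions expected)
        da (dimensions actual)]
    (when (not= de da)
      (throw (ex-info "Images differ in size before comparison"
                      {:expected de :actual da})))))
```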

What I have found through development and use is that even in CI, with identical browsers on identical pages, you can still get small differences, so I think a tolerance is appropriate and pragmatic. But you are correct that 0.01 mae seems to allow a large difference, which could let 'broken' tests pass. In some of my tests I have reduced this to 0.001. Perhaps a section in the readme could discuss choosing a value for this option (ideally with diffs to show what 0.1, 0.01 and 0.001 mean in practice).
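For example (illustrative only - the option keys :default-target, :metric and :metric-threshold here are from memory and may not match the current API exactly):

```clojure
;; tightening the tolerance from the 0.01 default to 0.001
(def stricter-opts
  {:default-target {:metric           "mae"
                    :metric-threshold 0.001}})
```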

oliyh commented 3 years ago

In summary my proposal is:

- an optional assertion that the expected and actual images are the same size
- a readme section on choosing a metric threshold value, with example diffs showing what 0.1, 0.01 and 0.001 mean in practice

What do you think?

jleonard-r7 commented 3 years ago

Actually I think I have found a better solution. Using the "dssim" algorithm for comparison gives values that make sense (at least for my current batch of tests). The stderr parsing of the output from ImageMagick will need to be tweaked, though, as its output looks like the following (e.g.):

Offset: 0,0 Channel distortion: DSSIM red: 4.14472e-05 green: 4.14472e-05 blue: 4.14472e-05 alpha: 0.5 all: 3.10854e-05

(The "all" figure is the operative one here, and most of the passing ones are very small numbers in scientific notation, whereas the failing ones are all just over 0.01, say 0.015... in regular notation.) So I'd propose allowing the user to also specify the parsing algorithm corresponding to their choice of metric algorithm...
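A parser for the DSSIM output above might look something like this (the name parse-dssim is just illustrative, not something in kamera today):

```clojure
(defn parse-dssim [stderr]
  ;; pull the "all:" figure out of compare's verbose DSSIM output
  (when-let [[_ v] (re-find #"all:\s*([\d.eE+-]+)" stderr)]
    (Double/parseDouble v)))

(comment
  (parse-dssim "Offset: 0,0 Channel distortion: DSSIM red: 4.14472e-05 green: 4.14472e-05 blue: 4.14472e-05 alpha: 0.5 all: 3.10854e-05")
  ;; => 3.10854E-5
  )
```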

jleonard-r7 commented 3 years ago

Probably better than that would be for the library itself to just have baked-in parsers for each of the allowed metric algorithms.

jleonard-r7 commented 3 years ago

i.e., switch on metric here & parse the different outputs appropriately: https://github.com/oliyh/kamera/blob/master/src/clj/kamera/core.clj#L150-L151
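Something like this sketch, dispatching on the metric name (the function and the exact regexes are illustrative, not the library's current code):

```clojure
(defn parse-metric-output [metric stderr]
  (case metric
    ;; DSSIM prints per-channel distortion; the "all:" figure is the one to use
    "dssim" (some->> (re-find #"all:\s*([\d.eE+-]+)" stderr)
                     second
                     Double/parseDouble)
    ;; mae/rmse etc. print "absolute (normalised)" - take the normalised value
    (some->> (re-find #"\(([\d.eE+-]+)\)" stderr)
             second
             Double/parseDouble)))
```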

oliyh commented 3 years ago

Hi,

Happy to add support for this algorithm, and glad that you found this one more useful. It doesn't solve your issue of the images being very different in size, though, or does it?

jleonard-r7 commented 3 years ago

Yes, it does. The differently sized images are considered different per this algorithm.

jleonard-r7 commented 3 years ago

I can create a PR with the changes I need to support this algorithm (since I imagine it's quite late for you there now). :)

oliyh commented 3 years ago

Hi,

Interesting - I should probably revisit because when I first wrote this Imagemagick just refused to compare images of different sizes, hence the whole normalisation chain. If it now does (or at least this algo does) then that's a bit of a game changer.

A PR would be most welcome - I'm into my weekend now and also in the middle of a bit of a rewrite of superlifter so it's unlikely I'd get to this soon!

Thank you again for your detailed feedback.

jleonard-r7 commented 3 years ago

Yes, a lot of the (now outdated) docs mentioned that issue with ImageMagick. But also yes, the situation is quite different now: it happily compares differently sized images, and will even search within one image for the presence of another via the -subimage-search option (which can be quite slow).
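For reference, a subimage search looks something like this when shelled out directly (a sketch using clojure.java.shell rather than kamera's own runner; the filenames are placeholders, and the larger image must come first):

```clojure
(require '[clojure.java.shell :as shell])

(shell/sh "compare" "-metric" "RMSE" "-subimage-search"
          "large-actual.png" "small-expected.png" "subimage-diff.png")
;; stderr reports the metric plus the best-match offset, e.g. "... (0.0189) @ 161,43"
```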

oliyh commented 3 years ago

I do remember trying the -subimage-search option and it just locked up my computer for minutes, even with a reasonable screenshot size, so it didn't seem viable!

jleonard-r7 commented 3 years ago

dssim takes a bit longer for the differently sized images, but that seems reasonable since those should be the exception. For same-size images it seems on par with mae.