tacoss / testcafe-blink-diff

Visual regression for Testcafé
65 stars 15 forks source link

Threshold is ignored since 0.4.12 #43

Closed Morl99 closed 2 years ago

Morl99 commented 2 years ago

Hello,

thank you for maintaining this handy little tool. We have been using it for quite some time now without any problems.

Trying to update to 0.4.12 though leads to the following error in our pipeline:

++ npx testcafe-blink-diff screenshots-chromium --compare base:actual --threshold 0.03
Collecting screenshots...
Processing 5 files...
  folder / account-form (NOT PASS)
    Failed with 0.25% of differences
    Failed with 0.15% of differences
  folder / password-form (NOT PASS)
    Failed with 0.18% of differences
  folder / forgotPassword-form (NOT PASS)
  folder / login-form (PASS)
  folder / register-form (PASS)

As you can see, even though the differences are minimal, somehow they lead to a failure, since 0.03 should translate to 3% (which worked until 0.4.11.

Or did the semantic meaning of the threshold parameter somehow change? Then this should be reflected in the docs.

pateketrueke commented 2 years ago

Well, afaik --threshold 0.03 has been always treated as 0.03% so is not a mistake.

I mean, even before the new release I never converted 0.03 to 3% — I'll add such detail on the README, thank you!

Update: anyway, I'll investigate if this comparison it was computed in a different way on past releases, I'll keep you posted.

matzetronic commented 2 years ago

Hi, I'm also just starting to use this library (thanks btw 👍) and also stumbled over this. While you do not internally need to convert 0.03 to 3% perhaps it would make sense to at least convert it in the output and in the report?!

pateketrueke commented 2 years ago

Hi @matzetronic - yeah, it can be, but the results can include decimals too, so conversion is not really needed imo.

Morl99 commented 2 years ago

I have to disagree, the readme in the main project says the following:

Screenshot 2022-08-31 at 18 16 43

And in any way, 0.03 just means 3% , as you could write 0.03 as 3/100 which just means 3 from hundred... And it just worked that way until the 0.4.11 release (which is just from my observation).

pateketrueke commented 2 years ago

Yeah, probably I commited a mistake on the docs/code earlier, but now the code is based on pixelmatch and I am using the exact code they're using for getting such percentage:

https://github.com/mapbox/pixelmatch/blob/master/bin/pixelmatch#L37

console.log(`error: ${Math.round(100 * 100 * diffs / (width * height)) / 100}%`);

Probably, and almost surely, I did it wrong first time... and the docs did not helped, or, due the change we should't longer assume that 0.03 is 3% and we should update the docs accordingly? Please let me figure out this by doing a regression check, I don't want assume new release is wrong without confirmation.

Morl99 commented 2 years ago

So the threshold parameter now has the unit "percent", that seems fine, but this feels like a breaking change that needs to be communicated, so that people don't get surprised by this.

pateketrueke commented 2 years ago

After a conscious investigation I concluded that the previous version was flaky and yielded bad results (blink-diff was treating 0.03 to 3% ?), all my apologies if you're using this module in your daily job. 🙏🏽

To confirm, I generated base images, both for v0.4.11 and v0.4.13 and while the images are the same, the resullts were very different:

0.4.11 0.4.13
SCR-20220912-lr9 SCR-20220912-lrb

The command used was the same in both runs:

./bin/cli.js e2e/screens --compare base:actual --open --threshold 0.03

In exchange, if we use --threshold 0.0045 on v0.4.11 it would fail similarly as v0.4.13, but definitely it's wrong:

SCR-20220912-lw9

The latest version is more accurate and does not convert the threshold in any manner, it also makes blockOut more reliable than before.

I'll deprecate the v0.4.12-14 and then I'll publish the package as v0.5.0 since this is a major change.

If anyone wants to continue using the old behavior and don't expect blockOut to workout on your tests then keep using the v0.4.11 version, otherwise please consider upgrading to the next version. Thank you all!

Morl99 commented 2 years ago

Awesome, that sounds good.