rstudio / shinytest2

https://rstudio.github.io/shinytest2/

`shinytest2` approach to saving/comparing images #4

Open schloerke opened 3 years ago

schloerke commented 3 years ago

This issue proposes a new approach to how images are stored and compared, in an attempt to dramatically reduce the number of snapshot files and the overall confusion.



@wch What are your thoughts?

I believe this is a win/win: we get fewer images to maintain, and we get fuzzy matching when testing on differing OS/R versions. (Exact matching on the master OS/R version; fuzzy matching on all other OS/R versions.) This would also allow Linux machines whose text rendering differs (e.g. bold vs. not bold) to avoid producing false-positive results in shinycoreci-apps.

cc @MadhulikaTanuboddi

schloerke commented 3 years ago

Talking with @jcheng5 , it is good to be reminded of the questions we are trying to address.

Questions we are trying to address


Notes for future fuzzy matching (if implemented): fingerprinting is good for comparing image X against many other images. In our situation, however, we have exactly two images in hand. A better heuristic might be to look at the diff matrix of the two images and draw conclusions from there. Ex: > 2% of the image differing is a failure; a differing 20px square is a failure; etc.
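A minimal sketch of that diff-matrix heuristic (a hypothetical helper, not part of the shinytest2 API), assuming both screenshots have been read in as equal-sized RGBA arrays, e.g. with `png::readPNG()`:

```r
# Hypothetical fuzzy-match heuristic: compare two screenshots pixel by pixel.
# `old` and `new` are numeric arrays of shape height x width x 4 (RGBA in [0, 1]),
# e.g. as returned by png::readPNG(). Not shinytest2 code.
images_roughly_equal <- function(old, new, max_diff_frac = 0.02) {
  stopifnot(identical(dim(old), dim(new)))
  # Per-pixel diff matrix: TRUE wherever any RGBA channel differs
  diff_matrix <- apply(abs(old - new) > 0, c(1, 2), any)
  # Fail when more than (say) 2% of pixels differ
  mean(diff_matrix) <= max_diff_frac
}
```

The "20px square" rule mentioned above would be a second pass over `diff_matrix`, looking for a contiguous differing region of that size.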

schloerke commented 3 years ago

Talking through these situations over dinner, we need to consider the full set of steps of resolving an error.

Approach: Store all variants as truth

Approach: Store single variant as truth and fuzzy match others

Using the new approach...


Known testing problems:


Proposal

Hybrid approach: allow both a single variant and all variants, and let each approach accept tolerances.

I believe the above approach will address most situations and give freedom to users / developers.

schloerke commented 3 years ago

Because the OS/RVersion is so important, I believe there should be two different methods, not tied to variant.

(Poorly named) Example:

rpodcast commented 2 years ago

This is arguably one of the most complex issues in the general shiny testing workflow. One unfortunate problem in my day-job environment is that our internal RStudio Workbench and RStudio Connect servers run on RHEL 7, while CI/CD runs through GH Actions. It does not look like GH Actions will support RHEL or CentOS anytime soon on the Linux side, so I won't be able to do any robust evaluation of plot or app screenshots when the tests run on GH.

(Poorly named) Example:

* `st2_expect_platform_snapshot(app)`; `st2_expect_snapshot(app, os_platform = TRUE)`

* `st2_expect_original_snapshot(app)`; `st2_expect_snapshot(app, os_platform = FALSE)`

I like this approach, and would be interested in collaborating on a solution to this. I know it is likely too complex for the initial release, but it is an issue I'm motivated to solve.

schloerke commented 2 years ago

> I like this approach, and would be interested in collaborating on a solution to this. I know it is likely too complex for the initial release, but it is an issue I'm motivated to solve.

@rpodcast Awesome, I'll take you up on that for another release!


Yes, I'd like this feature to live in {testthat}, as {shinytest2} snapshots are only a vehicle to produce {testthat} snapshots.

Current thought process:

Current questions:

schloerke commented 2 years ago

Update to:

> Because the OS/RVersion is so important, I believe there should be two different methods and not tied to variant.
>
> (Poorly named) Example:
>
> * `st2_expect_platform_snapshot(app)`; `st2_expect_snapshot(app, os_platform = TRUE)`
> * `st2_expect_original_snapshot(app)`; `st2_expect_snapshot(app, os_platform = FALSE)`

app$expect_values() saves a JSON file of value content and also saves a debug screenshot. This screenshot never causes an assertion failure, but it should be checked into Git so that it is visible when viewing diffs. Only the accepted debug screenshot should be kept in Git; new debug screenshots should be .gitignore'd.
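For example, a test built on `app$expect_values()` asserts only on the JSON values; the accompanying screenshot is purely informational. A sketch (the app path and the output name `plot` are assumptions for illustration):

```r
library(shinytest2)

test_that("app values are stable", {
  # "myapp" and the input/output names are placeholders
  app <- AppDriver$new(app_dir = "myapp")
  app$set_inputs(n = 50)
  # Asserts on the saved values JSON; also writes a debug screenshot
  # that is recorded for diff viewing but never fails the test
  app$expect_values(output = "plot")
})
```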

app$expect_screenshot() will fail if the captured screenshot differs at all. This method should only be used as a last resort, as it is very brittle. A variant in AppDriver$new(variant=) is required when calling app$expect_screenshot().

Both methods will listen to the app's variant value provided at initialization of app <- AppDriver$new(variant=).
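A sketch of how the variant flows from initialization to the screenshot expectation (the app path is a placeholder):

```r
library(shinytest2)

test_that("screenshot matches the per-platform snapshot", {
  # platform_variant() produces a value like "linux-4.2", so each
  # OS/R version combination keeps its own snapshot directory as truth
  app <- AppDriver$new(app_dir = "myapp", variant = platform_variant())
  # Pixel-exact comparison against this variant's stored screenshot;
  # errors if no variant was supplied at initialization
  app$expect_screenshot()
})
```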


I am expecting users to not use variant and to not use app$expect_screenshot().

I am expecting the Shiny team or other UI packages to use app$expect_screenshot() with many variants.

rpodcast commented 2 years ago

> I am expecting users to not use variant and to not use app$expect_screenshot().
>
> I am expecting the Shiny team or other UI packages to use app$expect_screenshot() with many variants.

I will want to discuss that assumption in more detail. My previous projects around shinytest involved ensuring the app at least rendered "something" to the user, not necessarily the exact contents (which may be handled more directly by comparing the values comprising the output, such as a table). Hence my ideal is an assertion that something appeared. Once I begin to migrate a very large-scale shinytest effort from a few years ago, I'll be able to add more context.

danielinteractive commented 2 years ago

Very naive question: I guess there is no way to do vector graphics comparisons a la vdiffr in this case, right?

schloerke commented 2 years ago

@danielinteractive Unfortunately no.

vdiffr transforms ggplot2 objects into `<svg />` representations, as there is a 1:1 conversion between the underlying {grid} layout and what an `<svg />` can produce. In addition, all of the CSS information is embedded into the `<svg />`.

While {shinytest2} does have access to the page `<html />`, we would have to decide which CSS properties to retrieve for every node. It could work, and with most CSS property names requested it would probably be fairly useful for anything that isn't rendered as binary content across platforms. We expose `$expect_html()` to perform the `<html />` comparisons; I thought the computed CSS information would be too overwhelming and brittle, so I did not include it in that method.
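For example, the HTML of a single output can be snapshotted without touching computed CSS (the selector `#summary` is a placeholder for illustration):

```r
library(shinytest2)

# "myapp" and the "#summary" selector are placeholders
app <- AppDriver$new(app_dir = "myapp")
# Snapshots the serialized HTML of the matched node(s); structural
# changes fail the test, but computed CSS is not part of the comparison
app$expect_html("#summary")
```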

However, this approach fails when R uses a different font for its plots, as the text is rendered into the image itself. So we would not be able to compare two plot images byte for byte across platforms. (We currently have this issue, which is why we suggest using variant = platform_variant() to maintain multiple images.)

schloerke commented 2 years ago

@rpodcast Good news. I've been green-lit to work on {testthat} snapshot tooling. This includes snapshot comparisons for GitHub Actions. It'll be a bit before work starts, but it will come eventually! The goal will be to automate retrieving any {testthat} snapshots from GHA (given the latest GHA run) so they can be compared locally. 🤞

Until that is implemented, the approach to comparing images is not complete, so I will keep this issue open.


shinytest2 approach to saving images has been completed and is explained here.

In addition, fuzzy image comparisons are implemented in this PR.

Alik-V commented 1 year ago

Hi @schloerke, thank you for this package! I've been playing with it and have started running into the problems described in this issue, where tests created on a Windows machine do not line up with Linux runners via GHA. I've seen that there is a merged PR with fuzzy comparison: is it implemented in the latest dev version of the package? Is there documentation I could peek into describing how to use the new functionality?

schloerke commented 1 year ago

@Alik-V Glad you like the package!

Check out the docs for AppDriver$expect_screenshot(). Note there is a bug in v0.2.0 on CRAN that throws an error if the images are different and threshold is set. The difference value is more of a distance calculation than a TRUE/FALSE calculation. Ex: a transparent white pixel compared to a solid black pixel has a total difference of 4, as all four of the RGBA channels are 100% different. Whereas a pixel that is slightly off due to rounding errors in Chrome will have a small distance, ~= 3 * 1 / 256 ~= 1%.
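The distance arithmetic from the two examples above can be reproduced directly (plain R, not shinytest2 internals):

```r
# Per-pixel distance: sum of absolute per-channel differences,
# with each RGBA channel scaled to [0, 1]
pixel_distance <- function(p1, p2) sum(abs(p1 - p2))

# Transparent white vs. solid black: all four channels fully different
pixel_distance(c(1, 1, 1, 0), c(0, 0, 0, 1))
# 4

# A slight RGB rounding difference of 1/256 per channel, alpha unchanged
pixel_distance(c(0.5, 0.5, 0.5, 1), c(0.5 + 1/256, 0.5 + 1/256, 0.5 + 1/256, 1))
# 3/256, roughly 1%
```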

However, I do not expect the fuzzy threshold to help enough that images taken on Windows will match images taken on Linux. If you have custom fonts and no base plots it might work, but I'm not hopeful.

Example usage:
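A sketch of what that might look like (the app path and threshold value are illustrative, assuming a shinytest2 version where `expect_screenshot()` accepts `threshold`):

```r
library(shinytest2)

# "myapp" is a placeholder; variant keeps per-platform snapshots
app <- AppDriver$new(app_dir = "myapp", variant = platform_variant())
# threshold = 0.05 tolerates small per-pixel distances (e.g. Chrome
# rounding noise) while still failing on visible differences
app$expect_screenshot(threshold = 0.05)
```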

Alik-V commented 1 year ago

@schloerke Thank you for the explanations and for pointing me towards the documentation!

At the moment, I have opted to use variant = platform_variant() and to record the Windows and Linux test results separately (Linux recorded through Workbench), but it would certainly be easier if I were able to use a single results folder via variant = NULL. I will play with thresholds; I like this approach to snapshot testing.