schloerke opened this issue 3 years ago
Talking with @jcheng5, it is good to be reminded of the questions we are trying to address.
* `radiant` (?)
* `rstudio/shinycoreci-apps` is trying to implement this

Notes for future fuzzy matching (if implemented): Fingerprinting is good for comparing image X to many other images. However, in our situation we have two images in hand. A better heuristic might be to look at the diff matrix of the two images and draw conclusions from there. Ex: > 2% of the image differing is a failure; a 20px square found to be different is a failure; etc.
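For illustration, a minimal sketch of the "> 2% of the image" rule, assuming two same-sized PNGs and the `png` package. This is not shinytest2 code, the function name is made up, and it does not cover the 20px-square rule:

```r
library(png)

# Flag a failure when more than `max_frac` of the pixels differ at all.
image_diff_fails <- function(path_old, path_new, max_frac = 0.02) {
  old <- readPNG(path_old)  # height x width x channels array, values in [0, 1]
  new <- readPNG(path_new)

  # Diff matrix: per-pixel maximum channel difference
  diff <- apply(abs(old - new), c(1, 2), max)

  # Fraction of pixels that changed at all
  mean(diff > 0) > max_frac
}
```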
Talking through these situations over dinner, we need to consider the full set of steps for resolving an error.
Using the new approach...
Known testing problems:
Hybrid approach: allow for both a single variant and all variants, and each approach should allow for tolerances.

* `variant = NULL`
  * `_snaps/st2_mac_4-0_NAME.png`
  * `local_variant_rules`; Ex: Tolerance = 0
  * `other_variant_rules`; Ex: Tolerance <= 2% difference
* `variant = SOMETHING`
  * `_snaps/mac_4-0/NAME.png`
  * `local_variant_rules`; Ex: Tolerance = 0

I believe the above approach will address most situations and give freedom to users / developers.
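Purely as a hypothetical sketch of what those tolerance rules could look like in code; `local_variant_rules` and `other_variant_rules` are only names from the proposal above, and `snapshot_rules` is made up here, not a real API:

```r
# Hypothetical configuration for the hybrid approach described above
snapshot_rules <- list(
  local_variant_rules = list(tolerance = 0),    # exact match on the recording platform
  other_variant_rules = list(tolerance = 0.02)  # allow <= 2% difference elsewhere
)
```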
Because the OS/RVersion is so important, I believe there should be two different methods, not tied to `variant`.

(Poorly named) Example:

* `st2_expect_platform_snapshot(app)`; `st2_expect_snapshot(app, os_platform = TRUE)`
* `st2_expect_original_snapshot(app)`; `st2_expect_snapshot(app, os_platform = FALSE)`
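A hypothetical sketch of how that proposal would read in a test file; neither `st2_*` function exists in shinytest2, and the app path is a placeholder:

```r
library(testthat)
library(shinytest2)

test_that("app screenshots", {
  app <- AppDriver$new("path/to/app")

  # Equivalent to st2_expect_snapshot(app, os_platform = TRUE)
  st2_expect_platform_snapshot(app)

  # Equivalent to st2_expect_snapshot(app, os_platform = FALSE)
  st2_expect_original_snapshot(app)
})
```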
This is arguably one of the most complex issues in the general Shiny testing workflow. One unfortunate problem with my day-job environment is that our internal RStudio Workbench and RStudio Connect servers run on RHEL 7, while our CI/CD runs through GitHub Actions. It's not looking like GH Actions will support RHEL or CentOS anytime soon on the Linux side, so I won't be able to do any robust evaluations of plot or app screenshots if the tests run on GH.
> (Poorly named) Example:
>
> * `st2_expect_platform_snapshot(app)`; `st2_expect_snapshot(app, os_platform = TRUE)`
> * `st2_expect_original_snapshot(app)`; `st2_expect_snapshot(app, os_platform = FALSE)`
I like this approach, and would be interested in collaborating on a solution to this. I know it is likely too complex for the initial release, but it is an issue I'm motivated to solve.
> I like this approach, and would be interested in collaborating on a solution to this. I know it is likely too complex for the initial release, but it is an issue I'm motivated to solve.
@rpodcast Awesome, I'll take you up on that for another release!
Yes, I'd like for this feature to live in {testthat}, as {shinytest2} snapshots are only a vehicle to produce {testthat} snapshots.
Current thought process:

* `foo.json`: I want to see text diffs for all files at the same time.
* `foo.png`: I want to see PNG diffs overlaid on top of each other, or like a gif switching between all of the photos.

Current questions:

* What happens when `artifact-A` has a change, `artifact-B` has a different change, `artifact-C` removes the file, and `artifact-D` had no change? Since all artifacts point to the same origin file, this will cause merge conflicts.

Update to:
> Because the OS/RVersion is so important, I believe there should be two different methods and not tied to `variant`.
>
> (Poorly named) Example:
>
> * `st2_expect_platform_snapshot(app)`; `st2_expect_snapshot(app, os_platform = TRUE)`
> * `st2_expect_original_snapshot(app)`; `st2_expect_snapshot(app, os_platform = FALSE)`
`app$expect_values()` saves a JSON file of value content and will also save a screenshot that never fails an assertion but should be checked into Git. This extra debug screenshot will be visible when viewing diffs, but will never cause a failure. Only the accepted debug screenshot should be kept in Git; new debug screenshots should be `.gitignore`'d.

`app$expect_screenshot()` will cause a failure if the captured screenshot is any different. This method should only be used as a last resort, as it is very brittle. `variant` in `AppDriver$new(variant=)` is required when calling `app$expect_screenshot()`.

Both methods will listen to the app's `variant` value provided at initialization via `app <- AppDriver$new(variant=)`.

I am expecting users to not use `variant` and to not use `app$expect_screenshot()`.

I am expecting the Shiny team or other UI packages to use `app$expect_screenshot()` with many variants.
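A minimal sketch of the intended split, assuming the `AppDriver` methods described above; the app path is a placeholder:

```r
library(testthat)
library(shinytest2)

test_that("values are robust; screenshots are a last resort", {
  # `variant` is only required when expect_screenshot() will be called
  app <- AppDriver$new("path/to/app", variant = platform_variant())

  # Robust: JSON snapshot of values, plus a debug screenshot that never fails
  app$expect_values()

  # Brittle: fails on any pixel difference
  app$expect_screenshot()
})
```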
> I am expecting users to not use `variant` and to not use `app$expect_screenshot()`.
>
> I am expecting the Shiny team or other UI packages to use `app$expect_screenshot()` with many variants.
I will want to discuss that assumption in more detail, as my previous projects around `shinytest` involved ensuring the app at least rendered "something" to the user, while not necessarily checking the contents themselves (which may be handled more directly by comparing the values comprising the output, such as a table). Hence my ideal will be to have an assertion that something appeared. Once I begin to migrate a very large-scale `shinytest` effort from a few years ago, I'll be able to add more context.
Very naive question: I guess there is no way to do vector graphics comparisons a la `vdiffr` in this case, right?
@danielinteractive Unfortunately, no. `vdiffr` transforms `ggplot2` objects into `<svg />` representations, as there is a 1:1 conversion between the underlying `{grid}` layout and what can be produced with an `<svg />`. In addition, all of the CSS information is embedded into the `<svg />`.

While `{shinytest2}` does have access to the page `<html />`, we would have to decide which CSS information to retrieve for every node. It would work, and if we asked for most of the CSS property names it would probably be fairly useful. This would probably work for anything that isn't binary across platforms. We expose `$expect_html()` to perform the `<html />` comparisons; I thought the computed CSS information would be too overwhelming and brittle, so I did not include it in that method.

However, this text representation will fail for plots when R uses a different font, as the plot text is baked into the image itself. So we would not be able to compare two plot images byte for byte across platforms. (We currently have this issue, and that is why we suggest using `variant = platform_variant()` to maintain multiple images.)
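For example, a sketch of pairing `$expect_html()` with a platform variant for the image snapshots; the `#plot` selector and app path are placeholders:

```r
library(testthat)
library(shinytest2)

test_that("plot output", {
  # One set of image snapshots per OS / R-version combination
  app <- AppDriver$new("path/to/app", variant = platform_variant())

  # DOM comparison of the rendered node (no computed CSS is included)
  app$expect_html("#plot")

  # Image comparison, stored under the platform variant
  app$expect_screenshot()
})
```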
@rpodcast Good news: I've been green-lighted to work on {testthat} snapshots. This includes snapshot comparisons for GitHub Actions. It'll be a bit before work starts, but it will come eventually! The goal will be to automate retrieving any {testthat} snapshots from GHA (given the latest GHA run) so they can be compared locally. 🤞

Until that is implemented, the approach to comparing images is not complete, so I will keep this issue open.
The `shinytest2` approach to saving images has been completed and is explained here. In addition, fuzzy image comparison is implemented in this PR.
Hi @schloerke, thank you for this package! I've been playing with it and started running into the problems described in this issue, where tests created on a Windows machine do not line up with Linux runners via GHA. I've seen that there is a merged PR with fuzzy comparison; is it implemented in the latest dev version of the package? Is there documentation I could peek into describing how to use the new functionality?
@Alik-V Glad you like the package!

Check out the docs for `AppDriver$expect_screenshot()`. There is a bug in v0.2.0 on CRAN that throws an error if the images are different and `threshold` is set. The difference value is more of a distance calculation than a T/F calculation. Ex: a transparent white pixel compared to a solid black pixel will have a total difference of `4`, as all four of the RGBA channels are 100% different. Whereas a slightly off pixel due to rounding errors by Chrome will have a small distance, ~= 3 * 1/256 ~= 1%.

However, I do not expect the fuzzy comparison to help so much that images taken on Windows will match images taken on Linux. If you have custom fonts and no base plots, it might work, but I'm not hopeful.
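For intuition, the arithmetic behind that distance as a tiny sketch (not shinytest2 internals); pixels are treated as RGBA vectors scaled to [0, 1]:

```r
# Sum of absolute per-channel differences for a single pixel
pixel_distance <- function(a, b) sum(abs(a - b))

# Transparent white vs. solid black: all four RGBA channels differ fully
pixel_distance(c(1, 1, 1, 0), c(0, 0, 0, 1))
#> [1] 4

# A pixel off by one 8-bit step in each color channel (e.g. Chrome rounding)
pixel_distance(c(0.5, 0.5, 0.5, 1), c(0.5, 0.5, 0.5, 1) - c(1, 1, 1, 0) / 256)
#> [1] 0.01171875   # ~= 3 * 1/256 ~= 1%
```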
Example usage:

* A `threshold` of `2` with a 5x5 kernel: `2 / (5 * 5 * 4) = 2%` difference within any 5x5 kernel.
* A `threshold` of `3000` on a 100x100 image: `3000 / (3 RGB channels * 100 * 100) = 3000 / 30000 = 10%`. I would not trust this for most applications, as the percentage/area is too large, which could allow true-positive changes to be missed.

@schloerke Thank you for the explanations and for pointing towards the documentation!
At the moment, I have opted for using `variant = platform_variant()` and recording the Windows and Linux test results separately (Linux recorded through Workbench), but it would certainly be easier if I were able to use a single results folder through `variant = NULL`. I will play with thresholds; I like this approach to snapshot testing.
This issue is to propose a new approach to how images are stored/compared, in an attempt to dramatically reduce the number of snapshot files and reduce overall confusion.

Current approach:

* `_snaps/OS-RVersion/TEST/XYZ.{json,png}`
New approach:

* `_snaps/TEST/XYZ.{json,png}` (No variant)
* A `_shinytest2.json` file with info such as `OS` and `RVersion`.
* Tolerance = `0`, but it would have to be done manually.
* `st2_expect_html()`:
  * `st2_expect_html()` would also be bundled into this situation.
  * `st2_expect_html()` should not have fuzzy matching by default. Maybe it should be run through `tagQuery()` (or something similar) to produce consistent output. Ex: `attribs$class` names and `attribs` keys are in a consistent order (see the sketch below).

@wch What are your thoughts?
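As a sketch of that normalization idea, using plain recursion over the tag instead of `tagQuery()`; this is not an existing shinytest2 feature and `normalize_tag()` is a made-up name:

```r
library(htmltools)

# Sort attribute keys and space-separated class names so HTML snapshots
# compare consistently, regardless of the order they were authored in.
normalize_tag <- function(x) {
  if (inherits(x, "shiny.tag")) {
    x$attribs <- x$attribs[order(names(x$attribs))]
    if (!is.null(x$attribs$class)) {
      classes <- strsplit(x$attribs$class, "\\s+")[[1]]
      x$attribs$class <- paste(sort(classes), collapse = " ")
    }
    x$children <- lapply(x$children, normalize_tag)
  }
  x
}

cat(as.character(normalize_tag(div(class = "b a", id = "x", "hi"))))
#> <div class="a b" id="x">hi</div>
```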
I believe this is a win/win, in that we get fewer images to maintain and we also get to fuzzy match when testing on differing OS/RVersions. (Exact matching on the master OS/RVersion; fuzzy matching on all other OS/RVersions.) This would also allow Linux machines to render text bold or not bold without producing false-positive results in shinycoreci-apps.
cc @MadhulikaTanuboddi