o3de / sig-graphics-audio

Documents and communications for the O3DE Graphics-Audio Special Interest Group

Proposed RFC Feature - AtomSampleViewer Pre-Checkin Wizard #29

Open santorac opened 2 years ago

santorac commented 2 years ago

Summary:

AtomSampleViewer's automated screenshot tests are Atom's most exhaustive safety net against bugs and regressions. We want to formalize a process for developers to run these tests before every render-related merge and provide tools that help streamline this process.

What is the relevance of this feature?

There are a number of problems around testing the renderer that we would like to address with improved process and tools. Ultimately these problems stem from the fact that testing render results is hard to automate because of hardware and driver differences (and gremlins). We have to keep loose thresholds to avoid false positives, but this increases the risk of false negatives. So we need humans involved in the testing process on a regular basis: loose thresholds serve as a first line of defense, and human inspection catches the false negatives.
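
To make that tradeoff concrete, here is a minimal sketch of threshold-based comparison, assuming Pillow and numpy are available. This is illustrative only (not ASV's actual scoring code); the score formula and threshold value are assumptions.

```python
import numpy as np
from PIL import Image

def diff_score(expected_path, actual_path):
    """Mean absolute per-channel difference, normalized to 0..1."""
    expected = np.asarray(Image.open(expected_path).convert("RGB"), dtype=np.float32)
    actual = np.asarray(Image.open(actual_path).convert("RGB"), dtype=np.float32)
    return float(np.mean(np.abs(expected - actual)) / 255.0)

LOOSE_THRESHOLD = 0.02  # example value; real thresholds are tuned per test

def passes_auto_check(expected_path, actual_path):
    # A loose threshold avoids false positives from hardware/driver noise,
    # but a real regression with a small score can slip under it (false negative),
    # which is why human inspection is still needed.
    return diff_score(expected_path, actual_path) <= LOOSE_THRESHOLD
```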

Feature design description:

Let's create a "wizard" that walks the developer through a series of checks, helps them verify the most important screenshots, and ends with a brief report of the testing that was completed for inclusion in the PR.

Instead of running the automation normally, they will go to the Automation menu and pick "Run pre-commit wizard...". This will run the _fulltestsuite_ script first, then run through a series of screens for the user to interact with...

[Mockup image: wizard intro screen]

On the first screen we'll see a summary of the steps that will take place.

(In case you are wondering, the wizard image is public domain: https://www.wpclipart.com/cartoon/mythology/wizard/wizard_angry.png.html)

[Mockup image: auto-test results summary]

After running the full test suite, the wizard will show a summary of the auto-test results. You can click a "See Details" button to open the normal Script Results dialog that we are all used to seeing.

[Mockup image: screenshot inspection screen]

Next the user will be required to inspect and respond to a series of manual screenshot verifications. I'm not sure of the exact heuristic we'll use for determining which screenshots to present, or how many; it's something we can iterate on. Here are some options I have in mind (see the sketch after this list)...

  1. Sort the results from largest to smallest diff score.
  2. Continue presenting screenshots until the user reports "no visual difference".
  3. Continue presenting screenshots until some minimum diff score is reached, regardless of user response.
  4. Any other ideas?
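
As one possibility, here is a rough sketch combining options 1-3 above. The function names, callback, and minimum score are illustrative assumptions, not an existing ASV API.

```python
MIN_DIFF_SCORE = 0.001  # illustrative floor, not a real ASV setting

def run_interactive_inspection(results, present_and_ask):
    """results: list of (screenshot_name, diff_score) pairs.
    present_and_ask(name) shows the comparison UI and returns True if the
    user reported seeing a difference."""
    for name, score in sorted(results, key=lambda r: r[1], reverse=True):
        if score < MIN_DIFF_SCORE:
            break  # remaining diffs are negligible (option 3)
        saw_difference = present_and_ask(name)  # option 1: largest diffs first
        if not saw_difference:
            break  # user reports "no visual difference" (option 2)
```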

The screenshot evaluation screen will have two image swatches. The first will highlight significantly different pixels in red. The second will automatically flip back and forth between the expected baseline screenshot and the captured screenshot. The user can turn off the automatic flipping as needed, but auto-flip is the default. The user can drag and mousewheel to pan and zoom both swatches in sync.
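
For illustration, here is a minimal sketch of how the red-highlight swatch could be generated, assuming Pillow and numpy; the tolerance value is an assumption.

```python
import numpy as np
from PIL import Image

def highlight_differences(expected_path, actual_path, tolerance=8):
    """Return the captured screenshot with significantly different pixels painted red."""
    expected = np.asarray(Image.open(expected_path).convert("RGB"), dtype=np.int16)
    actual = np.asarray(Image.open(actual_path).convert("RGB"), dtype=np.int16)
    # A pixel is "significantly different" if any channel differs by more than the tolerance.
    mask = np.abs(expected - actual).max(axis=-1) > tolerance
    highlighted = actual.astype(np.uint8).copy()
    highlighted[mask] = (255, 0, 0)  # paint significantly different pixels red
    return Image.fromarray(highlighted)
```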

The user must select one of the available options: "I don't see any difference", "I see a benign difference", "I see a difference that's probably benign", "This looks like a problem".

For each screenshot, we should provide a description of what's important in determining whether certain differences are benign. This will require us to update all our scripts, something we can do gradually. For example, for MaterialHotReloadTest I would say "It's important that the pattern and color match, as well as which words are shown at the bottom." So if there are differences in aliasing or sampling, the user should be able to read this description and pick "I see a benign difference".
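
As a rough illustration of what this per-screenshot guidance might look like (the mapping name and key below are hypothetical; the real text would live alongside each test script):

```python
# Hypothetical mapping from test to inspection guidance; in practice this text
# would be added to each test script as we gradually update them.
SCREENSHOT_GUIDANCE = {
    "MaterialHotReloadTest": (
        "It's important that the pattern and color match, as well as which "
        "words are shown at the bottom. Differences in aliasing or sampling "
        "are benign."
    ),
}
```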

We'll also have a button to quickly export artifacts that show the diff mask and the expected/actual images, so the results can be easily shared. Note this could take several forms: export an animated GIF where the expected/actual images flip back and forth, export an animated APNG (since GIF is limited to 256 colors), or just a still image that lines up the expected/actual/diff next to each other. The exact set of artifacts is something we can iterate on.
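
For example, a minimal sketch of the flip-book export using Pillow; the paths and timing are illustrative, and saving with a .png extension instead produces an APNG.

```python
from PIL import Image

def export_flipbook(expected_path, actual_path, out_path="comparison.gif"):
    """Write a two-frame animation that flips between expected and actual."""
    expected = Image.open(expected_path).convert("RGB")
    actual = Image.open(actual_path).convert("RGB")
    expected.save(
        out_path,
        save_all=True,
        append_images=[actual],
        duration=750,  # milliseconds per frame
        loop=0,        # loop forever
    )
```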

[Mockup image: ScreenshotInspection]

On the final screen, we'll show a summary of the auto test, a list of any auto failures, a summary of the interactive screenshot inspection, and a list of any issues that were reported. The "Copy to Clipboard" function is what we'll normally use, pasting the output into the PR description. We should also have buttons for re-opening any of the prior screenshot inspection screens, so the user can re-inspect and export anything they missed.
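
A rough sketch of what the copied report could look like, formatted as GitHub markdown for pasting into the PR description; the field names and layout are assumptions to be iterated on.

```python
def format_report(auto_passed, auto_failures, inspections):
    """auto_failures: list of failed test names.
    inspections: list of (screenshot_name, user_response) pairs."""
    lines = [
        "### AtomSampleViewer pre-commit wizard report",
        f"- Automated tests: {auto_passed} passed, {len(auto_failures)} failed",
    ]
    lines += [f"  - FAILED: {name}" for name in auto_failures]
    lines.append(f"- Screenshots inspected manually: {len(inspections)}")
    lines += [f"  - {name}: {response}" for name, response in inspections]
    return "\n".join(lines)
```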

[Mockup image: final summary screen]

What are the advantages of the feature?

What are the disadvantages of the feature?

We have to trust developers to inspect the diffs, and they might have differing opinions or experience about what counts as a benign difference.

How will this be implemented or integrated into the O3DE environment?

These changes will be made in AtomSampleViewer, which lives in its own GitHub repo (https://github.com/o3de/o3de-atom-sampleviewer). Developers must have both the O3DE and AtomSampleViewer repos cloned locally. This is no different from the current operating environment for Atom developers.

Are there any alternatives to this feature?

We could implement something similar in a Python tool that runs external to AtomSampleViewer, doing analysis on artifacts captured by AtomSampleViewer after the fact. This has the advantage of being able to run ASV on one platform (like mobile) and analyze the results on another platform (developer computer). It might also be easier to develop the UI in PySide Qt rather than ImGui. The disadvantage is that the user might have to jump around more between tools, or it could take longer to develop a streamlined, integrated experience. Since we already have built-in systems in place for screenshot analysis, it would probably be easier to just extend those systems with these new features.

How will users learn this feature?

We will need to communicate the general process to the community through a wiki page or official documentation to point them in the right direction. But once they have figured out how to run ASV and open the wizard, then the wizard should guide them the rest of the way.

Are there any open questions?

  1. What are the best ways to export screenshot artifacts? Ideally we want something that the user can drop into the description of a PR on GitHub, or into Jira or on a wiki, and see an animation that flips back and forth between the expected and actual screenshots.
    1. Gif is the most portable but only supports 256 colors. This may be enough in some cases, but in others you might not be able to clearly see what the difference is.
    2. APNG is an unofficial extension to PNG. It supports 24bit color, and it is supported on the major web browsers and in GitHub. It is not supported in Jira or Confluence. It is not widely supported in image editing programs.
    3. WebP has wider support than APNG, but it is not supported on GitHub PRs.
    4. We could look into exporting some video format, but most of those are patent protected and not compatible with open source projects.
    5. We could output a single image that lines up the screenshots side-by-side, along with a heatmap that shows where the differences are. It just isn't ideal; it can be hard to really see the difference, even with a heatmap.
    6. We could output the two screenshots separately and expect the audience to load them in some tool where they can flip back and forth themselves. This is simple to implement but clunky for the audience.
    7. It would be nice if GitHub had some built-in still-image comparison feature on PR descriptions. As an alternative, maybe image comparison could be done with some client-side JS plugin.
  2. What (if any) platform requirements should we impose? Historically at Amazon we expected tests to be run on dx12 and vulkan. Some changes could impact other platforms, but we haven't required them before. Some contributors might be working on Linux and not have access to Windows. My recommendation is that we do not impose a blanket platform requirement; reviewers should address it on a case-by-case basis, using their own judgement.

Adi-Amazon commented 2 years ago

Great initiative. Suggest verifying ability to use this image as I found it online with a quick search and it might carry legal rights.

santorac commented 2 years ago

> Great initiative. Suggest verifying ability to use this image as I found it online with a quick search and it might carry legal rights.

I included the link above; it's marked as public domain on the original site where I found it. Here's another link that also indicates it is free to use for any purpose. https://www.iconspng.com/image/53292/wizard-silhouette https://www.wpclipart.com/cartoon/mythology/wizard/wizard_angry.png.html

jeremyong-az commented 2 years ago

Looks great! I recommend that V1 just have static images side-by-side, and we can punt the specific details around animated images.

I did have a question about how the clipboard interaction would work for pasting rich data into PRs.

galibzon commented 2 years ago

Looking forward to seeing this feature in action. I'd only add that the whole feature should be developed in Python, with a tiny layer of C++ so ASV can invoke the wizard.

santorac commented 2 years ago

Hm, TrackView is being updated to use VP9 WebM, see https://github.com/o3de/o3de/pull/9140. Maybe we can use this to export screenshot comparisons that flip back and forth. Here is an example showing that GitHub supports it (from https://test-videos.co.uk/bigbuckbunny/webm-vp9)

https://user-images.githubusercontent.com/55155825/165194386-561295b8-7d8d-4625-9d38-6e6c75f3ff02.mp4
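
A minimal sketch of how such an export could work, assuming ffmpeg with libvpx-vp9 is available on the PATH; the file names, frame rate, and output path are illustrative.

```python
import os
import shutil
import subprocess
import tempfile

def export_webm(expected_path, actual_path, out_path="comparison.webm"):
    """Build a two-frame VP9 WebM that flips between expected and actual."""
    with tempfile.TemporaryDirectory() as tmp:
        # ffmpeg's image-sequence input expects numbered file names
        shutil.copy(expected_path, os.path.join(tmp, "frame_0.png"))
        shutil.copy(actual_path, os.path.join(tmp, "frame_1.png"))
        subprocess.run([
            "ffmpeg", "-y",
            "-framerate", "1",                      # one second per frame
            "-i", os.path.join(tmp, "frame_%d.png"),
            "-c:v", "libvpx-vp9",
            "-pix_fmt", "yuv420p",
            out_path,
        ], check=True)
```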