Open santorac opened 2 years ago
Great initiative. Suggest verifying ability to use this image as I found it online with a quick search and it might carry legal rights.
I included the link above; it's marked as public domain at the original site where I found it. Here's another link that also indicates it is free to use for any purpose: https://www.iconspng.com/image/53292/wizard-silhouette and the original: https://www.wpclipart.com/cartoon/mythology/wizard/wizard_angry.png.html
Looks great! I recommend that V1 just have static images side-by-side, and we can punt on the specific details around animated images.
I did have a question about how the clipboard interaction would work for pasting rich data into PRs.
Looking forward to seeing this feature in action. I'd only add that the whole feature should be developed in Python, with a tiny layer of C++ so ASV can invoke the wizard.
Hm, TrackView is being updated to use VP9 WebM, see https://github.com/o3de/o3de/pull/9140. Maybe we can use this to export screenshot comparisons that flip back and forth. Here is an example showing that GitHub supports it (from https://test-videos.co.uk/bigbuckbunny/webm-vp9)
Summary:
AtomSampleViewer's automated screenshot tests are Atom's most exhaustive safety net against bugs and regressions. We want to formalize a process for developers to run these tests before every render-related merge and provide tools that help streamline this process.
What is the relevance of this feature?
There are a number of problems around testing the renderer that we would like to address with improved process and tools. Ultimately these problems stem from the fact that testing render results is hard to automate because of hardware and driver differences (and gremlins). We have to keep loose thresholds to avoid false positives, but this increases the risk of false negatives. So we need humans involved in the testing process on a regular basis: loose thresholds act as a first line of defense, and human inspection catches the false negatives.
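To make the false-positive/false-negative trade-off concrete, here is a minimal sketch (not ASV's actual comparison code; the names `channel_tolerance` and `max_diff_ratio` are hypothetical) of a loose-threshold comparison: per-channel deltas below a tolerance are treated as hardware/driver noise, and only a small fraction of pixels may exceed it.

```python
def screenshots_match(expected, actual, channel_tolerance=16, max_diff_ratio=0.01):
    """Return True if the images are 'close enough' under loose thresholds.

    expected/actual: equal-length lists of (r, g, b) tuples.
    channel_tolerance: per-channel delta treated as noise (hardware/driver variance).
    max_diff_ratio: fraction of pixels allowed to exceed the tolerance.
    """
    assert len(expected) == len(actual)
    differing = sum(
        1 for e, a in zip(expected, actual)
        if any(abs(ec - ac) > channel_tolerance for ec, ac in zip(e, a))
    )
    return differing / len(expected) <= max_diff_ratio
```

Raising `channel_tolerance` suppresses spurious failures across GPUs, but it also lets small real regressions slip through; that gap is exactly what the human inspection step is meant to cover.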
Feature design description:
Let's create a "wizard" that walks the developer through a series of checks, helps them verify the most important screenshots, and ends with a brief report of the testing that was completed for inclusion in the PR.
Instead of running the automation normally, they will go to the Automation menu and pick "Run pre-commit wizard...". This will run the _fulltestsuite_ script first, then run through a series of screens for the user to interact with...
On the first screen we'll see a summary of the steps that will take place.
(in case you are wondering, the wizard image is Public Domain https://www.wpclipart.com/cartoon/mythology/wizard/wizard_angry.png.html )
After running the full test suite, the wizard will show a summary of the auto-test results. You can click a "See Details" button to open the normal Script Results dialog that we are all used to seeing.
Next the user will be required to inspect and respond to a series of manual screenshot verifications. I'm not sure of the exact heuristic we'll use to determine which screenshots to present, or how many; it's something we can iterate on. Here are some options I have in mind...
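As one hypothetical example of such a heuristic (an illustration, not one of the RFC's specific options): rank screenshots by their diff ratio, skip those below a noise floor, and cap how many the user must review.

```python
def select_for_inspection(diff_scores, top_n=5, min_score=0.001):
    """Pick the screenshots with the largest diff ratios for manual review.

    diff_scores: {screenshot_name: fraction of pixels that differed}.
    Screenshots below min_score are assumed safe to skip; the rest are
    ranked by score, largest first, and capped at top_n.
    """
    candidates = [(score, name) for name, score in diff_scores.items()
                  if score >= min_score]
    candidates.sort(reverse=True)
    return [name for _, name in candidates[:top_n]]
```

The `top_n` cap keeps the wizard's manual step bounded even on a change that touches many samples, at the cost of possibly hiding a low-scoring regression.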
The screenshot evaluation screen will have two image swatches. The first will highlight significantly different pixels in red. The second will automatically flip back and forth between the expected baseline screenshot and the captured screenshot. The user can turn off the automatic flipping as needed, but auto-flip is the default. The user can drag and use the mouse wheel to pan and zoom both swatches in sync.
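The red-highlight swatch could be built roughly like this sketch (illustrative only; the real UI would render this in ImGui rather than Python): differing pixels become pure red, and matching pixels are dimmed to grayscale so the differences stand out.

```python
def diff_mask(expected, actual, channel_tolerance=16):
    """Build a highlight image from two equal-length lists of (r, g, b) tuples.

    Pixels whose per-channel delta exceeds channel_tolerance are painted red;
    matching pixels are converted to grayscale for visual contrast.
    """
    out = []
    for e, a in zip(expected, actual):
        if any(abs(ec - ac) > channel_tolerance for ec, ac in zip(e, a)):
            out.append((255, 0, 0))  # significant difference
        else:
            gray = sum(a) // 3  # simple average-based grayscale
            out.append((gray, gray, gray))
    return out
```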
The user must select one of the available options: "I don't see any difference", "I see a benign difference", "I see a difference that's probably benign", "This looks like a problem".
For each screenshot, we should provide a description of what's important in determining whether certain differences are benign. This will require us to update all our scripts, something we can do gradually. For example, for MaterialHotReloadTest I would say "It's important that the pattern and color match, as well as which words are shown at the bottom." So if there are differences in aliasing or sampling, the user should be able to read this description and pick "I see a benign difference".
We'll also have a button to quickly export artifacts that show the diff mask and the expected/actual images, so the results can be easily shared. Note this could take several forms: export an animated gif where the expected/actual images flip back and forth, export an animated "apng" (since gif is limited to 256 colors), or just a still image that lines up the expected/actual/diff next to each other. The exact set of artifacts is something we can iterate on.
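The flip-animation export could be sketched like this using Pillow (an assumption for illustration; this is not part of ASV). GIF is used here for simplicity; an APNG export would avoid GIF's 256-color limit by saving with `format="PNG"` and the same keyword arguments.

```python
import io

from PIL import Image


def export_flip_animation(expected, actual, frame_ms=700):
    """Return the bytes of a looping two-frame GIF (expected <-> actual).

    expected/actual: PIL.Image objects of the same size.
    frame_ms: how long each frame is shown, in milliseconds.
    """
    buf = io.BytesIO()
    expected.save(buf, format="GIF", save_all=True,
                  append_images=[actual], duration=frame_ms, loop=0)
    return buf.getvalue()
```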
On the final screen, we'll show a summary of the auto test, a list of any auto failures, a summary of the interactive screenshot inspection, and a list of any issues that were reported. Normally the user will click "Copy to Clipboard" and paste the output into the PR description. We should also have buttons for re-opening any of the prior screenshot inspection screens, so the user can re-inspect and export anything they missed.
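The clipboard summary could be assembled as plain text roughly like this sketch (hypothetical format and names; the test names below are made up for illustration):

```python
def build_wizard_report(auto_passed, auto_failed, inspections):
    """Format a plain-text summary suitable for pasting into a PR description.

    auto_passed: count of automated tests that passed.
    auto_failed: list of failed automated test names.
    inspections: list of (screenshot_name, verdict) pairs from the manual step.
    """
    lines = [
        "AtomSampleViewer pre-commit wizard report",
        f"Automated tests: {auto_passed} passed, {len(auto_failed)} failed",
    ]
    lines += [f"  FAILED: {name}" for name in auto_failed]
    lines.append(f"Manual inspections: {len(inspections)}")
    lines += [f"  {name}: {verdict}" for name, verdict in inspections]
    return "\n".join(lines)
```

Keeping the report as plain text (rather than rich clipboard data) sidesteps the clipboard-interop question raised in the comments, at the cost of not embedding images directly.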
What are the advantages of the feature?
What are the disadvantages of the feature?
We have to trust developers to inspect diffs, and they might have different opinions or experience about what constitutes a benign difference.
How will this be implemented or integrated into the O3DE environment?
These changes will be made in AtomSampleViewer, which lives in its own GitHub repo (https://github.com/o3de/o3de-atom-sampleviewer). Developers must have both the O3DE and AtomSampleViewer repos cloned locally. This is no different from the current operating environment for Atom developers.
Are there any alternatives to this feature?
We could implement something similar in a Python tool that runs external to AtomSampleViewer, doing analysis on artifacts captured by AtomSampleViewer after the fact. This has the advantage of being able to run ASV on one platform (like mobile) and analyze the results on another platform (the developer's computer). It might also be easier to develop the UI in PySide Qt rather than ImGui. The disadvantage is that the user might have to jump around more between tools, or it could take longer to develop a streamlined integrated experience. Since we already have built-in systems in place for screenshot analysis, it would probably be easier to just extend those systems with these new features.
How will users learn this feature?
We will need to communicate the general process to the community through a wiki page or official documentation to point them in the right direction. But once they have figured out how to run ASV and open the wizard, then the wizard should guide them the rest of the way.
Are there any open questions?