mdethlefs opened 1 year ago
+1
I would pay $50 a month for this feature
Hey all! We're thinking about implementing this feature and we need to know what you want :)
All feedback, opinions, and ideas are very much appreciated.
`assertVisual`

The `assertVisual` command requires a `name` argument, which should be the name of the current screen. It takes a screenshot and compares it against a screenshot file at `.maestro/reference_screenshot/<name>.png`. If the reference screenshot doesn't exist, it saves the current screenshot there.

If `assertVisual` fails (i.e. the reference screenshot differs enough from the actual screenshot), it saves the ACTUAL screenshot to e.g. `~/.maestro/failed/<name>` – that can later be downloaded for inspection by the user, and the reference screenshot can be easily updated.
```yaml
- assertVisual:
    name: <name>
    threshold: <float in 0-1 range> # defaults to env var $MAESTRO_CLI_VISUAL_DIFF_THRESHOLD
```
It can also be used in a short form:

```yaml
- assertVisual: <name>
```
```yaml
appId: com.example.example
---
- launchApp:
    clearState: true
- assertVisual: Login screen # compares the current screen against `.maestro/reference_screenshot/Login screen.png`
- tapOn: Sign in as guest
- assertVisual: Home screen # compares the current screen against `.maestro/reference_screenshot/Home screen.png`
```
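To make the `threshold` semantics concrete, here is a minimal sketch of how a threshold-based comparison could work. This is an illustration only, assuming screenshots are same-sized row-major lists of RGB pixels; it is not Maestro's actual implementation, and the function names are made up.

```python
# Hypothetical sketch: fail the assertion only when the fraction of
# differing pixels exceeds the configured threshold.
def diff_ratio(reference, actual):
    """Fraction of pixels that differ between two equally sized images."""
    total = 0
    different = 0
    for ref_row, act_row in zip(reference, actual):
        for ref_px, act_px in zip(ref_row, act_row):
            total += 1
            if ref_px != act_px:
                different += 1
    return different / total

def assert_visual(reference, actual, threshold=0.01):
    """Return True (pass) if at most `threshold` of pixels differ."""
    return diff_ratio(reference, actual) <= threshold

ref = [[(255, 255, 255)] * 10 for _ in range(10)]  # all-white 10x10 image
act = [row[:] for row in ref]
act[0][0] = (0, 0, 0)                              # one changed pixel
print(diff_ratio(ref, act))                        # 0.01
print(assert_visual(ref, act, threshold=0.05))     # True
```

A small, non-zero default threshold would tolerate minor rendering noise (anti-aliasing, sub-pixel shifts) while still catching real layout changes.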
How to make the experience of "accepting the change" usable/enjoyable on CI?
Possible solution: Make it easy to upload "new" screenshots as artifacts on CI (e.g. using actions/upload). Users will download the "new" screenshots and manually update the old ones.
Need to exclude irrelevant system UI such as the status bar.
Possible solution: get the height of the status bar and crop the image accordingly.
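The cropping idea above can be sketched in a few lines. This assumes the screenshot is a row-major list of pixel rows and that the status-bar height in pixels is known for the device; both are assumptions for illustration, not Maestro internals.

```python
# Hypothetical sketch: drop the top rows (status bar) before diffing,
# so clock/battery/signal changes never cause a false failure.
def crop_status_bar(pixels, status_bar_height):
    """Return the image with the top `status_bar_height` rows removed."""
    return pixels[status_bar_height:]

screenshot = [[(0, 0, 0)] * 4 for _ in range(10)]  # dummy 10-row image
cropped = crop_status_bar(screenshot, 3)
print(len(cropped))  # 7 rows remain
```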
Reference images take up a lot of space? Where to store them?
Using Git LFS feels bad – GitHub's limit for LFS data transfer is 1 GB/user/month. Using an external file storage service (GCS, Amazon S3) seems like too much hassle.
Love these ideas – I would love to add some snapshot testing to our workflows. Where to store/upload the screenshots, and what to compare against, is an interesting problem though! We started looking into whether there was any way we could generate screenshots throughout our tests, even just to show other teams and members what flows looked like, but screenshots only work locally and not on Cloud (for obvious reasons).
My other initial thoughts:
I wonder if converting the image to webp or another format would help with the size issue.
Firstly: WOOO! This looks awesome!
Little stuff:
Bigger stuff:
On your questions:
maestro cloud blah blah
I'd be transferring a lot more each time. Would it necessitate a change in pricing too?

@bartekpacia That's great news, thanks a lot for considering this! 🤩
We have an app with dynamic content, so partial screenshot comparison would be much appreciated. Maybe this could at some point even lead to screenshot selectors, e.g. "click on the book cover looking like the image in this file".
One question for the screenshot taken if comparison with golden fails: will it also include diff markers (e.g. pixels which are different are tinted light red)? That would be super useful to quickly spot issues.
hey, thanks a lot for all the feedback!
> presumably if this step fails, the test also fails and stops? This will be more of a workflow thing I guess, in that maybe you need separate dedicated snapshot tests vs the existing functional flows (unless your app is very stable and consistent)
`assertVisual` will have an `optional` argument that accepts a bool.
I write tests for a React Native app that works and looks largely the same for both platforms. Is it worth trying to bake in the platform to the path or something, rather than requiring folk implementing tests to write the conditional logic everywhere?
@Fishbowler could you explain what you want in more detail?
I'm hesitant toward that, as it'll quickly increase the complexity of your tests, and make them more flaky and much less portable across different devices.
> will it also include diff markers (e.g. pixels which are different are tinted light red)?
This is a very good idea (actually a necessary one to allow for a pleasant workflow)
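The diff-marker idea can be sketched simply: produce a copy of the actual screenshot where every differing pixel is replaced with a highlight color. The image representation and names here are assumptions for illustration, not Maestro's implementation.

```python
# Hypothetical sketch: build a "diff image" where pixels that differ
# from the reference are tinted light red, so failures are easy to spot.
LIGHT_RED = (255, 128, 128)

def diff_image(reference, actual):
    """Copy of `actual` with every differing pixel replaced by light red."""
    out = []
    for ref_row, act_row in zip(reference, actual):
        out.append([
            LIGHT_RED if ref_px != act_px else act_px
            for ref_px, act_px in zip(ref_row, act_row)
        ])
    return out

ref = [[(255, 255, 255)] * 3 for _ in range(2)]
act = [row[:] for row in ref]
act[1][2] = (0, 0, 0)           # simulate one changed pixel
marked = diff_image(ref, act)
print(marked[1][2])             # (255, 128, 128)
print(marked[0][0])             # (255, 255, 255)
```

In practice the marked image would be saved next to the failed screenshot (e.g. alongside the file in the failures directory) so it shows up in the same artifact download.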
Regarding visual regression tests:
This is what the Chromatic user interface looks like: https://www.chromatic.com/videos/visual-test-hero.mp4
As you can see, there is also a toggle to display the visual differences more specifically. And there is a button to accept or reject the change. I think there will probably be no getting around a graphical user interface similar to that of Chromatic. Or rather, I have no idea how it could be solved differently.
By the way: if this feature gets implemented and works as well as Chromatic, it would be a complete game changer for many people. It would bring Maestro to a whole new level.
Partial screenshots would be useful for isolating specific components.
Platform in path
I'd like to be able to check that the login screen on Android looks like it did before. I'd like to be able to check that the login screen on iOS looks like it did before. They're close, but not close enough – native components and whatnot. I don't want lots of conditional logic in my tests; I want `assertVisual` to take care of it all for me.
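One way this could work is for the tool to resolve the reference path per platform automatically, so the same flow compares against a different baseline on each OS. The directory layout below is a hypothetical extension of the `.maestro/reference_screenshot/` convention from the proposal, not a confirmed design.

```python
# Hypothetical sketch: bake the running platform into the reference path
# so a single `assertVisual: Login screen` step resolves per-platform baselines.
def reference_path(name, platform):
    """Per-platform baseline location (illustrative layout)."""
    return f".maestro/reference_screenshot/{platform}/{name}.png"

print(reference_path("Login screen", "android"))
# .maestro/reference_screenshot/android/Login screen.png
print(reference_path("Login screen", "ios"))
# .maestro/reference_screenshot/ios/Login screen.png
```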
Partial Screenshots
I'd like to be able to care about what I care about, and not need to always set consistent data to make the screenshots match (like data that would come from an API). A hierarchy-based selector would give good coordinates that would likely remain mostly consistent for the same app running on the same device.
Hey all,
Thanks a lot for all the valuable feedback. It means a lot to us and helps us build what you need.
It's clear that this feature holds a lot of value. That said, there's an inherent problem with it: having to manually maintain the baseline. The larger the app, the more time-consuming it'll get.
TL;DR: We want to give you the advantages of `assertVisual` without the need to maintain baseline screenshots.
Based on our experience building App Quality Copilot, we're quite confident that this can actually work and be useful.
Here's how we envision it:
```yaml
- tapOn: Get started
- assertVisualAI:
    assertion: "Assert that login button is visible with multi factor authentication of OTP"
    optional: <bool> # if `true`, it'll be a warning, not a failure
```
`assertVisualAI` will not be a replacement for `assertVisual`. If what you want to do is compare screenshots pixel by pixel – sure, Maestro should let you do it, and we'll build this feature.
If you don't want the burden of maintaining baseline screenshots, but still want some assurance that your screens "look right", we want to make that possible (and easy). In particular, `assertVisualAI` could catch the following categories of issues:
- the `assertion` is false

Actually, the `prompt` argument would be optional – you could just call `assertVisualAI` and still get the validations above.
We will also provide a way to improve AI responses by flagging false positives.
We'd like to surface the responses you get from AI in Maestro Studio, to make the experience smoother. Of course, at the same time we'd make sure it works equally well in CLI-only mode.
We don't want to force any specific AI model. There'd be configuration in `config.yaml` so you could select between OpenAI's GPT-4o, GPT-4o mini, Claude Sonnet, or even some locally running model.
We have many ideas around this. One of them is taking some existing model and fine-tuning it to perform even better at exactly this kind of task – quality assurance of mobile app UI.
Overall – what do you think? We'd love to get your thoughts on this.
I think it's a cool option to have available, but right now I'd rather maintain my baseline and retain my deterministic tests.
Caveat: I've not played with the App Quality Copilot at all.
Slightly O/T:
My experience in testing AI systems (as opposed to using them to help me test) has given me a strong scepticism that the generative models can be relied upon for consistency of output, which in my current context (healthcare) means I can't rely on it for testing evidence.
Something like this would be a great start from my perspective, and perhaps it could use deep links or universal links, so you could just specify a bunch of paths and get screenshots.
https://docs.fastlane.tools/img/getting-started/ios/htmlPagePreviewFade.jpg
Can add more specific component testing after. Just would like to look at my overall layout on a bunch of different simulators first.
For comparison and inspiration: we use Playwright for our web testing, and its screenshot and visual comparison features are pretty comprehensive (https://playwright.dev/docs/test-snapshots). When a diff is detected, it attaches the expected, actual, and diff images to its test report, making it easy to browse the changes. It supports two different approaches to dealing with dynamic content:
1. It allows taking screenshots of a single component/element, using any of its supported selectors (css/testid/etc.), so you can capture an image of just the header, footer, navigation bar, center area, or popup. This is useful for large organizations where each team only owns and maintains one section of the page.
2. It allows applying masks to certain areas/elements of the screen, like a date-time field, or anything that cannot be made static/consistent across test runs. The masked area is covered by a solid color block in the screenshot, so the image diff effectively ignores those masked areas.
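The masking approach described above can be sketched as follows: the mask rectangle is painted a solid color before diffing, so dynamic content inside it never triggers a failure. Coordinates, names, and the image representation are illustrative assumptions, not Playwright's or Maestro's actual code.

```python
# Hypothetical sketch of screenshot masking: fill a rectangle with a
# solid color so the image diff ignores whatever was underneath
# (timestamps, ads, other dynamic content).
MASK_COLOR = (255, 0, 255)

def apply_mask(pixels, top, left, bottom, right):
    """Return a copy with the rectangle [top:bottom, left:right] filled."""
    out = [row[:] for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = MASK_COLOR
    return out

img = [[(1, 1, 1)] * 5 for _ in range(5)]  # dummy 5x5 image
masked = apply_mask(img, 1, 1, 3, 3)
print(masked[1][1])  # (255, 0, 255) – inside the mask
print(masked[0][0])  # (1, 1, 1) – outside the mask, unchanged
```

The same mask must be applied to both the reference and the actual screenshot before comparing, otherwise the mask color itself would register as a diff.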
One quirk we did have to deal with is how it treats new screenshots. Out of the box it always considers a new screenshot a test failure, since there's no expected image to compare against, but it saves the new screenshot as the expected image, so when you re-run the same tests they pass. This is probably fine for projects that are already stable, where visual changes rarely happen, but for fast-evolving projects, or projects with frequent new features, it isn't very friendly.
Luckily it's written in TypeScript and, intended or not, it exposes quite a lot of its internals for us to tweak its behavior. We were able to tweak it so that when there's a new screenshot with no expected screenshot in place, we catch the error thrown, mark the test status as "skipped" instead of "failed", and attach the new screenshot to the test report. In the report, all tests that generate new screenshots are then grouped as skipped, allowing us to browse them separately from the truly failed or passed tests. And we have automation following a PR merge that simply refreshes all screenshots, so the new screenshots become the expected ones in the next test run.
Hope this gives some perspective and inspiration, and that we can have something similar or even better for mobile/React Native.
Regarding the use of AI without needing expected images: I think it's valuable, but for very different use cases than pixel-by-pixel comparisons, and the two aren't mutually exclusive – there are cases where both are valuable.
I can imagine that for small or in-development projects, such AI can be a convenient way to detect things that are obviously off and broken; but as a project grows and becomes more serious, I think we'll eventually need something more definitive. I'm not sure AI will be smart enough to tell you something like "this otherwise perfectly-looking button should be green instead of red" – this is where we need image diffs with human-approved baseline images.
On the other hand, AI can still supplement the image diff tests, since there's always a chance of human mistakes that the baseline images can be messed up, and AI can be the second layer of checks to catch those.
We're currently working on setting up Maestro for our E2E testing, and the `assertVisual` command looks like a perfect way to also get visual regression test coverage.
Looking at the comment above about possible problems, a thought about the "accepting the change" piece. I think having the CI version be informational and not worry about being able to accept changes would be fine, especially for an early version of the command.
I see a draft PR exists to add `assertVisual` – do we have any idea when a version of the command will be available for use?
Is your feature request related to a problem? Please describe.
It always takes a lot of time to check whether certain components look different/wrong. Tools like Chromatic's visual regression testing help a lot with that: they take a screenshot of a component, and with every build a new screenshot is taken and compared against the old one.
Describe the solution you'd like

```yaml
- assertVisual: xyz
```

This creates a screenshot the first time it is used. The `xyz` is a keyword, so you can compare multiple screenshots and flows.