[Feature Request] Visual Regression Testing

mdethlefs commented 1 year ago

Is your feature request related to a problem? Please describe. It always takes a lot of time to check if certain components looks different/wrong. Tools like Chromatics visual regression testing help a lot with that. They take a screenshot of a component and with every build a new screenshots is taken and being compared against the old one.

Describe the solution you'd like

Maybe there should be a command like - assertVisual: xyz - which creates a screenshot the first time it is used. the xyz is a keyword so you can compare multiple screenshots and flows
When the command is used again maestro will check if a screenshot is already created and check if there are visual differences.
- if there is a difference the flow fails.
- you would have to manually check if the visual difference is ok or not
- if it is ok and wanted you can accept the new screenshot as the news "master"-screenshot. flows will compare against that screenshot in the future

otoniel-isidoro commented 1 year ago

+1

kassemitani commented 1 year ago

+1

milesingrams commented 11 months ago

+1

mrgklwong commented 9 months ago

+1

news-roccosalvetti commented 8 months ago

+1

Rohphi commented 7 months ago

+1

nabilfreeman commented 5 months ago

+1

radhakrishnanakireddy commented 5 months ago

+1

daniel-anh-nguyen-hipages commented 4 months ago

+1

paulsweeting commented 4 months ago

+1

nabilfreeman commented 4 months ago

I would pay $50 a month for this feature

Stackustack commented 4 months ago

+1

chriszs commented 3 months ago

Much of the logic for the most basic version of this feature already exists in the code base. Maestro does screenshot image comparison using a percentage change threshold in two places to power other features.

bartekpacia commented 1 month ago

Hey all! We're thinking about implementing this feature and we need to know what you want :)

All feedback, opinions, ideas, are very much appreciated.

New command - `assertVisual`

The assertVisual command requires a name argument. This name should be the name of the current screen. It takes a screenshot and compares it against a screenshot file in .maestro/reference_screenshot/<name>.png. If the screenshot doesn't exist, it saves it.

If assertVisual fails (i.e. the reference screeshot differs enough from the actual screenshot), it saves the ACTUAL screenshot to e.g. ~/.maestro/failed/<name> – that can later be downloaded for inspection by the user, and the reference screenshot can be easily updated.

- assertVisual:
    name: <name>
    threshold: <float in 0-1 range> <defaults to env var $MAESTRO_CLI_VISUAL_DIFF_THRESHOLD>

it can also be used in a short form:

- assertVisual: <name>

Example usage

appId: com.example.example
---
- launchApp
    clearState: true
- assertVisual: Login screen # compares the current screen against `.maestro/reference_screenshot/Login screen.png`
- tapOn: Sign in as guest
- assertVisual: Home screen # compares the current screen against `.maestro/reference_screenshot/Home screen.png`

Problems

How to make the experience of "accepting the change" usable/enjoyable on CI?

Possible solution: Make it easy to upload "new" screenshots as artifacts on CI (e.g. using actions/upload). Users will download the "new" screenshots and manually update the old ones.
Need to exclude irrelevant system UI such as status bar

Possible solution: get height of status bar and crop the image accordingly
Reference images take up a lot of space? Where to store them?

Using Git LFS feels bad - GitHub's limit is for LFS data transfer is 1GB/user/month. Using an an external file storage (GCS, Amazon S3) seems like too much hassle.

simon-gilmurray commented 1 month ago

Love these ideas, I would love to add some snapshot testing to our workflows - the where to store/upload and then compare to is an interesting problem though! We started looking in to if there was any way we could generate screenshots throughout our tests, even just for showing other teams and members what flows looked like, but the screenshots only work locally and not on Cloud (for obvious reasons)

Is there any way we can integrate this with Cloud though, for teams who use it in CI?

My other initial thoughts:

excluding the status bar seems like a sensible approach
probably more of an education piece around guidelines on how the threshold will work with examples
presumably if this step fails, the test also fails and stops? This will be more of a workflow thing I guess, in that maybe you need separate dedicated snapshot tests vs the existing functional flows (unless your app is very stable and consistent)

chriszs commented 1 month ago

I wonder if converting the image to webp or another format would help with the size issue.

Fishbowler commented 1 month ago

Firstly: WOOO! This looks awesome!

Little stuff:

Worth considering a selector option for a partial screenshot?
I write tests for a React Native app that works and looks largely the same for both platforms. Is it worth trying to bake in the platform to the path or something, rather than requiring folk implementing tests to write the conditional logic everywhere?

Bigger stuff:

Tests should be runnable in cloud and locally. Is there stuff beyond status bars to consider to keep that capability? Some ability to create a better representation emulator/simulator? Or skipping the step if the local screen size doesn't match and there's an indicator for cloud use in the config.yaml? Lots of possibilities I've not thought of.

On your questions:

Implementing "Accepting the change" - what's missing from the current suggestion? You're already giving back the new golden, and can let folk manage their files as they need to let the next pass succeed. The real issue is in clear messaging. There's possibly a different workflow here between "Golden Missing" (and so give the Cloud user their new golden images) versus "Golden Failed" (where an intentionally changed UI causes a test to fail expectedly).
"Where to store the files" looks like it could balloon this feature to switching the Cloud model from the existing one of "an app file and a workspace" to "Real integration with the repositories". The description already suggests giving the user back their "failed" screenshots, so that they can reconcile. Maybe it's enough to let the user deal with storage? Or is this also about transfer bandwidth? When I maestro cloud blah blah I'd be transferring a lot more each time. Would it necessitate a change in pricing too?

ubuntudroid commented 1 month ago

@bartekpacia That's great news, thanks a lot for considering this! 🤩

We have an app with dynamic content, so partial screenshot comparison would be much appreciated. Maybe this could at some point even lead to screenshot selectors, e.g. "click on the book cover looking like the image in this file".

One question for the screenshot taken if comparison with golden fails: will it also include diff markers (e.g. pixels which are different are tinted light red)? That would be super useful to quickly spot issues.

bartekpacia commented 1 month ago

hey, thanks a lot for all the feedback!

optional failures

presumably if this step fails, the test also fails and stops? This will be more of a workflow thing I guess, in that maybe you need separate dedicated snapshot tests vs the existing functional flows (unless your app is very stable and consistent)

assertVisual will have optional argument that accepts a bool.

Platform in path

I write tests for a React Native app that works and looks largely the same for both platforms. Is it worth trying to bake in the platform to the path or something, rather than requiring folk implementing tests to write the conditional logic everywhere?

@Fishbowler could you explain what you want in more detail?

Partial screenshots

I'm hesitant toward that, as it'll quickly increase the complexity of your tests, and make them more flaky and much less portable across different devices.

Diff markers

will it also include diff markers (e.g. pixels which are different are tinted light red)?

This is a very good idea (actually a necessary one to allow for a pleasant workflow)

mdethlefs commented 1 month ago

Regarding visual regression tests:

This is what the Chromatic user interface looks like: https://www.chromatic.com/videos/visual-test-hero.mp4

As you can see, there is also a toggle to display the visual differences more specifically. And there is a button to accept or reject the change. I think there will probably be no getting around a graphical user interface similar to that of Chromatic. Or rather, I have no idea how it could be solved differently.

By the way: If this feature gets implemented and it works as good as Chromatic it would be a complete game changer for many people. It would bring maestro to a whole new level

chriszs commented 1 month ago

Partial screenshots would be useful for isolating specific components.

Fishbowler commented 1 month ago

Platform in path

I'd like to be able to check that the login screen in Android looks like it did before. I'd like to be able to check that the login screen in iOS looks like it did before. They're close, but not close enough - Native components and whatnot. I don't want lots of conditional logic in my tests, I want assertVisual to take care of it all for me.

Partial Screenshots

I'd like to be able to care about what I care about and not need to always set consistent data to make the screenshots match (like data that would come from an API). Hierarchy for a selector would give good coordinates that would likely remain mostly consistent for the same app running on the same device.

bartekpacia commented 1 month ago

Hey all,

Thanks a lot for all the valuable feedback. It means a lot to us and helps us build what you need.

It's clear that this feature holds a lot of value. That said, there's an inherent problems with it: having to manually maintain the baseline. The larger the app, the more time consuming it'll get.

Proposal

TL:DR We want to give you advantages of assertVisual without the need to maintain the baseline screenshots.

Based on our experience building App Quality Copilot, we're quite confident that this can actually work and be useful.

Here's how we envision it:

- tapOn: Get started
- assertVisualAI:
    assertion: "Assert that login button is visible with multi factor authentication of OTP"
    optional: <bool> # if `true`, it'll be a warning, not a failure

assertVisualAI will not be the replacement of assertVisual. If what you want to do is compare screenshots pixel by pixel – sure, Maestro should let you do it, and we'll build this feature.

If you don't want the burden of maintaining baseline screenshots though, but still want some assurance that your screens "look right", we want to make it possible (and easy). In particular, assertVisualAI could catch the following categories of issues:

The assertion is false
text/views are cropped or overlapping
obvious localization probles

Actually, the prompt argument would be optional - you could just call assertVisualAI and still get the validations above.

We will also provide a way to improve AI responses by flagging false positives.

Maestro Studio integration

We'd like to surface responses you get from AI in Maestro Studio, to make experience smoother. Of course, at the same time we'd make sure it works equally well in CLI-only mode.

Model selection

We don't want to force any specific AI model. There'd be configuration in config.yaml so you could select between OpenAI's GPT 4o, GPT 4o-mini, Claude Sonnet, or even some locally running model

The future

We have many ideas around this. One of the is taking some existing model and finetuning it to perform even better for exactly this kind of task – quality assurance of mobile app UI.

Overall – what do you think? We'd love to get your thoughts on this.

Fishbowler commented 1 month ago

I think it's a cool option to have available, but right now I'd rather maintain my baseline and retain my deterministic tests.

Caveat: I've not played with the App Quality Copilot at all.

Slightly O/T:

My experience in testing AI systems (as opposed to using them to help me test) has given me a strong scepticism that the generative models can be relied upon for consistency of output, which in my current context (healthcare) means I can't rely on it for testing evidence.

samducker commented 1 month ago

Something like this would be a great start from my perspective and perhaps it can use deeplinks or universal links so you can just specify a bunch of paths and getting screenshots.

https://docs.fastlane.tools/img/getting-started/ios/htmlPagePreviewFade.jpg

Can add more specific component testing after. Just would like to look at my overall layout on a bunch of different simulators first.

dachaoli-finfare commented 1 week ago

For comparison and inspiration, We use Playwright for our web testing, and their screenshot and visual comparison features are pretty comprehensive https://playwright.dev/docs/test-snapshots, and when diff is detected, it attaches the expected, actual, and diff image to it's test report to help easy browsing of the changes; It supports two different approaches to deal with dynamic contents.

1, It allows taking screenshots of a component/element only, using any of its supported selector like css/testid/etc, allowing you to only capture image for something like the header, footer, navigation bar, center area, popup; which is useful for large organizations that each team probably only owns and maintain a section on the page,

2, it allows applying masks to certain area/element of the screen, like the date-time field, or anything that cannot be made static/consistent across each test runs. The masked area will be covered by a solid color block in the screen shot, so image diff will effectively ignore those masked areas;

One quirk we did have to deal with is how it treats new screenshots. Out of the box it always consider new screenshots as test failures, since there's no expected image to compare with, but it will save these new screenshots as expected images, so when you re-run the same tests again, tests will pass; this is probably ok for projects that are already stable and visual changes rarely happen; but for fast evolving projects, or projects with frequent new features, this isn't very friendly.

Luckily it was written in Typescript and whether it's intended or not, it exposes quite a lot of the internals for us to tweak its behavior; we were able to tweak/hack it that when it's a new screenshot with no expected screenshot in place, we will catch the error thrown, and mark the test status as "skipped" as opposed to "failed", and attach the new screenshot to test report as well, so in the report, all tests that generates new screenshots will be grouped as skipped, allowing us to browse them separately,from the true failed or passed tests. And we have automation following a PR merge to simply refresh all screenshots, so these new screenshots will become expected ones in next test run.

Hope this gives some perspective and inspiration, that we can have some similar or even better for mobile/reactnative.

dachaoli-finfare commented 1 week ago

Regarding use of AI without needing expected images, I think it's valuable, but for very different use cases than pixel-by-pixel comparisons, and two aren't mutually exclusive, that there are cases both are valuable.

I can imagine for small or in development projects, such AI can be a convenient way to detect things that are obviously off and broken; but as the project grows and become more serious, I think eventually we will need something more definitive. Not sure AI will be smart enough to tell something like "This otherwise perfectly looking button should be green instead of red", this is where we need image diff with human approved baseline images.

On the other hand, AI can still supplement the image diff tests, since there's always a chance of human mistakes that the baseline images can be messed up, and AI can be the second layer of checks to catch those.

MFazio23 commented 4 hours ago

We're currently working on setting up Maestro for our E2E testing and the assertVisual command looks like a perfect way to also get visual regression test coverage.

Looking at the comment above about possible problems, a thought about the "accepting the change" piece. I think having the CI version be informational and not worry about being able to accept changes would be fine, especially for an early version of the command.

I see a draft PR exists to add assertVisual - do we have any idea when a version of the command will be available for use?

mobile-dev-inc / maestro