Failing Tests due to height rendering issue when recording on M1 and testing on Linux

matthew-shin-hs commented 1 month ago

Describe the bug

Recording screenshots on a local M1 laptop with an ARM-based emulator, then running the tests on an x86_64 emulator in a Linux VM against that M1-recorded baseline, causes failures. Diffing the images shows a height difference between the screenshots.

I have tried several things to remediate these differences. Beyond the basics, such as making sure the emulators are the same device running the same target, I made sure both environments use hardware acceleration, and I even tried forcing both to use software rendering (which didn't work, and even if it had, the time needed to run the tests that way would not be sustainable in the long run).

The interesting part is that prior to this migration to the Linux VM, we were using a Mac Pro Intel-chip VM, which had the same architecture (but a different device, a Pixel 3a, installed). That setup worked fine when recording locally on an M1 device and testing on the Mac Pro Intel-chip VM AVD. We were forced to migrate to Linux because our CI/CD pipeline provider sunset its Intel-chip devices.

This feedback relates to:

[X] The Kotlin library
[ ] The Gradle plugin
[ ] The IntelliJ Platform plugin
[ ] The sample code
[ ] The documentation

To Reproduce

Steps to reproduce the behavior:

1. Record a screenshot on an M1 device using an emulator
2. Run the screenshot tests on a Linux VM using the same emulator device (differing only in architecture)
3. Observe tests failing due to the height rendering issue

Expected behavior

The screenshots should be the same height in both environments, so the tests should pass.

Screenshots

Below is a screenshot of what the diff looks like in the Linux VM. It seems like only the height of the images differs.

Desktop (please complete the following information):

Local:

OS: macOS Sonoma
Version 14.7

VM:

OS: Linux
Version: Ubuntu 20.04.2 LTS

Target Android Device (please complete the following information):

Device: Pixel 4
- Physical or Virtual: AVD
API Level: 30
- Testify Key: 30-1080x2280@440dp-en_US

DanielJette commented 1 month ago

Hi @matthew-shin-hs, thanks for the detailed bug report. This is very interesting! I'm sorry you're having trouble with Testify using this setup. Much of the development of Testify has been done on an M1, and we use a Linux VM for our CI on Bitrise. So this is certainly a supported configuration and should work.

A couple of questions for you:

Are your screens primarily Android View-based, or Compose? Is there a mix of both?
Are you using any Testify extensions? For example, the Jetpack Compose Extensions?
Are you using the original ScreenshotRule or the new ScreenshotScenarioRule?
Are you using an exactness configuration with your tests?
Do all of your tests fail, or are any passing at all?
Can you please try adding TestifyFeatures.GenerateDiffs.setEnabled(true) to one of your tests and capture the high-contrast diff? It will highlight every pixel that fails the comparison.
Can you please attach original PNGs to this bug, from an M1 and another from the Linux VM?

matthew-shin-hs commented 1 month ago

Hey @DanielJette thanks always for such a quick response and also super interesting that you guys also use Linux machines on CI without an issue. I'll answer your questions but I'm on vacation from Monday to Thursday next week so you probably won't hear back from me until next week. In the meantime here are my responses:

Are your screens primarily Android View-based, or Compose? Is there a mix of both?

We have a combination of both.

Are you using any Testify extensions? For example, the Jetpack Compose Extensions?

Yes, I believe constraint layout compose is an extension library?? We have multi modules and the tests span across a lot of them so it's hard to track all but I'm pretty confident at least some of them would use it.

Are you using the original ScreenshotRule or the new ScreenshotScenarioRule?

We use the ScreenshotRule and have custom classes that extend this to test for larger fonts for accessibility as well as dark mode.

Are you using an exactness configuration with your tests?

Yes, we use 0.85 exactness as we've had some issues in the past which you actually helped resolve

Do all of your tests fail, or are any passing at all?

Our CI stops at the first module that has failing tests, and in that specific module we have 58 tests, of which 21 fail.

Can you please try adding TestifyFeatures.GenerateDiffs.setEnabled(true) to one of your tests and capture the high-contrast diff? It will highlight every pixel that fails the comparison.

Yes, here are the 4 that resulted from that. LeaderboardViewTest_emptyLeaderboard diff LeaderboardViewTest_highlightCurrentUser diff

Can you please attach original PNGs to this bug, from an M1 and another from the Linux VM?

These are the 4 tests from the high-contrast diff above.

From M1: LeaderboardViewTest_emptyLeaderboard LeaderboardViewTest_highlightCurrentUser

From Linux: LeaderboardViewTest_emptyLeaderboard LeaderboardViewTest_highlightCurrentUser

thanks for taking the time to look into this

DanielJette commented 1 month ago

I have a theory on what's going on here. I suspect that the Android OS you're running on the Linux machine may have a different navigation configuration than the one you're running on the M1.

If your app is using any sort of fitSystemWindows, edge-to-edge or WindowInsets, then the type of system navigation configuration will impact the visible area available to your Activity and change the height of your captured images.

One way this could happen is if you're running your tests with different system images of Android. For example, if you're running the AOSP image, there might be a different default navigation configuration. Similarly, the Google Play and Google API images might each also come preconfigured with different navigation configurations. Testify does not key against the system image, nor does it key against the navigation options.
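To make concrete what the baseline key does and does not capture, here is a small shell sketch that splits the key from the original report into its fields. It assumes the key layout is `<api>-<width>x<height>@<density>dp-<locale>`, which is inferred from the example key in this thread; the function name `parse_testify_key` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a Testify-style baseline key into its fields.
# Assumed layout (inferred from the key in this thread):
#   <api>-<width>x<height>@<density>dp-<locale>
parse_testify_key() {
  key=$1
  api=${key%%-*}            # text before the first '-'
  rest=${key#*-}            # drop "<api>-"
  size=${rest%%@*}          # "<width>x<height>"
  rest=${rest#*@}           # "<density>dp-<locale>"
  density=${rest%%dp-*}     # text before "dp-"
  locale=${rest#*dp-}       # text after "dp-"
  echo "api=$api width=${size%%x*} height=${size#*x} density=$density locale=$locale"
}

parse_testify_key "30-1080x2280@440dp-en_US"
```

None of these fields describes the system image or the navigation mode, which is how two emulators can share a key and still produce captures of different heights.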

I experimented with a simple test using both 3-button and gesture nav options and found the following results:

You can see that the gesture nav allows for a body height of +88px compared to the 3-button version.

More importantly, you can see how the content area sizes match exactly with the Linux height of 1977px and the M1 height of 2065px in your provided samples.

So, my conclusion is that the Linux AVD is using 3-button navigation, but the M1 is using gesture navigation.
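The height comparison can also be checked directly from the PNG files rather than by eye. A minimal sketch, assuming standard PNG files (the IHDR height is a big-endian 32-bit integer at byte offset 20) and a POSIX `od`; the function name `png_height` and the file paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the pixel height stored in a PNG's IHDR chunk.
png_height() {
  # Bytes 20-23 hold the height as a big-endian unsigned 32-bit integer;
  # od prints them as four decimal byte values.
  set -- $(od -An -tu1 -j20 -N4 "$1")
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Example usage (paths are placeholders for the two captures):
#   m1=$(png_height m1/LeaderboardViewTest_emptyLeaderboard.png)
#   linux=$(png_height linux/LeaderboardViewTest_emptyLeaderboard.png)
#   echo "height difference: $((m1 - linux))px"
```

With the heights reported in this thread, 2065 - 1977 = 88, which matches the 88px gap between the gesture and 3-button content areas.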

My suggestion would be to verify the navigation configuration on each environment. If necessary, you can script the navigation configuration using these commands:

Enable 3-button navigation and disable gesture navigation:

adb shell cmd overlay enable com.android.internal.systemui.navbar.threebutton
adb shell cmd overlay disable com.android.internal.systemui.navbar.gestural

Enable gesture navigation and disable 3-button navigation:

adb shell cmd overlay enable com.android.internal.systemui.navbar.gestural
adb shell cmd overlay disable com.android.internal.systemui.navbar.threebutton
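Before toggling the overlays, it may help to confirm which mode each emulator is actually in. The sketch below reads the `navigation_mode` secure setting over adb. The value mapping (0 = 3-button, 1 = 2-button, 2 = gesture) is an assumption based on Android 10 and later, and the function names and the serial are placeholders.

```shell
#!/bin/sh
# Map the value of the "navigation_mode" secure setting to a readable name.
# Assumed mapping (Android 10+): 0 = 3-button, 1 = 2-button, 2 = gesture.
nav_mode_label() {
  case "$1" in
    0) echo "3-button" ;;
    1) echo "2-button" ;;
    2) echo "gesture" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Print the navigation mode of the device with the given adb serial.
report_nav_mode() {
  # tr strips the carriage return that some adb versions append to shell output.
  mode=$(adb -s "$1" shell settings get secure navigation_mode | tr -d '\r')
  echo "$1: $(nav_mode_label "$mode")"
}

# Example usage (the serial is a placeholder):
#   report_nav_mode emulator-5554
```

Running this on both the M1 and the Linux emulator just before recording or testing gives a scriptable record of the mode, as a cross-check on what the Settings screen shows.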

@matthew-shin-hs Can you test this out and let me know if that works for you?

matthew-shin-hs commented 4 weeks ago

Hi @DanielJette, thanks for your detailed explanation. At a quick glance, running the emulators on both machines, I see that they both show 3-button navigation, but as a sanity check I will try the commands you sent to see if it helps.

Also, I had made sure the system images were the same in both environments (Google API Playstore), apart from the architecture they run on.

matthew-shin-hs commented 4 weeks ago

@DanielJette and as I suspected, trying both of those scripts still results in the same outcome unfortunately. I also confirmed that the settings on both devices show the same 3-button navigation when recording/testing the screenshots.

Did you have any other leads?

sergio-sastre commented 3 weeks ago

@matthew-shin-hs If you are inflating an Activity or View, could it be that the UI is not yet idle at the time the screenshot is taken? I remember facing some issues like that in the past with other instrumentation testing libraries.

The solution was to do getInstrumentation().waitForIdleSync() before the screenshot.

Not sure if it'd solve the issues since I'm aware that AndroidTestify uses some sync mechanism to wait for the Activity to be resumed at least... but worth a try?

DanielJette commented 3 weeks ago

I'm currently traveling and without my laptop, so I can't provide a more expressive example at the moment, but my recommendation would be to try a couple of debugging techniques to see if we can get to the bottom of this.

My suggestion would be to swap one of the failing tests with a Full screen capture:

https://ndtp.github.io/android-testify/docs/extensions/fullscreen/test

By taking a full screen capture, you'll be able to see everything that is loaded on the device that Testify "sees" when making a screenshot.

Hopefully running this on both devices can narrow down the cause of the size differences.

Note that I do not recommend using full screen capture for all your tests as it is significantly slower. This would be for debugging purposes only.

matthew-shin-hs commented 3 weeks ago

Just to give you an update, I noticed that the navigation change may have worked. Enabling the gestural overlay and disabling the three-button overlay seems to make the failing tests used in these examples pass, but I am now facing a different issue:

INSTRUMENTATION_RESULT: longMsg=Input dispatching timed out ... (server) is not responding ... something along these lines.

Hopefully this is an unrelated issue and if it is, then your solution about the differences in navigation would have been the problem!

matthew-shin-hs commented 3 weeks ago

@DanielJette I'm a little bit back to square 0 as the previous suggestion didn't seem to help actually. I did have a question for you though, you mentioned that you also use a Linux machine on your CI/CD while developing on a M1, and I'm wondering if your Linux VM emulator is running hardware acceleration or software.

I did a bit more digging by going into the VM, and the emulator settings seem to set the GPU mode to software. I've tried manually switching it to hardware with emulator -avd emulator -gpu host, but I get some errors on the VM that I haven't been able to solve yet. I also tried recording the screenshots on my M1 with software rendering, but this didn't seem to help (I'm assuming the architecture differences do matter in this case).

DanielJette commented 3 weeks ago

The CI for Testify itself is using a Linux image (linux-docker-android-22.04) and you can see our setup here: https://github.com/ndtp/android-testify/blob/main/bitrise.yml#L98-L106

We're using -gpu swiftshader_indirect
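For anyone wanting to try the same GPU mode on both machines, a launch script along these lines might be a starting point. This is only a sketch: `-gpu swiftshader_indirect` comes from the comment above, while the other flags (`-no-window`, `-no-audio`, `-no-boot-anim`) are common headless CI choices and are not taken from the linked bitrise.yml. The function names and the AVD name are placeholders.

```shell
#!/bin/sh
# Launch an AVD headlessly with software rendering (swiftshader_indirect).
# Only the -gpu value comes from this thread; the other flags are assumptions.
start_emulator() {
  emulator -avd "$1" -gpu swiftshader_indirect -no-window -no-audio -no-boot-anim &
}

# Block until the emulator reports sys.boot_completed=1.
wait_for_boot() {
  adb wait-for-device
  until [ "$(adb shell getprop sys.boot_completed | tr -d '\r')" = "1" ]; do
    sleep 2
  done
}

# Example usage (the AVD name is a placeholder):
#   start_emulator Pixel_4_API_30 && wait_for_boot
```

Using the same launch flags for recording and for testing removes one variable from the comparison, although it does not by itself explain a height difference.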

DanielJette commented 3 weeks ago

Also @matthew-shin-hs did you try running a test with Fullscreen capture? I think comparing the results from your M1 vs Linux could be illuminating

matthew-shin-hs commented 3 weeks ago

@DanielJette Yes, I did try full screen capture for the failing ones. This is the diff for one of those tests. From what I can see, there is no difference between doing a full screen capture and what we already capture in the baseline.

DanielJette commented 3 weeks ago

@matthew-shin-hs That doesn't appear correct. The fullscreen capture should include the system ui (status bar and navigation). For example: https://github.com/ndtp/android-testify/blob/main/Samples/Legacy/src/androidTest/assets/screenshots/29-1080x2220%40440dp-en_US/FullscreenCaptureExampleTest_fullscreen.png

For every test you would like to use the fullscreen capture:

Add androidTestImplementation "dev.testify:testify-fullscreen:3.2.0" to your app's dependencies
Invoke .captureFullscreen() on the rule before you call assertSame()
Invoke .excludeSystemUi() to ignore the system ui in the diff. This will still capture the system ui, but ignore it when comparing baseline images.

You need to call captureFullscreen() in every test as this value is reset after each test method.

Like this:

class FullscreenCaptureTest {

    @get:Rule
    var rule = ScreenshotRule(MainActivity::class.java)

    @ScreenshotInstrumentation
    @Test
    fun fullscreen() {
        rule
            .captureFullscreen()    // Set the fullscreen capture method
            .excludeSystemUi()      // Exclude the navigation bar and status bar areas from the comparison
            .setExactness(0.95f)    // Allow a 5% variation in color
            .assertSame()
    }
}

matthew-shin-hs commented 3 weeks ago

@DanielJette Is .excludeSystemUi() part of the same full-screen library? I can access the captureFullscreen() method, but I'm not able to import .excludeSystemUi() as it isn't recognized.

DanielJette commented 3 weeks ago

Shoot. I'm sorry. I need to update the docs. You're right. This was moved as a configuration method. The correct way to set this now is:

    @ScreenshotInstrumentation
    @Test
    fun fullscreen() {
        rule
            .configure {
                captureMethod = ::fullscreenCapture // Set the fullscreen capture method
                exactness = 0.95f // Allow a 5% variation in color
                excludeSystemUi() // Exclude the navigation bar and status bar areas from the comparison
            }
            .assertSame()
    }

matthew-shin-hs commented 3 weeks ago

@DanielJette Gotcha, this is what one of the failing tests shows in full screen capture along with excludeSystemUi(). Hope it is a bit more insightful in debugging this 🙁

matthew-shin-hs commented 3 weeks ago

@DanielJette So as part of further troubleshooting, I tried taking a screenshot of the same screen in the settings on both machines, and they seem to have the same height differences as the issues we have seen with our tests. I presume this is because of the padding from the status bar. But this is odd, because the tests I wrote and the initial screenshots I shared with you don't have the status bars or navigation bars.

Is it possible that it is somehow related to how the ComposableTestActivity is set up and how it may be affecting the rendering heights on the different machines? Still not really making sense of how your CI/CD doesn't cause problems but mine does. The key would be finding where that difference is...

M1: display

Linux: Screenshot_1731105159

DanielJette commented 2 weeks ago

The ComposableTestActivity will adjust the content view to account for the status bar, navigation bar, and toolbar.

You can see in this example I generated below that the toolbar area is in red, the status bar in blue, the navigation bar in black and the content area in white/grey:

So, if the Linux device you're running on has expanded toolbar areas, then this will be reflected in the content area. I have no explanation for why the toolbar would be double-height though. I've never seen that before.

DanielJette commented 2 weeks ago

Are you using the original ScreenshotRule or the new ScreenshotScenarioRule?

We use the ScreenshotRule and have custom classes that extend this to test for larger fonts for accessibility as well as dark mode.

I did want to clarify this point though. If you're using the raw ScreenshotRule (and not one of the subclasses like ComposableScreenshotRule), then you can provide your own test harness activity.

For example, you could create your own test harness activity like the TestLocaleHarnessActivity that we provide with the sample apps.

If you provide your own Activity, then you can customize the content area so that it loads your views in the way you desire, perhaps avoiding the problems with the toolbar by omitting one from the activity

DanielJette commented 2 weeks ago

Another idea that you can try would be to implement setScreenshotViewProvider() in your test to narrow down what is captured by the screenshot test: https://ndtp.github.io/android-testify/docs/recipes/view-provider

Though, that doesn't seem to be your problem as it appears that the content view is smaller, so this might not help.

matthew-shin-hs commented 2 weeks ago

@DanielJette Hmm yeah, the problem with having custom heights is that certain views are small and we don't want them to take up the entire activity, only what they need; otherwise it ends up looking weird.

The ComposableTestActivity will adjust the content view to account for the status bar, navigation bar, and toolbar.

We use a custom activity that just overrides the onCreate method of the ComposableTestActivity and calls this.supportRequestWindowFeature(Window.FEATURE_NO_TITLE) before calling super. With that, even when you hide the navigation bar, the content view still takes into account the navigation bar space??

DanielJette commented 2 weeks ago

@matthew-shin-hs No, Testify doesn't impose any sizing restrictions on the Activity. It's the other way around. Testify can only capture content within the bounds of the content view, which we normally obtain via findViewById<ViewGroup>(android.R.id.content).

My understanding of how Android sizes the content view is that it uses the current theme associated with the Activity. I believe that the default theme may include an action bar by default. Even if not visible, it would mean that the content view is sized to account for the possibility of an action bar.

Perhaps you can override the theme on your test harness Activity with a NoActionBar theme? Something like android:theme="@style/AppTheme.NoActionBar" or android:theme="@android:style/Theme.Material.Light.NoActionBar"?

matthew-shin-hs commented 1 week ago

Thanks @DanielJette for your help with this. Since our last conversation, our CI/CD provider Codemagic has released an emulator with API 34, and running tests against that emulator seems to work fine, so it seems like it was just a weird issue with that specific emulator on that VM. So I will close this issue.

Thank you again for your time!

ndtp / android-testify, issue #237