maplibre / maplibre-native

MapLibre Native - Interactive vector tile maps for iOS, Android and other platforms.

Continuous (Rendering) Benchmarking #1886

Closed: louwers closed this issue 2 weeks ago

louwers commented 1 year ago

The iOS and Android rendering benchmarks should be run on CI.

Like all our instrumented tests, we will run them on AWS Device Farm. I want to use one less powerful device, a modern mid-tier device, and a modern top-tier device. Very old devices are not available, so I selected these (we can always expand or change them later). For Android:

For iOS:

HTTP API

The current instrumentation tests just poll AWS Device Farm until the test has completed. We do not pull any data from the device, and I have not been able to get this functionality to work so far. For the benchmarks it is essential that we get the results off the device. I want to set up a simple database-backed HTTP API to which the device sends the results after the benchmark has completed. This has the added benefit that we have a place to track results over time.

Workflow

The new benchmarking tests will be added to the existing workflows. The PR number, git SHA and HTTP API endpoint (with auth info) should be baked into the test app at build time.
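
For illustration, here is a minimal sketch of how the PR number, SHA, and endpoint could be baked in via Gradle's buildConfigField; the environment variable names are assumptions about what the CI workflow would export:

// build.gradle.kts of the test app; on recent AGP versions
// buildFeatures.buildConfig = true must also be enabled.
android {
    defaultConfig {
        buildConfigField("String", "PR_NUMBER", "\"${System.getenv("PR_NUMBER") ?: ""}\"")
        buildConfigField("String", "GIT_SHA", "\"${System.getenv("GIT_SHA") ?: ""}\"")
        buildConfigField("String", "API_ENDPOINT", "\"${System.getenv("BENCHMARK_API_URL") ?: ""}\"")
    }
}

The test can then read the values as BuildConfig.PR_NUMBER and so on.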

After the normal workflows have completed, a check with a button (a so-called requested action) is created that allows kicking off the benchmark workflow.
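
For reference, such a button is created through the Checks API (POST /repos/{owner}/{repo}/check-runs) by attaching an actions array to the check run; the sketch below is a guess at the payload, and the name, label, and identifier values are assumptions:

// Rough sketch of a Checks API payload with a "requested action" button.
fun benchmarkCheckPayload(headSha: String): String = """
{
  "name": "Render Benchmark",
  "head_sha": "$headSha",
  "status": "completed",
  "conclusion": "neutral",
  "actions": [
    {
      "label": "Run benchmark",
      "description": "Kick off the benchmark run",
      "identifier": "run_benchmark"
    }
  ]
}
"""

Clicking the button delivers a check_run webhook event carrying the requested_action identifier, which can then trigger the benchmark workflow.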

The benchmark workflow will retrieve the previously built tests that were stored as artifacts. It will create a new check to indicate that the benchmarks are running and send them off to AWS Device Farm. After the benchmarks have completed, the workflow will update the status of the check from running to passed (if the benchmark ran to completion) or failed (if something went wrong). The device will send the results to the HTTP API as a JSON payload that will look something like this:

{
  "platform": "android",
  "deviceInfo": {
     "manufacturer": "Samsung",
     "model": "..."
  },
  "benchmarkResults": {
    "maptilerBasic": {
      "avgFps": 123.0,
      "avgFrameEncodingTime": 123.0,
      "low1pFps": 60.0,
      "avgFps": 120.0
    },
    "facebook": {
      "avgFps": 123.0,
      "avgFrameEncodingTime": 123.0,
      "low1pFps": 60.0,
      "avgFps": 120.0
    }
  },
  "sha": "a858e7b4dedcf04aa6e011663e4db774631764c9",
  "prNumber": "222"
}

So: the benchmarking results (split out per style), information about the device and information about the related git commit.
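
As a sketch of the device side, the upload could be a single blocking POST at the end of the benchmark run; the Bearer-token auth scheme here is an assumption about the API:

import java.net.HttpURLConnection
import java.net.URL

// Minimal sketch: POST the JSON results to the collection endpoint.
fun postBenchmarkResults(endpoint: String, token: String, json: String) {
    val connection = URL(endpoint).openConnection() as HttpURLConnection
    connection.requestMethod = "POST"
    connection.setRequestProperty("Content-Type", "application/json")
    connection.setRequestProperty("Authorization", "Bearer $token")
    connection.doOutput = true
    connection.outputStream.use { it.write(json.toByteArray()) }
    check(connection.responseCode in 200..299) {
        "Uploading benchmark results failed: HTTP ${connection.responseCode}"
    }
}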

When the HTTP API receives a result, it updates (or creates) a comment on the PR with the results. I can also add a comparison with main. It might be more accurate to look up the results of the most recent parent commit that is already part of main, but just comparing with the latest commit on main is OK at first.

Cost allowing, we might want to run the benchmarks for every PR and add a failed check if a significant performance regression is detected.

Work areas:

- Android
- iOS
- GitHub Actions
- HTTP API

louwers commented 1 year ago

A simple AWS Lambda for collecting results has been set up (https://github.com/maplibre/ci-runners/pull/12). Source: https://github.com/maplibre/ci-runners/tree/main/mln-lambda


I want to add the rendering benchmark to the Android instrumentation tests.

AWS Device Farm allows setting a test filter; I think it supports the AndroidJUnitRunner execution-options syntax: https://developer.android.com/reference/androidx/test/runner/AndroidJUnitRunner#execution-options

When the UI tests are run, the benchmark should be excluded; when the benchmark is run, only the benchmark test should run.
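
A sketch of how that split could look with the annotation filters from the page above; the annotation name and package are assumptions:

// Marker annotation for the benchmark test. Kotlin annotations default to
// runtime retention, which AndroidJUnitRunner needs for filtering.
annotation class RenderBenchmark

@RenderBenchmark
class RenderBenchmarkTest {
    // ... benchmark test methods ...
}

// Run only the benchmark:       -e annotation org.maplibre.benchmark.RenderBenchmark
// Exclude it from UI test runs: -e notAnnotation org.maplibre.benchmark.RenderBenchmark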

louwers commented 12 months ago

I don't want to add the needed secrets to the app at build time, because then unprivileged workflows would need access to them, or I would need to build in a privileged workflow.

There is a way to specify extra data when running the test on AWS Device Farm, but it doesn't seem to work. That is, this doesn't print anything:

import androidx.test.platform.app.InstrumentationRegistry

// List everything in the app's external and internal files directories
// to see whether the extra data uploaded to Device Farm shows up.
val appContext = InstrumentationRegistry.getInstrumentation().targetContext
val externalFiles = appContext.getExternalFilesDir(null)
externalFiles?.listFiles()?.forEach {
    println("listFiles, externalFilesDir: $it")
}
appContext.filesDir.listFiles()?.forEach {
    println("listFiles, filesDir: $it")
}

I suspect that AWS Device Farm has not been updated to work with scoped storage yet. https://docs.aws.amazon.com/devicefarm/latest/developerguide/how-to-create-test-run.html

Edit: I can actually access /storage/emulated/0, but the files from the data.zip I uploaded are not there.

louwers commented 11 months ago

Super close now...

During a manual run on AWS Device Farm I was able to read from /storage/emulated/0, but now that the benchmark is running on CI I am hit with:

java.io.FileNotFoundException: /storage/emulated/0/benchmark-input.json: open failed: EACCES (Permission denied)

louwers commented 11 months ago

I might need to use a custom test environment and use this: https://developer.android.com/training/data-storage/manage-all-files#enable-manage-external-storage-for-testing
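
If that route works, the test could first verify that it actually holds "All files access" before reading from /storage/emulated/0; a minimal sketch:

import android.os.Build
import android.os.Environment

// Sketch: check for MANAGE_EXTERNAL_STORAGE ("All files access") before
// reading the benchmark input from shared storage (Android 11+ only).
fun hasAllFilesAccess(): Boolean =
    Build.VERSION.SDK_INT >= Build.VERSION_CODES.R &&
        Environment.isExternalStorageManager()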

louwers commented 11 months ago

Tests with a custom test environment fail with !!! FAILED BINDER TRANSACTION !!!

Details

Device 00:01.983 1302 Error JavaBinder !!! FAILED BINDER TRANSACTION !!! (parcel size = 3956)
Device 00:01.983 1302 Error WifiMetrics Unable to invoke Wifi usability stats entry listener
Device 00:01.983 1302 Error WifiMetrics android.os.DeadObjectException: Transaction failed on small parcel; remote process probably died, but this could also be caused by running out of binder buffe
Device 00:01.983 1302 Error WifiMetrics at android.os.BinderProxy.transactNative(Native Method)
Device 00:01.983 1302 Error WifiMetrics at android.os.BinderProxy.transact(BinderProxy.java:584)
Device 00:01.983 1302 Error WifiMetrics at android.net.wifi.IOnWifiUsabilityStatsListener$Stub$Proxy.onWifiUsabilityStats(IOnWifiUsabilityStatsListener.java:149)
Device 00:01.983 1302 Error WifiMetrics at com.android.server.wifi.WifiMetrics.sendWifiUsabilityStats(WifiMetrics.java:6995)
Device 00:01.983 1302 Error WifiMetrics at com.android.server.wifi.WifiMetrics.updateWifiUsabilityStatsEntries(WifiMetrics.java:6971)
Device 00:01.983 1302 Error WifiMetrics at com.android.server.wifi.ClientModeImpl$L2ConnectedState.updateLinkLayerStatsRssiDataStallScoreReport(ClientModeImpl.java:6677)
Device 00:01.983 1302 Error WifiMetrics at com.android.server.wifi.ClientModeImpl$L2ConnectedState.processMessageImpl(ClientModeImpl.java:6466)
Device 00:01.983 1302 Error WifiMetrics at com.android.server.wifi.RunnerState.processMessage(RunnerState.java:62)
Device 00:01.983 1302 Error WifiMetrics at com.android.wifi.x.com.android.internal.util.StateMachine$SmHandler.processMsg(StateMachine.java:1001)
Device 00:01.983 1302 Error WifiMetrics at com.android.wifi.x.com.android.internal.util.StateMachine$SmHandler.handleMessage(StateMachine.java:819)
Device 00:01.983 1302 Error WifiMetrics at android.os.Handler.dispatchMessage(Handler.java:106)
Device 00:01.983 1302 Error WifiMetrics at android.os.Looper.loopOnce(Looper.java:201)
Device 00:01.983 1302 Error WifiMetrics at android.os.Looper.loop(Looper.java:288)
Device 00:01.983 1302 Error WifiMetrics at android.os.HandlerThread.run(HandlerThread.java:67)
Harness 00:02.123 6472 Failed - Tests failed

louwers commented 11 months ago

adb shell appops set --uid ... MANAGE_EXTERNAL_STORAGE allow

only works for debug builds. 😮‍💨

I don't know how I can get access to the external data needed for the benchmark.

Tried this hack. Works locally but not on AWS Device Farm. https://stackoverflow.com/questions/70366670/android-os-11-pushing-data-files-to-an-applications-scoped-storage

louwers commented 2 weeks ago

We now do continuous benchmarking. The benchmark runs automatically on every merge to main. cc @hy9be @sjg-wdw

Results are collected in the public S3 bucket under the git hash of the build, e.g.:

https://maplibre-native.s3.eu-central-1.amazonaws.com/index.html#android-benchmark-render/b4b779dd29d1c622d8461c66bac4ecac9cd2dff5/

I still need to write a script to collect and plot results over time.

We probably want to use different devices instead of just the Pixel 7. But since the benchmark currently takes 1.5 hours to run, adding devices would clog up the three device slots that we have, and PRs may time out. We can reduce the time by using either sync or async rendering, reducing the number of iterations, or using fewer styles (we currently use three).