open-telemetry / opentelemetry-android

OpenTelemetry Tooling for Android
Apache License 2.0
156 stars 40 forks source link

Instrumentation for StrictMode #520

Open MrHadiSatrio opened 3 months ago

MrHadiSatrio commented 3 months ago

Hello. Are there any plans for StrictMode instrumentation? ThreadPolicy#penaltyListener() and VmPolicy#penaltyListener() which were added in API 28 may provide a solid foundation for one.

LikeTheSalad commented 2 months ago

That sounds interesting, though it seems like StrictMode is more targeted as a development tool, so I'm wondering if there's a non-trivial performance overhead caused by enabling it?

MrHadiSatrio commented 2 months ago

I've done a small analysis on the performance on a Pixel 3a XL, Android 12. At least from here it does look like there is an overhead.

Run StrictMode (ms) Baseline (ms) Delta (ms) Delta (%)
1 131.5 125.7 5.8 4.61%
2 144.5 133.8 10.7 8.00%
3 136.5 116.3 20.2 17.37%
4 124.1 150.8 -26.7 -17.71%
5 131.9 132.6 -0.7 -0.53%
6 132.2 118.5 13.7 11.56%
7 128.7 110.3 18.4 16.68%
8 138.1 127.1 11 8.65%
9 132.4 122.4 10 8.17%
10 131.4 121.8 9.6 7.88%
Mean 133.13 125.93 7.2 5.72%
Median 132.05 124.05 8 6.45%

As for the code...

    val uid = UUID.randomUUID()
    val times = 10
    val enableSm = true
    val executor = ContextCompat.getMainExecutor(applicationContext)

    if (enableSm) {
        StrictMode.setThreadPolicy(
            StrictMode.ThreadPolicy.Builder()
                .detectAll()
                .penaltyListener(executor) {}
                .build()
        )
        StrictMode.setVmPolicy(
            StrictMode.VmPolicy.Builder()
                .detectAll()
                .penaltyListener(executor) {}
                .build()
        )
    }

    Debug.startMethodTracing(
        getExternalFilesDir(null)!!.absolutePath + "/${times}_strictmode_${enableSm}_$uid.trace"
    )

    repeat(times) {
        getSharedPreferences(it.toString(), MODE_PRIVATE)
            .edit()
            .apply { repeat(8) { putString(it.toString(), it.toString()) } }
            .commit() // ...triggers DiskReadViolation
    }

    Debug.stopMethodTracing()
LikeTheSalad commented 2 months ago

Thank you for taking the time to collect and provide this info 🙏 it helps a lot. Based on what you found I'd say that we shouldn't provide this kind of instrumentation by default to avoid affecting apps' performance unless it's really needed. It's still probably worth creating some opt-in instrumentation for it, though I'd like to get some people to chime in just to get an idea of what's the appetite for it.

MrHadiSatrio commented 2 months ago

I agree with you. One thing I want to also highlight is that the above example essentially only triggers one type of violation. So in an actual application, there could be an even bigger overhead.

That said, in my honest opinion, there is still value to be gained, especially (or perhaps exclusively) in staging environments. penaltyDeath() is not always an option, so the logs may enable teams to stay on top of these violations and take action.

Curious to hear what others think as well! 🙌🏽

breedx-splk commented 2 months ago

@MrHadiSatrio this is interesting for sure. Is the expected usage for application developers to have additional ways of leveraging StrictMode through this otel agent, versus wiring it in themselves? And then the idea is to export those violations as logs when they occur?

I think it's perfectly acceptable for us to have instrumentation in this repo that isn't necessarily enabled by default or included in the topmost agent. Seems like a worthy experiment.

Is it also possible to turn these strict mode configurations back off?

MrHadiSatrio commented 2 months ago

@breedx-splk, yes. StrictMode violations are reported to the listeners as Throwables, so perhaps we can apply a similar treatment to what we're doing for crashes.

Is it also possible to turn these StrictMode configurations back off?

Yes, it is. StrictMode policies can be changed at any time. One could assign either ThreadPolicy.LAX or VmPolicy.LAX (or indeed both), which would internally disable violation handling.

Come to think of it, the above does mean that users would be able to accidentally turn the instrumentation off without the library being aware of it. :/

MrHadiSatrio commented 2 months ago

So I took a quick stab at the idea and did a spike implementation on this branch. I couldn't seem to surface the logs on Jaeger, so I went with grafana/otel-lgtm instead.

Here's what a StrictMode dashboard may look like:

Screenshot 2024-08-15 at 14 11 59
bidetofevil commented 2 months ago

I've found that strict mode violation tend to be the result of structural issues with the code, so deploying it on a beta or dogfood build is usually sufficient when trying to ferret out said issues. If we were add this to the project, I think making this opt-in would be desirable so that it can be enabled for certain flavors or builds, but not in general production.