open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

Deadlock detection #7885

Open jack-berg opened 1 year ago

jack-berg commented 1 year ago

I'm curious if there's an appetite for adding JMX-based deadlock detection. I think deadlocks are good candidates to emit as events because:

Here's a quick and dirty prototype that detects deadlocks and emits them as events:

    Thread deadlockDetector =
        new Thread(
            () -> {
              EventEmitter eventEmitter =
                  GlobalEventEmitterProvider.get()
                      .eventEmitterBuilder("deadlock-finder")
                      .setEventDomain("jmx")
                      .build();
              ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
              Set<Long> threadIds = new HashSet<>();
              while (!Thread.currentThread().isInterrupted()) {
                long[] deadlockedThreads = threadMXBean.findDeadlockedThreads();
                if (deadlockedThreads != null) {
                  for (long threadId : deadlockedThreads) {
                    if (threadIds.add(threadId)) {
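                      ThreadInfo threadInfo = threadMXBean.getThreadInfo(threadId, 10);
                      if (threadInfo == null) {
                        continue; // getThreadInfo returns null if the thread is no longer alive
                      }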
                      eventEmitter.emit(
                          "deadlockDetected",
                          Attributes.builder()
                              .put("thread.name", threadInfo.getThreadName())
                              .put("thread.lockOwnerId", threadInfo.getLockOwnerId())
                              .put("thread.lockOwnerName", threadInfo.getLockOwnerName())
                              .put(
                                  "thread.stacktrace",
                                  Stream.of(threadInfo.getStackTrace())
                                      .map(StackTraceElement::toString).toArray(String[]::new))
                              .build());
                    }
                  }
                }
                try {
                  Thread.sleep(1000);
                } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                }
              }
            });
    deadlockDetector.setDaemon(true);
    deadlockDetector.start();

If I cause a deadlock on an app configured with the OtlpJsonLoggingLogRecordExporter, I get data shaped like:

{
    "resource": { ... },
    "scopeLogs": [{
        "scope": {
            "name": "deadlock-finder",
            "attributes": []
        },
        "logRecords": [{
            "timeUnixNano": "1677112206545795000",
            "body": { ... },
            "attributes": [{
                "key": "event.domain",
                "value": {
                    "stringValue": "jmx"
                }
            }, {
                "key": "event.name",
                "value": {
                    "stringValue": "deadlockDetected"
                }
            }, {
                "key": "thread.lockOwnerId",
                "value": {
                    "intValue": "42"
                }
            }, {
                "key": "thread.lockOwnerName",
                "value": {
                    "stringValue": "Thread-3"
                }
            }, {
                "key": "thread.name",
                "value": {
                    "stringValue": "Thread-2"
                }
            }, {
                "key": "thread.stacktrace",
                "value": {
                    "arrayValue": {
                        "values": [{
                            "stringValue": "app//com.newrelic.app.Controller.lambda$deadlock$0(Controller.java:60)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller$$Lambda$843/0x00000008005c6d38.run(Unknown Source)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller.doWork(Controller.java:89)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller.lambda$deadlock$1(Controller.java:52)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller$$Lambda$841/0x00000008005c68e8.run(Unknown Source)"
                        }, {
                            "stringValue": "java.base@17.0.3/java.lang.Thread.run(Thread.java:833)"
                        }]
                    }
                }
            }]
        }, {
            "timeUnixNano": "1677112206546658000",
            "body": {
                "stringValue": ""
            },
            "attributes": [{
                "key": "event.domain",
                "value": {
                    "stringValue": "jmx"
                }
            }, {
                "key": "event.name",
                "value": {
                    "stringValue": "deadlockDetected"
                }
            }, {
                "key": "thread.lockOwnerId",
                "value": {
                    "intValue": "41"
                }
            }, {
                "key": "thread.lockOwnerName",
                "value": {
                    "stringValue": "Thread-2"
                }
            }, {
                "key": "thread.name",
                "value": {
                    "stringValue": "Thread-3"
                }
            }, {
                "key": "thread.stacktrace",
                "value": {
                    "arrayValue": {
                        "values": [{
                            "stringValue": "app//com.newrelic.app.Controller.lambda$deadlock$2(Controller.java:75)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller$$Lambda$844/0x00000008005c6f60.run(Unknown Source)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller.doWork(Controller.java:89)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller.lambda$deadlock$3(Controller.java:67)"
                        }, {
                            "stringValue": "app//com.newrelic.app.Controller$$Lambda$842/0x00000008005c6b10.run(Unknown Source)"
                        }, {
                            "stringValue": "java.base@17.0.3/java.lang.Thread.run(Thread.java:833)"
                        }]
                    }
                }
            }]
        }]
    }]
}

It would be nice to be able to detect and alert on deadlock events.

brunobat commented 1 year ago

Interesting. I imagine these would be sent as log events and would be opt-in, right? Would the detection be always on or could it be triggered by something else? I wonder how frequent these events are...

mateuszrzeszutek commented 1 year ago

Good thing we're just renaming runtime-metrics to runtime-telemetry 😄

I think this could be a nice addition; could be disabled by default - we're not exporting logs by default anyway (at least for now).

jack-berg commented 1 year ago

I imagine these would be sent as log events and would be opt-in, right?

Opt-in at a minimum for a while. I could imagine turning it on by default later if these events get added to the semantic conventions.

Would the detection be always on or could it be triggered by something else?

I figure it could just run periodically with an interval long enough that there isn't concern about execution time.
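A minimal sketch of what such a periodic check might look like, using only JDK classes. The class name `PeriodicDeadlockCheck`, the `LongConsumer` callback, and the daemon thread name are hypothetical and not part of any existing instrumentation; in a real integration the callback would presumably emit the event the way the prototype above does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical helper: polls the ThreadMXBean for deadlocks on a fixed interval. */
public final class PeriodicDeadlockCheck {
  private final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
  // Thread ids that were already reported, so each one is reported only once.
  private final Set<Long> reported = new HashSet<>();
  // Hypothetical callback invoked with the id of each newly found deadlocked thread.
  private final LongConsumer onDeadlockedThread;

  public PeriodicDeadlockCheck(LongConsumer onDeadlockedThread) {
    this.onDeadlockedThread = onDeadlockedThread;
  }

  /** Runs a single check and returns how many thread ids were newly reported. */
  public synchronized int checkOnce() {
    long[] ids = threadMXBean.findDeadlockedThreads();
    if (ids == null) {
      return 0; // null means no deadlock was found
    }
    int fresh = 0;
    for (long id : ids) {
      if (reported.add(id)) {
        onDeadlockedThread.accept(id);
        fresh++;
      }
    }
    return fresh;
  }

  /** Schedules checkOnce() on a daemon thread; the caller may shut the scheduler down. */
  public ScheduledExecutorService start(long interval, TimeUnit unit) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(
            runnable -> {
              Thread thread = new Thread(runnable, "deadlock-check");
              thread.setDaemon(true);
              return thread;
            });
    // Fixed delay: the next check starts only after the previous one has finished.
    scheduler.scheduleWithFixedDelay(this::checkOnce, interval, interval, unit);
    return scheduler;
  }
}
```

Something like `new PeriodicDeadlockCheck(id -> System.out.println("deadlocked thread " + id)).start(30, TimeUnit.SECONDS)` would then poll every 30 seconds. Note that `scheduleWithFixedDelay` suppresses all later runs if one run throws, so a real implementation would likely want to catch exceptions inside the callback.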

I wonder how frequent these events are...

I hope infrequent! 😁