openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0

Memory leaking in snmp requests. #8622

Closed: fab-kis closed this issue 3 years ago.

fab-kis commented 3 years ago

[snmp] Asynchronous requests made through snmp4j fill up memory and eventually crash the Java VM running the openHAB process.

Expected Behavior

The openHAB process does not fill up memory during runtime.

Current Behavior

After 5-7 days, the openHAB service crashes with an OutOfMemoryError. The Eclipse Memory Analyzer Tool (MAT) reports the following for a heap dump of the openHAB process:

One instance of "org.snmp4j.Snmp" loaded by "org.openhab.binding.snmp" occupies 274.277.360 (85,24 %) bytes.
The memory is accumulated in one instance of "java.util.Hashtable$Entry[]" loaded by "<system class loader>".
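As a rough cross-check of these numbers, the retained size and its share give the total heap captured in the dump, which can be compared with the "Maximum heap size" in the Karaf `info` output further down. This sketch assumes that MAT's 85.24% is relative to the whole dumped heap, that Karaf's "kbytes" are 1024-byte units, and that the dump came from a JVM with the same heap limit as the `info` run (they may be different runs). `HeapShare` and `impliedTotalBytes` are illustrative names.

```java
public class HeapShare {

    /** Total heap implied by a retained size and its percentage share of the heap. */
    public static double impliedTotalBytes(long retainedBytes, double percent) {
        return retainedBytes / (percent / 100.0);
    }

    public static void main(String[] args) {
        long retained = 274_277_360L;              // retained by org.snmp4j.Snmp, from the MAT report
        double total = impliedTotalBytes(retained, 85.24);
        // "Maximum heap size 316,800 kbytes", assuming 1 kbyte = 1024 bytes
        long maxHeapBytes = 316_800L * 1024L;
        System.out.printf("implied dump heap: %.1f MiB%n", total / (1024.0 * 1024.0));
        System.out.printf("configured max:    %.1f MiB%n", maxHeapBytes / (1024.0 * 1024.0));
        System.out.printf("ratio:             %.1f %%%n", 100.0 * total / maxHeapBytes);
    }
}
```

Under these assumptions the dumped heap comes out at roughly 307 MiB against a limit of about 309 MiB, i.e. the heap was essentially full, which is consistent with the crash.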

Keywords
java.util.Hashtable$Entry[]
org.openhab.binding.snmp
org.snmp4j.Snmp

Possible Solution

I think the same issue was reported for the openHAB 1 version of the SNMP add-on in issue #5202. The snmp4j API documentation states that an asynchronous request must be canceled:

public void cancel(PDU request,
                   ResponseListener listener)

Description copied from interface: Session
Cancels an asynchronous request. Any asynchronous request must be canceled when the supplied response
listener is being called, even if the ResponseEvent indicates an error.
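To illustrate why this contract matters, the sketch below models a session that keeps every asynchronous request in a pending-request table until `cancel` is called. This is an assumption based on the heap dump above (a `Hashtable$Entry[]` held by the `Snmp` instance), not snmp4j's actual implementation; `PendingRequestModel`, `Session`, `Listener` and `pendingAfter` are hypothetical names.

```java
import java.util.Hashtable;
import java.util.Map;

/**
 * Hypothetical model of an asynchronous session: requests stay in a
 * Hashtable until cancel() removes them. Not snmp4j code.
 */
public class PendingRequestModel {

    /** Callback invoked when a (simulated) response arrives. */
    interface Listener {
        void onResponse(Session session, int requestId);
    }

    static class Session {
        // requestId -> listener; an entry is only removed by cancel()
        private final Map<Integer, Listener> pending = new Hashtable<>();
        private int nextId = 0;

        void sendAsync(Listener listener) {
            int id = nextId++;
            pending.put(id, listener);
            // Simulate the response arriving right away.
            listener.onResponse(this, id);
        }

        void cancel(int requestId) {
            pending.remove(requestId);
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Sends n requests and returns how many entries are still pending. */
    public static int pendingAfter(int n, boolean cancelInListener) {
        Session session = new Session();
        Listener listener = (s, id) -> {
            if (cancelInListener) {
                s.cancel(id); // honor the "always cancel" contract
            }
            // ... process the response here ...
        };
        for (int i = 0; i < n; i++) {
            session.sendAsync(listener);
        }
        return session.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("without cancel: " + pendingAfter(10_000, false));
        System.out.println("with cancel:    " + pendingAfter(10_000, true));
    }
}
```

In this model, every request whose listener skips `cancel` leaves one table entry behind, so a binding that polls periodically grows the table without bound.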

The proposed fix is the following diff:

index ce1777a35..9ecc38f8f 100644
--- a/bundles/org.openhab.binding.snmp/src/main/java/org/openhab/binding/snmp/internal/SnmpTargetHandler.java
+++ b/bundles/org.openhab.binding.snmp/src/main/java/org/openhab/binding/snmp/internal/SnmpTargetHandler.java
@@ -175,6 +175,15 @@ public class SnmpTargetHandler extends BaseThingHandler implements ResponseListe
         if (event == null) {
             return;
         }
+
+        if (event.getSource() instanceof org.snmp4j.Snmp) {
+            // Always cancel async request when response has been received
+            // otherwise a memory leak is created! Not canceling a request
+            // immediately can be useful when sending a request to a broadcast
+            // address.
+            ((org.snmp4j.Snmp) event.getSource()).cancel(event.getRequest(), this);
+        }
+
         PDU response = event.getResponse();
         if (response == null) {
             Exception e = event.getError();
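The placement of the new block matters: it runs before the `response == null` check, so requests that end in a timeout or error are canceled too, as the documentation quoted above requires. The self-contained sketch below compares the two placements using a hypothetical map of pending requests; `CancelPlacementSketch`, `Event`, `handleLate`, `handleEarly` and `leftover` are illustrative names, not binding or snmp4j code, and a `null` response stands in for a timeout.

```java
import java.util.Hashtable;
import java.util.Map;

/** Hypothetical sketch: where the cancel call sits decides whether errors leak. */
public class CancelPlacementSketch {

    static final class Event {
        final int requestId;
        final String response; // null models a timeout or other error

        Event(int requestId, String response) {
            this.requestId = requestId;
            this.response = response;
        }
    }

    /** Cancels only after a successful response: errors are never canceled. */
    static void handleLate(Map<Integer, String> pending, Event event) {
        if (event.response == null) {
            return; // early return skips the cancel below
        }
        // ... process event.response ...
        pending.remove(event.requestId);
    }

    /** Cancels first, as in the proposed diff. */
    static void handleEarly(Map<Integer, String> pending, Event event) {
        pending.remove(event.requestId); // always cancel, even on error
        if (event.response == null) {
            return;
        }
        // ... process event.response ...
    }

    /** Sends n requests where every k-th one times out; returns leftover entries. */
    public static int leftover(int n, int timeoutEvery, boolean cancelFirst) {
        Map<Integer, String> pending = new Hashtable<>();
        for (int id = 0; id < n; id++) {
            pending.put(id, "request-" + id);
            String response = (id % timeoutEvery == 0) ? null : "ok";
            Event event = new Event(id, response);
            if (cancelFirst) {
                handleEarly(pending, event);
            } else {
                handleLate(pending, event);
            }
        }
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println("cancel after success: " + leftover(100, 10, false));
        System.out.println("cancel first:         " + leftover(100, 10, true));
    }
}
```

With 100 requests and every tenth one timing out, the late placement leaves 10 entries behind while the early placement leaves none.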

Steps to Reproduce (for Bugs)

Steps to reproduce (taken from here)

Context

We do not want to have to restart the service every few days.

Your Environment

Raspberry Pi, openHABian, openHAB 2.5.9:

                          __  _____    ____
  ____  ____  ___  ____  / / / /   |  / __ )
 / __ \/ __ \/ _ \/ __ \/ /_/ / /| | / __  |
/ /_/ / /_/ /  __/ / / / __  / ___ |/ /_/ /
\____/ .___/\___/_/ /_/_/ /_/_/  |_/_____/
    /_/                        2.5.9
                               Release Build

Hit '<tab>' for a list of available commands
and '[cmd] --help' for help on a specific command.
Hit '<ctrl-d>' or type 'system:shutdown' or 'logout' to shutdown openHAB.

openhab> info
Karaf
  Karaf version               4.2.7
  Karaf home                  /usr/share/openhab2/runtime
  Karaf base                  /var/lib/openhab2
  OSGi Framework              org.eclipse.osgi-3.12.100.v20180210-1608

JVM
  Java Virtual Machine        OpenJDK Client VM version 25.265-b11
  Version                     1.8.0_265
  Vendor                      Azul Systems, Inc.
  Pid                         28308
  Uptime                      3 days
  Process CPU time            3 hours 1 minute
  Process CPU load            0.01
  System CPU load             0.01
  Open file descriptors       217
  Max file descriptors        102,642
  Total compile time          16 minutes
Threads
  Live threads                148
  Daemon threads              81
  Peak                        154
  Total started               60056
Memory
  Current heap size           109,101 kbytes
  Maximum heap size           316,800 kbytes
  Committed heap size         190,400 kbytes
  Pending objects             0
  Garbage collector           Name = 'Copy', Collections = 2901, Time = 55.879 seconds
  Garbage collector           Name = 'MarkSweepCompact', Collections = 5, Time = 1.605 seconds
Classes
  Current classes loaded      17,967
  Total classes loaded        25,387
  Total classes unloaded      7,420
Operating system
  Name                        Linux version 5.4.51-v7l+
  Architecture                arm
  Processors                  4

fab-kis commented 3 years ago

I would like to link this issue to pull request #8623, but honestly I do not know how. That PR should fix this issue.