openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.58k forks source link

[Freebox] Possible memory leak #3767

Closed Orfait closed 6 years ago

Orfait commented 6 years ago

I am experiencing a probable memory leak in openhab, RAM is at almost 100% (2GB) in few hours and begins to use swap (500MB).

After some tests, I discovered that I can "fix" this memory issue by disabling the Freebox binding.

My setup is :

Ask me for more info if needed.

Orfait commented 6 years ago

I don't remember changing anything. I agree, that's weird...

I did a heap dump right now then disabled the freebox binding.

I am thinking about creating a new openhab instance from scratch...

martinvw commented 6 years ago

@orfait feel free to send me the heapdump maybe via WeTransfer?

Orfait commented 6 years ago

@martinvw : I sent you a link to download the heap dump via your personal website.

martinvw commented 6 years ago

At what moment did you create the heap dump? Was openHAB already shutting down or the freebox binding killed? It claims that only 45 MB was still in used the rest could have been garbage collected...

It would be great if you could create a heapdump before its shutting down / error-ing because now the standard tools offer barely any help.

What I can see is 600 MB of char arrays, 130 MB of byte arrays, 46 MB of byte arrays, so it seems very much like a memory leak.

Looking at the openHAB freebox code I don't expect it in the binding, if its caused by the binding then by the external client it uses.

If you have the option to create another heapdump, please do so. If you create it after 4 hours (its not strictly required that its nearly on its knees but if of course helps) it should be fine as well.

Orfait commented 6 years ago

Yes, I will do it today. From what I remember, OpenHAB was still running with Freebox binding enabled.

Orfait commented 6 years ago

I am not sure, but I can check : is there anything to take care inside rules concerning memory ? If I instantiate some variables, should I release hem ? How ? For me, GC should do it...

(I work in embedded electronics, so I use mainly C/C++, not javascript/java)

martinvw commented 6 years ago

No GC should indeed take care of it, as long as you don't keep references to for example a list of previous results. So if a (group of) object(s) has no reference anymore it can be removed by the garbage collection, but if there is at least one path remaining which addresses this object it will remain alive.

martinvw commented 6 years ago

@Orfait sorry, this is a strange how do you create the heapdump?

screen shot 2018-07-30 at 15 25 55

Still 45 MB of heap size with a file of 400MB + and all data can be found at the unreachable objects tab. What JDK are you now using? And as mentioned above can you enlighten me about the command you used to create the heapdump? Thanks!

martinvw commented 6 years ago

@Orfait I just found how I can disable Eclipse MAT's garbage collection behaviour, now I have the option to do more. Its related to the json conversions.

martinvw commented 6 years ago

I just created a dump using:

https://stackoverflow.com/questions/407612/how-to-get-a-thread-and-heap-dump-of-a-java-process-on-windows-thats-not-runnin

And I can import it into eclipse MAT tool (and it has much more troubles importing it)

Orfait commented 6 years ago

I used the same command.

I just created a new LXC container with full fresh install JRE8/Openhab and only freebox binding. Let's see how it is going !

lolodomo commented 6 years ago
cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}'
488.742
top - 07:10:36 up 14 days,  6:26,  1 user,  load average: 0,56, 0,38, 0,26
Tasks: 107 total,   1 running, 106 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1,1 us,  0,7 sy,  0,0 ni, 98,2 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:    996452 total,   955264 used,    41188 free,    41132 buffers
KiB Swap:   102396 total,     9032 used,    93364 free.   347068 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23052 xx        20   0  504952 308160  14936 S   5,9 30,9 328:15.30 java
free -m
             total       used       free     shared    buffers     cached
Mem:           973        932         40          2         40        338
-/+ buffers/cache:        553        419
Swap:           99          8         91
VmPeak:   505984 kB
VmSize:   504952 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    308464 kB
VmRSS:    308160 kB
VmData:   485856 kB
VmStk:       136 kB
VmExe:         4 kB
VmLib:      9980 kB
VmPTE:       412 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        179

It looks like the memory used by the java process has now reduced, probably after garbage collector. Globally, my RPI available memory has decreased a little but it is not due to the java process. I am still thinking there is no memory leak ;)

Orfait commented 6 years ago

The new LXC container does not show any abnormal memory usage (little increase). I will continue the migration of other binding slowly.

cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}'
1546.48
top - 06:05:21 up  9:58,  1 user,  load average: 0.06, 0.14, 0.33
Tasks:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  1.0 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,  1405780 free,   514004 used,   177368 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1583148 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 5821636 407060  19980 S   0.0 19.4  10:06.65 java
free -m
              total        used        free      shared  buff/cache   available
Mem:           2048         502        1372          86         173        1545
Swap:          4096           0        4096
cat /proc/112/status
Name:   java
Umask:  0022
State:  S (sleeping)
Tgid:   112
Ngid:   0
Pid:    112
PPid:   1
TracerPid:      0
Uid:    108     108     108     108
Gid:    114     114     114     114
FDSize: 512
Groups: 114
NStgid: 112
NSpid:  112
NSpgid: 112
NSsid:  112
VmPeak:  5829656 kB
VmSize:  5821636 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    407988 kB
VmRSS:    407060 kB
RssAnon:          387080 kB
RssFile:           19980 kB
RssShmem:              0 kB
VmData:   626600 kB
VmStk:       132 kB
VmExe:         4 kB
VmLib:     17976 kB
VmPTE:      1484 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
Threads:        147
SigQ:   0/31004
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 2000000181005ccf
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003cfdfcffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        2
Speculation_Store_Bypass:       vulnerable
Cpus_allowed:   c
Cpus_allowed_list:      2-3
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        57
nonvoluntary_ctxt_switches:     14
lolodomo commented 6 years ago

Please remame the title of the issue ;)

You can already install these bindings as they do not generate memory leaks : astro ntp netatmo freebox rfxcom hue sonos weatherunderground mios zwave (dev).

Orfait commented 6 years ago

So, I'm back to initial problem.

Since months, the memory usage is growing slowly. I opened this issue because the "growing rate" has terribly increased. After reinstalling everything on a new LXC container, I am back to the original growing rate".

After analyzing previous data, it seems that the data from "top" is sufficient. If you substract the memory used from the total, you get the memory available returned with /proc/meminfo.

Here is some data from actual install. I don't know how to interpret it. Nothing consumes memory according to the global memory usage increase. So, I will let run for more days/weeks to see if it stills consume all RAM and eat some swap.

top - 08:49:40 up 1 day, 22 min,  1 user,  load average: 0.09, 0.31, 0.37
Tasks:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  2.2 sy,  0.0 ni, 97.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,  1185024 free,   681504 used,   230624 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1415648 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 6007004 459116  19884 S   1.7 21.9  58:08.80 java
  110 nodered   30  10 1240404  98748  22912 S   0.0  4.7   0:23.82 node-red
Tasks:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us,  1.2 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,  1157776 free,   696488 used,   242888 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1400664 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 6007004 459740  19884 S   2.7 21.9  62:25.65 java
23109 nodered   30  10 1174664 103704  23224 S   0.0  4.9   0:03.04 node-red
top - 16:17:45 up 1 day,  5:50,  1 user,  load average: 0.18, 0.15, 0.31
Tasks:  18 total,   1 running,  17 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  0.7 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,  1026052 free,   720568 used,   350532 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1376584 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 6007004 460744  19884 S   1.7 22.0  71:13.14 java
23109 nodered   30  10 1174664 104000  23224 S   0.0  5.0   0:06.45 node-red
top - 21:07:01 up 1 day, 10:40,  1 user,  load average: 6.87, 1.85, 0.78
Tasks:  18 total,   1 running,  17 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  2.3 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,  1001484 free,   744652 used,   351016 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1352500 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 6007004 461996  19884 S   1.7 22.0  82:41.31 java
23109 nodered   30  10 1175284 102948  23224 S   0.0  4.9   0:10.97 node-red
top - 08:24:25 up 1 day, 21:57,  1 user,  load average: 0.09, 0.10, 0.16
Tasks:  18 total,   1 running,  17 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  1.0 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,   942420 free,   802424 used,   352308 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1294728 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 6007004 466536  19884 S   1.7 22.2 109:28.25 java
23109 nodered   30  10 1175284 101020  23224 S   0.0  4.8   0:21.19 node-red
top - 13:00:08 up 3 days,  2:33,  1 user,  load average: 0.04, 0.05, 0.16
Tasks:  18 total,   1 running,  17 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  0.5 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,   794348 free,   947364 used,   355440 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1149788 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 6007008 469844  19888 S   2.0 22.4 177:24.67 java
23109 nodered   30  10 1175796 101064  23224 S   0.0  4.8   0:46.99 node-red
top - 12:07:49 up 4 days,  1:40,  1 user,  load average: 2.90, 1.84, 0.95
Tasks:  18 total,   1 running,  17 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  1.7 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,   671412 free,  1067548 used,   358192 buff/cache
KiB Swap:  4194304 total,  4194304 free,        0 used.  1029604 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  112 openhab   20   0 6007008 470844  19888 S   2.0 22.5 232:24.65 java
23109 nodered   30  10 1175796 101352  23224 S   0.0  4.8   1:07.17 node-red
lolodomo commented 6 years ago

Maybe a specific problem with your container or JRE ?

I will provide my stats after few days but my first results show no problem on a RPI 2 with raspbian and an Oracle JRE.

lolodomo commented 6 years ago

6 days later.

cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}'
403.785
 free -m
             total       used       free     shared    buffers     cached
Mem:           973        915         57          2         39        185
-/+ buffers/cache:        690        282
Swap:           99          8         91
top - 09:47:19 up 18 days,  9:03,  1 user,  load average: 0,28, 0,25, 0,20
Tasks: 107 total,   1 running, 106 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,3 us,  0,3 sy,  0,0 ni, 99,1 id,  0,1 wa,  0,0 hi,  0,1 si,  0,0 st
KiB Mem:    996452 total,   937128 used,    59324 free,    40188 buffers
KiB Swap:   102396 total,     8776 used,    93620 free.   189960 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23052 xx        20   0  529700 343456  14952 S   2,0 34,5   1474:51 java
VmPeak:   531748 kB
VmSize:   529700 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    344908 kB
VmRSS:    343372 kB
VmData:   510432 kB
VmStk:       136 kB
VmExe:         4 kB
VmLib:      9980 kB
VmPTE:       462 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        192

This time, the java process seems to require more RAM than 6 days ago.

Globally my system (RPI2) has lost 87 MB of available memory in 6 days. If this tendance is correct, it could be a problem for my system in 27 days ! But clearly openHAB (java process) is only responsible for 20% of the lost RAM. I am curious to understand what is eating the remaining.

I will continue monitoring the tendance in the next days.

lolodomo commented 6 years ago

Looking at the warnings in Eclipse, I discovered that Eclipse identifies a ressource leak at this line (webhook servlet): https://github.com/openhab/openhab2-addons/blob/master/addons/binding/org.openhab.binding.netatmo/src/main/java/org/openhab/binding/netatmo/internal/webhook/WelcomeWebHookServlet.java#L88

Orfait commented 6 years ago

I also use Netatmo binding. I can disable it for testing.

lolodomo commented 6 years ago

Are you using the webhook ?

Orfait commented 6 years ago

I don't know what you mean, but I use the Netatmo binding pointed by your link.

lolodomo commented 6 years ago

The webhook is a special feature for users having a Welcome Camera. If you don't know it, that means you are not concerned by the potential memory leak. By the way, I have now fixed it.

Orfait commented 6 years ago

After some days running, I rebooted the container and I am back to initial situation...

top - 08:53:59 up 19:19,  1 user,  load average: 0.64, 0.65, 0.67
Tasks:  18 total,   1 running,  17 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  0.7 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,    20672 free,  2050360 used,    26120 buff/cache
KiB Swap:  4194304 total,  3928720 free,   265584 used.    46792 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1305 openhab   20   0 6072432 1.887g   5816 S   2.3 94.3  43:43.46 java
  108 nodered   30  10 1239848  38708   3964 S   0.0  1.8   0:18.93 node-red

I have another container with similar config (for influxdb and grafana) without this issue. Any help would be greatly appreciated...

martinvw commented 6 years ago

Did you already try setting a max heap size? Because java is one of those things which will only free memory if needed. Maybe it won’t grow / will cleanup better if there is max heap size configured?

Orfait commented 6 years ago

@martinvw : for info, actual settings for java :

VM Arguments:
jvm_args: -Dopenhab.home=/usr/share/openhab2 -Dopenhab.conf=/etc/openhab2 -Dopenhab.runtime=/usr/share/openhab2/runtime -Dopenhab.userdata=/var/lib/openhab2 -Dopenhab.logdir=/var/log/openhab2 -Dfelix.cm.dir=/var/lib/openhab2/config -Djetty.host=0.0.0.0 -Djetty.http.compliance=RFC2616 -Dorg.ops4j.pax.web.listening.addresses=0.0.0.0 -Dorg.osgi.service.http.port=8080 -Dorg.osgi.service.http.port.secure=8443 -Djava.awt.headless=true -XX:+UseG1GC -Djava.endorsed.dirs=/usr/lib/jvm/java-8-oracle/jre/jre/lib/endorsed:/usr/lib/jvm/java-8-oracle/jre/lib/endorsed:/usr/share/openhab2/runtime/lib/endorsed -Djava.ext.dirs=/usr/lib/jvm/java-8-oracle/jre/jre/lib/ext:/usr/lib/jvm/java-8-oracle/jre/lib/ext:/usr/share/openhab2/runtime/lib/ext -Dkaraf.instances=/var/lib/openhab2/tmp/instances -Dkaraf.home=/usr/share/openhab2/runtime -Dkaraf.base=/var/lib/openhab2 -Dkaraf.data=/var/lib/openhab2 -Dkaraf.etc=/var/lib/openhab2/etc -Dkaraf.logs=/var/log/openhab2 -Dkaraf.restart.jvm.supported=true -Djava.io.tmpdir=/var/lib/openhab2/tmp -Djava.util.logging.config.file=/var/lib/openhab2/etc/java.util.logging.properties -Dkaraf.startLocalConsole=false -Dkaraf.startRemoteShell=true
java_command: org.apache.karaf.main.Main
java_class_path (initial): /usr/share/openhab2/runtime/lib/boot/org.apache.karaf.diagnostic.boot-4.1.5.jar:/usr/share/openhab2/runtime/lib/boot/org.apache.karaf.jaas.boot-4.1.5.jar:/usr/share/openhab2/runtime/lib/boot/org.apache.karaf.main-4.1.5.jar:/usr/share/openhab2/runtime/lib/boot/org.osgi.core-6.0.0.jar
Launcher Type: SUN_STANDARD
-XX:CICompilerCount=2 -XX:ConcGCThreads=1 -XX:G1HeapRegionSize=1048576 -XX:InitialHeapSize=130023424 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=2074083328 -XX:MaxNewSize=1243611136 -XX:MinHeapDeltaBytes=1048576 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC

So, I am trying -Xmx=1024m

EDIT : I noticed that the "initial" RAM usage decrease after each restart of openhab, then it stabilize after 3 times (eg. 800m, 600m, 500m, 400m, 400m, 400m...)

martinvw commented 6 years ago

Note that the -XX:MaxHeapSize=2074083328 does not match the -Xmx=1024m which you state, the params XX:MaxHeapSize & Xmx are equivalent.

Orfait commented 6 years ago

@martinvw : yes, I was only checking the settings before modifying to 1024m. First results are not good, memory usage is still growing very fast. I will wait until tomorrow to confirm it.

My brother also use openhab under proxmox, with similar bindings. What I have seen immediately is related to disk usage. His container is reporting very few disk access, but mine is showing high disk usage (constant 2,5Mo/s). What can I do to check this ?

Orfait commented 6 years ago

So, it is working with the Xmx parameter. In fact, java seems to set it to the total RAM amount by default : not a good idea.

Indeed, I am still watching for this disk usage. I managed to find out using inotify : inotifywait -mr --exclude '/dev/pts' -e modify,attrib,close_write,move,create,delete / Seems to be related to Discovery result.

@lolodomo : As freebox binding is reporting all network devices discovered (even devices from several years ago... 160+ fore me, thank you free...), could you add a parameter to disable auto discovery ?

lolodomo commented 6 years ago

22 days later, I can see that the memory used by the java process has clearly increased.

Tasks: 107 total,   1 running, 106 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1,4 us,  0,8 sy,  0,0 ni, 97,9 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:    996452 total,   966532 used,    29920 free,    31116 buffers
KiB Swap:   102396 total,    10364 used,    92032 free.    59608 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23052 xxx        20   0  627140 447108   5196 S   6,6 44,9   6054:07 java

%MEM is now 44.9. During this period, I changed one binding but without stopping openHAB. I have no idea if there is really a problkem in openHAB or if this is simply the way java is managing the memory.

lolodomo commented 6 years ago

@Orfait : regading auto discovery, first we have to ask if it is possible to disable the auto discoveryu of a binding. Maybe it is already possible. If not, yes I could add a setting.

lolodomo commented 6 years ago

@Orfait : done https://github.com/openhab/openhab2-addons/pull/3864

Orfait commented 6 years ago

@lolodomo and @martinvw : I changed my ISP today. So bye bye Freebox. But : since the exact instant I changed, the high disk usage dropped and memory stopped to grow. I also noticed a serious decrease in network traffic (even if it was low) and same with CPU usage.

I don't know if it was related to Freebox binding or something special in my setup, but clearly, Freebox binding had a special role in my issue.

I will wait tomorrow to post a screenshot of what I have seen.

martinvw commented 6 years ago

@Orfait also a good way to resolve the problem 😉

Maybe something related to the number of devices remembered by your modem or something like that which triggers the problems.

Orfait commented 6 years ago

capture d ecran 2018-09-01 a 10 06 56

You can see by yourself what I observed. The memory dropped when I rebooted, that's why there is a "delay" for it.

The question is : may I close the issue ? I think yes. If you agree, I re-add the reference to Freebox binding before closing, if someone else have the same problem...