openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0

[Freebox] Possible memory leak #3767

Closed: Orfait closed this issue 5 years ago

Orfait commented 6 years ago

I am experiencing a probable memory leak in openHAB: RAM usage reaches almost 100% (2 GB) within a few hours, and the system starts using swap (500 MB).

After some tests, I discovered that I can "fix" this memory issue by disabling the Freebox binding.

My setup is:

Ask me for more info if needed.

lolodomo commented 6 years ago

Weird. I am using this binding too. The problem could be in the binding or in the library we rely on. To my knowledge, nothing has changed for a long time. I have no idea where to start.

@clinique for information.

lolodomo commented 6 years ago

Could it be something specific to the Zulu JRE? Did you check with the Oracle JRE, which to my knowledge is the only recommended JRE for openHAB?

Orfait commented 6 years ago

https://www.openhab.org/docs/installation/#prerequisites

openHAB should work with Zulu, the Oracle JRE and OpenJDK, but for the last one the docs say there could be compatibility issues.

In the docs, Zulu is the only one listed without any disadvantage, so it was the natural choice.

I will try Oracle JRE today.

clinique commented 6 years ago

@lolodomo: no change on my side, and for good reason: I left Free a year ago. So I can't help... :(

lolodomo commented 6 years ago

So I officially become the only maintainer of this binding.

clinique commented 6 years ago

Sorry, yes. Maybe I'll start the same thing for Orange...

Orfait commented 6 years ago

First result: RAM usage went from 30% at startup to more than 61% after 3 hours.

So switching to the Oracle JRE does not improve RAM usage. I will give it a chance to stabilize over time; let's see in 4-5 hours.

Orfait commented 6 years ago

I am now reaching 100% of RAM usage.

How should we go about fixing this issue?

lolodomo commented 5 years ago

And without this binding, memory usage does not increase?

Orfait commented 5 years ago

When I remove it, memory usage is constant.

lolodomo commented 5 years ago

How do you measure memory usage? With which command?

Which things did you set up, and with which refresh settings?

Orfait commented 5 years ago

As openHAB is running in Proxmox (LXC container), I can see memory usage directly in Proxmox. I also confirmed that this memory usage is attributable to openHAB:

Then I started disabling add-ons one by one.

things:

Bridge freebox:server:FreeboxServer "Freebox" [ fqdn="mafreebox.freebox.fr:80", useOnlyHttp=true, appToken="xxx", refreshInterval=30 ] {
    Thing phone telephone "Téléphone fixe" [ refreshPhoneInterval=10, refreshPhoneCallsInterval=10 ]
    Thing net_device iphone1 "iPhone 1" [ macAddress="XXX" ]
    Thing net_device iphone2 "iPhone 2" [ macAddress="XXX" ]
    Thing net_device mobile1 "Téléphone 1" [ macAddress="XXX" ]
}

items:

String Freebox_Fwversion "Firmware Version" (Freebox) {channel="freebox:server:FreeboxServer:fwversion"}
Number Freebox_Uptime "Server uptime" (Freebox) {channel="freebox:server:FreeboxServer:uptime"}
Switch Freebox_Restarted "Just restarted" (Freebox) {channel="freebox:server:FreeboxServer:restarted"}
Number Freebox_Tempcpum "CPUm Temperature" (Freebox) {channel="freebox:server:FreeboxServer:tempcpum"}
Number Freebox_Tempcpub "CPUb Temperature" (Freebox) {channel="freebox:server:FreeboxServer:tempcpub"}
Number Freebox_TempSwitch "Switch Temperature" (Freebox) {channel="freebox:server:FreeboxServer:tempSwitch"}
Number Freebox_Fanspeed "Fan Speed" (Freebox) {channel="freebox:server:FreeboxServer:fanspeed"}
Switch Freebox_Reboot "Reboot Freebox" (Freebox) {channel="freebox:server:FreeboxServer:reboot"}
Number Freebox_LcdBrightness "Screen Brightness" (Freebox) {channel="freebox:server:FreeboxServer:lcd_brightness"}
Number Freebox_LcdOrientation "Screen Orientation" (Freebox) {channel="freebox:server:FreeboxServer:lcd_orientation"}
Switch Freebox_LcdForced "Forced Orientation" (Freebox) {channel="freebox:server:FreeboxServer:lcd_forced"}
Switch Freebox_WifiStatus "Wifi Enabled" (Freebox) {channel="freebox:server:FreeboxServer:wifi_status"}
Switch Freebox_FtpStatus "FTP Server Enabled" (Freebox) {channel="freebox:server:FreeboxServer:ftp_status"}
Switch Freebox_AirmediaStatus "Air Media Enabled" (Freebox) {channel="freebox:server:FreeboxServer:airmedia_status"}
Switch Freebox_UpnpavStatus "UPnP AV Enabled" (Freebox) {channel="freebox:server:FreeboxServer:upnpav_status"}
Switch Freebox_SambafileshareStatus "Window File Sharing Enabled" (Freebox) {channel="freebox:server:FreeboxServer:sambafileshare_status"}
Switch Freebox_SambaprintershareStatus "Window Printer Sharing Enabled" (Freebox) {channel="freebox:server:FreeboxServer:sambaprintershare_status"}
String Freebox_XdslStatus "xDSL Status" (Freebox) {channel="freebox:server:FreeboxServer:xdsl_status"}
String Freebox_LineStatus "Line Status" (Freebox) {channel="freebox:server:FreeboxServer:line_status"}
String Freebox_Ipv4 "IP Address" (Freebox) {channel="freebox:server:FreeboxServer:ipv4"}
Number Freebox_RateUp "Upload Rate" (Freebox) {channel="freebox:server:FreeboxServer:rate_up"}
Number Freebox_RateDown "Download Rate" (Freebox) {channel="freebox:server:FreeboxServer:rate_down"}
Number Freebox_BytesUp "Uploaded" (Freebox) {channel="freebox:server:FreeboxServer:bytes_up"}
Number Freebox_BytesDown "Downloaded" (Freebox) {channel="freebox:server:FreeboxServer:bytes_down"}

Switch TelephoneFixe_StateOnhook "State onhook" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:state#onhook"}
Switch TelephoneFixe_StateRinging "State ringing" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:state#ringing"}
String TelephoneFixe_AnyCallNumber "Any call Number" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:any#call_Number"}
Number TelephoneFixe_AnyCallDuration "Any call duration" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:any#call_duration"}
DateTime TelephoneFixe_AnyCallTimestamp "Any call timestamp" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:any#call_timestamp"}
String TelephoneFixe_AnyCallStatus "Any call status" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:any#call_status"}
String TelephoneFixe_AnyCallName "Any call name" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:any#call_name"}
String TelephoneFixe_AcceptedCallNumber "Accepted call Number" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:accepted#call_Number"}
Number TelephoneFixe_AcceptedCallDuration "Accepted call duration" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:accepted#call_duration"}
DateTime TelephoneFixe_AcceptedCallTimestamp "Accepted call timestamp" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:accepted#call_timestamp"}
String TelephoneFixe_AcceptedCallName "Accepted call name" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:accepted#call_name"}
String TelephoneFixe_MissedCallNumber "Missed call Number" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:missed#call_Number"}
Number TelephoneFixe_MissedCallDuration "Missed call duration" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:missed#call_duration"}
DateTime TelephoneFixe_MissedCallTimestamp "Missed call timestamp" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:missed#call_timestamp"}
String TelephoneFixe_MissedCallName "Missed call name" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:missed#call_name"}
String TelephoneFixe_OutgoingCallNumber "Outgoing call Number" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:outgoing#call_Number"}
Number TelephoneFixe_OutgoingCallDuration "Outgoing call duration" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:outgoing#call_duration"}
DateTime TelephoneFixe_OutgoingCallTimestamp "Outgoing call timestamp" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:outgoing#call_timestamp"}
String TelephoneFixe_OutgoingCallName "Outgoing call name" (Freebox) {channel="freebox:phone:FreeboxServer:telephone:outgoing#call_name"}

Switch IPhone1 "Reachable" {channel="freebox:net_device:FreeboxServer:iphone1:reachable"}
Switch IPhone2 "Reachable" {channel="freebox:net_device:FreeboxServer:iphone2:reachable"}
Switch Telephone1 "Reachable" {channel="freebox:net_device:FreeboxServer:mobile1:reachable"}

lolodomo commented 5 years ago

The binding requests some objects from the underlying library. A few of these objects are, or contain, lists of objects. I am not sure whether I need to clear the lists after using them or whether I should let the Java garbage collector deal with that. Maybe @maggu2810 can answer this question?

lolodomo commented 5 years ago

Looking at the code of the binding itself, I cannot find what could lead to a memory leak.

Orfait commented 5 years ago

Nice, I can help with testing.

lolodomo commented 5 years ago

Maybe there is a problem here in the library: https://github.com/MatMaul/freeboxos-java/blob/master/src/org/matmaul/freeboxos/internal/RestManager.java#L165 The execute method gets the response from the HTTP request and opens a stream on its content. This stream is then parsed by readValue, but it is never closed. Can an expert confirm that the stream has to be closed?
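If the stream does need closing, the idiomatic Java fix is try-with-resources. A minimal sketch, assuming a generic parser callback: ResponseReader and readAndClose are hypothetical names, not the freeboxos-java API; in the library the parser would be the readValue call and the stream would come from the HTTP response entity. close() runs whether parsing succeeds or throws:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical helper: not part of freeboxos-java, it only illustrates the pattern.
final class ResponseReader {

    /** Parses the stream with the given parser and always closes the stream afterwards. */
    static <T> T readAndClose(InputStream body, Function<InputStream, T> parser) throws IOException {
        // try-with-resources calls close() on normal return and when parser.apply throws,
        // so the underlying HTTP connection can be released.
        try (InputStream in = body) {
            return parser.apply(in);
        }
    }

    /** Example parser that reads the whole body as UTF-8 text (stands in for readValue). */
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With Apache HttpClient 4.x, EntityUtils.consume(entity) serves the same purpose by fully reading and closing the entity content. Whether an unclosed stream explains the growth reported here is not established by this sketch.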

@Orfait: I will build a fixed version of the binding. Are you an advanced enough user to deploy a binding jar?

Orfait commented 5 years ago

I can't build, but I can deploy a jar.

lolodomo commented 5 years ago

Please check whether this version is better: (attachment deleted). Don't forget to unzip the file first.

Orfait commented 5 years ago

I installed it; let's see in a few hours.

Edit: memory usage is increasing at the same rate as before: 30% -> 50% after 2 hours.

Orfait commented 5 years ago

I still have the issue: after a night, memory is at 97% (1.9 GB) and swap is at 474 MB. So I made a heap dump and removed the binding.

Unfortunately, the Memory Analyzer Tool does not run on my computer, so for now the heap dump is useless.

martinvw commented 5 years ago

@lolodomo have you ever analysed a heap dump? There is a good chance that it shows the cause. If needed, I can take a look in the coming days.

@Orfait do you have the full heap dump, i.e. the 1 GB+ one? AFAIK it is not enabled by default, but it is the one that can help us find the cause.

lolodomo commented 5 years ago

What is strange is that the binding does not handle large amounts of data. I can't understand how it could consume 2 GB of memory in 2 hours, even with your phone refresh interval set to 10 seconds.

lolodomo commented 5 years ago

No, I have never produced or analyzed heap dumps.

Orfait commented 5 years ago

@lolodomo: not 2 hours; growing from 500 MB to 2 GB takes about 10 hours. But yes, that's fast.

@martinvw: strange, the file produced with dev:dump-create is only 2 MB. Is that normal?

martinvw commented 5 years ago

That sounds more like a thread dump; see also

https://stackoverflow.com/questions/407612/how-to-get-a-thread-and-heap-dump-of-a-java-process-on-windows-thats-not-runnin

AFAIK the command is the same on Linux.
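For reference, the jmap form is jmap -dump:live,format=b,file=heap.hprof <pid>, run as the same user as the Java process. The same kind of dump can also be triggered from inside a HotSpot-based JVM (Oracle, Zulu, OpenJDK) through the HotSpotDiagnosticMXBean management interface. A sketch; the HeapDumper class is a hypothetical wrapper:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Hypothetical wrapper around the HotSpot diagnostic MXBean.
final class HeapDumper {

    /**
     * Writes a heap dump of the current JVM to target.
     * live = true writes only objects that are still reachable,
     * which is what matters when looking for a leak.
     */
    static void dumpHeap(Path target, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet and should end in ".hprof".
        bean.dumpHeap(target.toString(), live);
    }
}
```

The resulting .hprof file can be opened in Eclipse Memory Analyzer; comparing two dumps taken a few hours apart is one way to see which objects keep accumulating.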

lolodomo commented 5 years ago

What is also strange is that I have been using this binding in my production environment (Raspberry Pi 2) for ages, and openHAB still responds well even after several days.

lolodomo commented 5 years ago

The architecture of the binding is very simple. A thread is scheduled every XX seconds. It requests data from the library and uses it to set the state of the channels. This data includes lists of objects. Nothing is done in the binding to release these objects; I was expecting the memory to be released when the thread ends (or at least after the Java garbage collector runs). The library executes HTTP requests to get the data and builds objects from the JSON responses, sometimes lists of objects. As this data is handed to the caller through the return value of a public library method, these lists are never removed inside the library itself.

PS: in fact, we have 3 different threads that handle different data.
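The polling structure described above could look roughly like this. It is a sketch, not the binding's code: PollingJob, fetch and updateChannelState are hypothetical names, and the real binding runs three such jobs. Each poll holds the library's result only in a local variable, and the scheduler is shut down when the thing is disposed so that its worker thread does not outlive it:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of one refresh job; names do not come from the Freebox binding.
final class PollingJob implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Supplier<List<String>> fetch;        // stands in for a library call returning a list
    private final Consumer<String> updateChannelState; // stands in for updating a channel
    private ScheduledFuture<?> task;

    PollingJob(Supplier<List<String>> fetch, Consumer<String> updateChannelState) {
        this.fetch = fetch;
        this.updateChannelState = updateChannelState;
    }

    /** Runs pollOnce() immediately, then again intervalSeconds after each run finishes. */
    void start(long intervalSeconds) {
        task = scheduler.scheduleWithFixedDelay(this::pollOnce, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    /** One refresh cycle: fetch data, push it to channels, keep no reference afterwards. */
    void pollOnce() {
        List<String> values = fetch.get(); // only referenced by this local variable
        for (String value : values) {
            updateChannelState.accept(value);
        }
    }

    /** Cancels the schedule and stops the worker thread, e.g. when the thing is disposed. */
    @Override
    public void close() {
        if (task != null) {
            task.cancel(true);
        }
        scheduler.shutdownNow();
    }
}
```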

lolodomo commented 5 years ago

@Orfait: since my fix did not help, it would be better to take your memory dump with the official version of the binding.

lolodomo commented 5 years ago

I could apply removeAll or clear to all lists of objects returned by the library once I have finished using them. But is that required?

lolodomo commented 5 years ago

Please try this new version, in which I added some list.clear() calls: org.openhab.binding.freebox-2.4.0-SNAPSHOT.zip. I am curious to know whether it helps or not.

lolodomo commented 5 years ago

@Orfait: with your settings, we should have after 10 hours:

That is a total of 22800 HTTP calls. Even assuming 10 KB per call, that only adds up to 228 MB after 10 hours.

Orfait commented 5 years ago

I deployed the new jar.

In fact, I have been running this binding for a long time too, also on a Raspberry Pi (before my current setup). One thing: I was running the snapshot version of openHAB, then moved to the stable 2.3. But before that, I (wrongly) updated to the 2.4 snapshot.

I will also check the rules, but it is the same story: they have not changed for a long time...

lolodomo commented 5 years ago

It is possible that the jar I provided only works properly with the openHAB snapshot. I don't know whether there were any breaking API changes in the Eclipse SmartHome core framework since the stable openHAB v2.3.

martinvw commented 5 years ago

Quoting @lolodomo: "I could apply a removeAll to all lists of objects returned by the library once I have finished using them. But is it required?"

Normally garbage collection should take care of this (it should reclaim the whole list and its elements). Clearing the lists might change how long that takes, but most likely the problem lies deeper.

A heap dump is most valuable without the additional cleanup code, because the problem will be easier to spot in the dump.
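To illustrate the point about garbage collection: a list becomes collectable as soon as nothing references it, so calling clear() first does not decide whether it is freed. A leak needs some long-lived reference that keeps accumulating. A hypothetical contrast (neither class comes from the binding):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a live reference, not a missing clear(), is what keeps lists in memory.
final class ReachabilityDemo {

    /** Keeps every fetched list in a field: the shape of a real leak. */
    static final class LeakyHandler {
        private final List<List<byte[]>> history = new ArrayList<>();

        void poll(List<byte[]> fetched) {
            history.add(fetched); // survives the poll, so the GC can never reclaim it
        }

        int retainedLists() {
            return history.size();
        }
    }

    /** Uses the fetched list only during the call: it is garbage once poll() returns. */
    static final class LocalHandler {
        private long itemsSeen;

        void poll(List<byte[]> fetched) {
            itemsSeen += fetched.size(); // only a primitive counter outlives the call
        }

        long itemsSeen() {
            return itemsSeen;
        }
    }
}
```

In a heap dump, the first pattern would typically appear as one object (here the history list) retaining a large and growing share of the heap, which is easier to spot when no extra cleanup code blurs the picture.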

lolodomo commented 5 years ago

Note that the library we rely on (https://github.com/MatMaul/freeboxos-java) uses these libraries:

Maybe it would not be a bad idea to move to a more recent version of the Apache HTTP components?

Orfait commented 5 years ago

I am not able to take the heap dump... I tried with -F, but this leads to a Java exception.

Strange, isn't dev:dump-create in Karaf related to heap dumps? I am pretty sure I read that in the forum.

EDIT: I must run jmap as the same user as the Java process.

martinvw commented 5 years ago

https://karaf.apache.org/manual/latest-2.x/developers-guide/developer-commands.html

It seems to contain Karaf-specific diagnostics:

lolodomo commented 5 years ago

Consuming 1.5 GB in 10 hours would mean around 66 KB of never-released data per HTTP call. That is too much. Even if a few calls can return a lot of data (the list of phone calls or the list of network devices), most of them should require less than 1 KB.
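Both estimates in this thread are a multiplication and a division. A small sketch, using decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes) and the figure of 22800 HTTP calls in 10 hours quoted above; LeakEstimate is a hypothetical helper:

```java
// Hypothetical back-of-the-envelope helper (decimal units: 1 KB = 1000 B, 1 GB = 10^9 B).
final class LeakEstimate {
    /** Number of HTTP calls in 10 hours, as quoted in the thread. */
    static final long HTTP_CALLS_IN_10_HOURS = 22_800;

    /** Memory retained if every call leaked bytesPerCall bytes. */
    static long retainedBytes(long calls, long bytesPerCall) {
        return calls * bytesPerCall;
    }

    /** Bytes per call implied by an observed memory growth over the given number of calls. */
    static double bytesPerCall(double observedGrowthBytes, long calls) {
        return observedGrowthBytes / calls;
    }
}
```

retainedBytes(22_800, 10_000) gives 228,000,000 bytes (228 MB), and bytesPerCall(1.5e9, 22_800) gives about 65,789 bytes, which is the "around 66 KB" per call.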

lolodomo commented 5 years ago

When I use the top command, I can see that %MEM for my Java process is around 32%, plus this:

KiB Mem:    996452 total,   969404 used,    27048 free,    39780 buffers
KiB Swap:   102396 total,     9216 used,    93180 free.   394224 cached Mem

And the result of "free -m" gives:

             total       used       free     shared    buffers     cached
Mem:           973        946         26          1         38        384
-/+ buffers/cache:        522        450
Swap:           99          9         90

As already discussed in another issue, the "used" figure displayed by this command is very high, but a lot of it is actually cache. In my case, 450 MB is really available, not just 26 MB.

lolodomo commented 5 years ago

As discussed in https://github.com/eclipse/smarthome/issues/4490, the right command to find out the available RAM is: cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}' (it gives 496.832 in my current case).
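A Java equivalent of that pipeline, for example to log the value periodically. A sketch with a hypothetical MemInfo helper: it reads the MemAvailable line of /proc/meminfo (reported in kB) and divides by 1024 like the awk expression, so the result is in MiB. MemAvailable is present since Linux 3.14:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper mirroring: grep '^MemAvailable:' /proc/meminfo | awk '{print $2/1024}'
final class MemInfo {

    /** Returns MemAvailable in MiB from /proc/meminfo-formatted lines, if the field is present. */
    static OptionalDouble memAvailableMiB(List<String> meminfoLines) {
        for (String line : meminfoLines) {
            if (line.startsWith("MemAvailable:")) {
                // Line format: "MemAvailable:     508756 kB"; the second field is in kB.
                String[] fields = line.trim().split("\\s+");
                return OptionalDouble.of(Long.parseLong(fields[1]) / 1024.0);
            }
        }
        return OptionalDouble.empty();
    }

    /** Reads the live value from /proc/meminfo (Linux only). */
    static OptionalDouble memAvailableMiB() throws IOException {
        return memAvailableMiB(Files.readAllLines(Paths.get("/proc/meminfo")));
    }
}
```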

lolodomo commented 5 years ago

I just reinstalled the official version of the binding (not my fixed version). My current available memory is 491.309 MB. I will check the value again in a few hours.

lolodomo commented 5 years ago

20 minutes later: 490.285

Orfait commented 5 years ago

I reinstalled the original version and removed all other bindings. It is sunny today, so we can keep the lights off :)

I am waiting until memory is full, then I will take a heap dump.

Available memory (for tracking):

Sun Jul 29 15:11:47 CEST 2018 : 1391.57
Sun Jul 29 15:11:53 CEST 2018 : 1386.55
Sun Jul 29 15:12:59 CEST 2018 : 1381.73
Sun Jul 29 15:14:10 CEST 2018 : 1374.26
Sun Jul 29 15:23:21 CEST 2018 : 1338.6
Sun Jul 29 15:33:48 CEST 2018 : 1303.05
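These samples can be turned into an average consumption rate. A sketch with a hypothetical DrainRate helper; the values are MiB because the awk expression divides kB by 1024:

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper: average rate at which available memory shrinks between two samples.
final class DrainRate {

    /** MiB consumed per hour between (t0, mib0) and (t1, mib1); positive means memory is being used up. */
    static double mibPerHour(LocalTime t0, double mib0, LocalTime t1, double mib1) {
        long seconds = Duration.between(t0, t1).getSeconds();
        if (seconds <= 0) {
            throw new IllegalArgumentException("t1 must be after t0");
        }
        return (mib0 - mib1) / seconds * 3600.0;
    }
}
```

For the first and last samples (15:11:47 at 1391.57 and 15:33:48 at 1303.05), mibPerHour gives about 241 MiB/h. Samples taken soon after a restart also include the JVM's normal start-up growth, so this figure probably overstates any steady-state leak.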

lolodomo commented 5 years ago

cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}'
490.918
top - 15:34:23 up 12 days, 14:50,  1 user,  load average: 0,09, 0,19, 0,18
Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,9 us,  0,3 sy,  0,0 ni, 98,8 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:    996452 total,   963920 used,    32532 free,    39828 buffers
KiB Swap:   102396 total,     9204 used,    93192 free.   383736 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
27989 xx        20   0  513492 327504  14620 S   4,0 32,9 365:48.66 java
free -m
             total       used       free     shared    buffers     cached
Mem:           973        941         32          1         38        374
-/+ buffers/cache:        527        445
Swap:           99          8         91

lolodomo commented 5 years ago

Using the command cat /proc/<pid>/status, I can see that my Java process has between 182 and 187 threads. Is it expected to have so many threads?

VmPeak:   513524 kB
VmSize:   513492 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    327756 kB
VmRSS:    327692 kB
VmData:   493928 kB
VmStk:       136 kB
VmExe:         4 kB
VmLib:      9980 kB
VmPTE:       428 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        185
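/proc/<pid>/status counts every OS thread of the process. From inside the JVM, ThreadMXBean reports the number of live Java threads, which may be somewhat lower because JVM-internal native threads (for example GC workers) are not included. A sketch with a hypothetical ThreadCount helper that reads both, which could help check whether the count keeps growing over time (185 here, 190 later, 348 on Orfait's system):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper for tracking thread counts.
final class ThreadCount {

    /** Parses the "Threads:" field from /proc/<pid>/status-formatted lines. */
    static OptionalInt fromProcStatus(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("Threads:")) {
                return OptionalInt.of(Integer.parseInt(line.substring("Threads:".length()).trim()));
            }
        }
        return OptionalInt.empty();
    }

    /** Live Java threads (daemon and non-daemon) as seen from inside the current JVM. */
    static int liveJavaThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }
}
```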
lolodomo commented 5 years ago

cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}'
489.512
top - 17:10:55 up 12 days, 16:26,  1 user,  load average: 0,23, 0,21, 0,18
Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,4 us,  0,2 sy,  0,0 ni, 99,3 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
KiB Mem:    996452 total,   967672 used,    28780 free,    40112 buffers
KiB Swap:   102396 total,     9196 used,    93200 free.   384056 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
27989 xx        20   0  513492 327536  14620 S   1,6 32,9 383:58.63 java
free -m
             total       used       free     shared    buffers     cached
Mem:           973        944         28          1         39        375
-/+ buffers/cache:        530        442
Swap:           99          8         91
VmPeak:   513524 kB
VmSize:   513492 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    327828 kB
VmRSS:    327628 kB
VmData:   493928 kB
VmStk:       136 kB
VmExe:         4 kB
VmLib:      9980 kB
VmPTE:       428 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        185
lolodomo commented 5 years ago

It looks very stable at the Java process level. Total available memory dropped by only 2 MB in 4 hours, and that might be due to logging rather than a memory leak (even though my openHAB logs and rrd4j data are saved on a network share).

At this stage, I doubt there is a memory leak in openHAB. I am running snapshot 1320 on a Raspberry Pi 2.

lolodomo commented 5 years ago

cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}'
489.594
top - 20:03:55 up 12 days, 19:19,  1 user,  load average: 0,35, 0,26, 0,21
Tasks: 109 total,   1 running, 108 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,4 us,  0,3 sy,  0,0 ni, 99,0 id,  0,1 wa,  0,0 hi,  0,3 si,  0,0 st
KiB Mem:    996452 total,   961480 used,    34972 free,    40340 buffers
KiB Swap:   102396 total,     9188 used,    93208 free.   377620 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
27989 xx        20   0  513492 327544  14620 S   2,6 32,9 417:27.81 java
free -m
             total       used       free     shared    buffers     cached
Mem:           973        942         30          2         39        372
-/+ buffers/cache:        530        442
Swap:           99          8         91
VmPeak:   514052 kB
VmSize:   513492 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    327932 kB
VmRSS:    327608 kB
VmData:   493928 kB
VmStk:       136 kB
VmExe:         4 kB
VmLib:      9980 kB
VmPTE:       428 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        190

Still stable, just a few more threads.

Orfait commented 5 years ago

I restarted after re-adding the other bindings (I need lights in the evening...). After 5 hours:

cat /proc/meminfo | grep '^MemAvailable:' | awk '{print $2/1024}'
390.324
top - 21:22:08 up  4:51,  1 user,  load average: 0.01, 0.07, 0.17
Tasks:  24 total,   1 running,  23 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.2 us,  0.5 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2097152 total,   190752 free,  1693592 used,   212808 buff/cache
KiB Swap:  3145728 total,  3145728 free,        0 used.   403560 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  136 openhab   20   0 6013528 1.474g  19748 S   2.0 73.7  12:21.02 java
free -m
              total        used        free      shared  buff/cache   available
Mem:           2048        1657         182          81         207         390
Swap:          3072           0        3072
cat /proc/136/status
Name:   java
Umask:  0022
State:  S (sleeping)
Tgid:   136
Ngid:   0
Pid:    136
PPid:   1
TracerPid:      0
Uid:    108     108     108     108
Gid:    114     114     114     114
FDSize: 512
Groups: 114
NStgid: 136
NSpid:  136
NSpgid: 136
NSsid:  136
VmPeak:  6013536 kB
VmSize:  6013528 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:   1554852 kB
VmRSS:   1554784 kB
RssAnon:         1535036 kB
RssFile:           19748 kB
RssShmem:              0 kB
VmData:  1961980 kB
VmStk:       132 kB
VmExe:         4 kB
VmLib:     17984 kB
VmPTE:      4084 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
Threads:        348
SigQ:   0/31004
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 2000000181005ccf
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003cfdfcffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        2
Speculation_Store_Bypass:       vulnerable
Cpus_allowed:   5
Cpus_allowed_list:      0,2
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        60
nonvoluntary_ctxt_switches:     13

lolodomo commented 5 years ago

It is weird that your Java process needs 6 GB of memory (VIRT in top)!!! How could openHAB need so much when mine only needs 500 MB? Did you change the default startup settings? Maybe you have a problem with one of your other bindings, not Freebox.
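One way to answer the startup-settings question: VIRT in top (6013528 kB here) is reserved address space, while the resident figure was RES 1.474g, and RES covers both the Java heap and non-heap memory such as thread stacks and metaspace. A sketch with a hypothetical JvmMemorySettings helper that reports the -Xms/-Xmx flags and the heap ceiling from inside the JVM; from outside, jcmd <pid> VM.flags shows similar information:

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for inspecting how the JVM heap is configured.
final class JvmMemorySettings {
    private static final long MIB = 1024L * 1024L;

    /** Heap-sizing flags (-Xms/-Xmx) passed on the java command line, if any. */
    static List<String> heapOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.startsWith("-Xms") || arg.startsWith("-Xmx"))
                .collect(Collectors.toList());
    }

    /** Maximum heap the JVM will try to use (from -Xmx, or the JVM default), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently committed by the JVM, in MiB (RES in top also includes non-heap memory). */
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }
}
```

If the heap ceiling turns out to be much larger than expected, RES can keep growing for a long time before the garbage collector is forced to reclaim anything, so a large RES alone does not prove a leak.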