yamcs / yamcs-studio

Desktop TM/TC Client for Yamcs
https://docs.yamcs.org/yamcs-studio/
Eclipse Public License 2.0

Yamcs Studio 1.5.9 crashing by itself after time #113

Open · nmaas87 opened this issue 2 years ago

nmaas87 commented 2 years ago

Hi, I have an instance of Yamcs Studio 1.5.9 running on a current Lubuntu LTS (20.04) VM. The instance is for remote monitoring and is kept running for days; it is barely touched and just stays connected to the server to show telemetry. I have now realized that it crashes a lot, sometimes after hours, sometimes after a day. I cannot say exactly how long it runs before this occurs, but without any user interaction it will at some point just crash and close by itself. I tried running it from the command line, but it also crashes there without leaving any debug messages after the initial connection ones. How can I debug this problem or enable debug messages to find out what causes it?

nmaas87 commented 2 years ago

It looks like this problem is more frequent if you put the runner on 90% zoom.

nmaas87 commented 2 years ago

OK, correction: it crashes regardless of zoom level. It just crashed after 5 hours at normal (100%) zoom, and I get no debugging info whatsoever. Could you give me some info on how to debug this? A hard crash of an ops tool during operations would be bad.

nmaas87 commented 2 years ago

Any info would be good; this is still an issue, also with older versions.

fqqb commented 2 years ago

Could be memory-related. You could use VisualVM or similar while the application is running, to see how it behaves.

Or add the following in your ini file (beneath -vmargs, one argument per line) and then analyze the dump afterwards.

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/
nmaas87 commented 2 years ago

Thanks Fabian! I changed ~/yamcs-studio-1.5.9/Yamcs Studio.ini to the following:

-startup
plugins/org.eclipse.equinox.launcher_1.6.0.v20200915-1508.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.2.0.v20200915-1442
-vm
plugins/org.eclipse.justj.openjdk.hotspot.jre.full.stripped.linux.x86_64_11.0.2.v20200815-0835/jre/bin
-vmargs
-Xmx2048m
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/
-Declipse.p2.unsignedPolicy=allow
-Duser.timezone=GMT
-Dosgi.requiredJavaVersion=11
-Dorg.eclipse.update.reconcile=false
--add-modules=ALL-SYSTEM

I will let it run (and catch fire) and see when it crashes. I guess it will create a heap dump in /tmp/ on crash, and then I can send it to you. Or how can I analyze it myself?

Thanks a lot!

nmaas87 commented 2 years ago

Dear @fqqb, you were spot on! Yamcs Studio seems to suffer from a memory leak. Within 8 hours I saw it grow from 707.85 MB to 1367.61 MB of reserved RAM, and it was then killed by the system, which was severely running out of RAM.

Before Yamcs Studio got closed:

              total        used        free      shared  buff/cache   available
Mem:          1,9Gi       1,8Gi        51Mi        28Mi        70Mi        17Mi

After it got closed:

              total        used        free      shared  buff/cache   available
Mem:          1,9Gi       522Mi       1,3Gi        33Mi       162Mi       1,3Gi

For now, to safeguard the mission, which will need Yamcs Studio to run for more than 10 hours, I increased the RAM to 6 GB and am letting it run to see where we end up. I have to keep in mind that it could still crash hours later when it hits the 2048 MB limit, so I may need to increase that as well.

I hope this error gets resolved. Sadly, the OS kills Yamcs Studio/Java so hard that nothing gets written to the files in /tmp, so no debug info is available.

nmaas87 commented 2 years ago

Additional note: within the first hours (when there was more than enough RAM available), Yamcs Studio was consuming RAM at a rate of 1.425 MB/minute without receiving any TM data.
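As a rough cross-check, the sketch below computes the average growth rate from the two readings in the previous comment (707.85 MB to 1367.61 MB), assuming the interval was exactly 8 hours, and linearly projects how long a given memory budget would last. The function names and the 6144 MB budget are illustrative only, and a straight-line projection is just an approximation of real memory behaviour.

```python
def growth_rate_mb_per_min(start_mb: float, end_mb: float, minutes: float) -> float:
    """Average memory growth in MB per minute between two readings."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (end_mb - start_mb) / minutes


def minutes_until(current_mb: float, limit_mb: float, rate_mb_per_min: float) -> float:
    """Linear projection of the minutes left before usage reaches limit_mb."""
    if rate_mb_per_min <= 0:
        return float("inf")  # not growing: no projected exhaustion
    return max(0.0, (limit_mb - current_mb) / rate_mb_per_min)


if __name__ == "__main__":
    # Readings from the comment above, assuming exactly 8 hours (480 min) apart.
    rate = growth_rate_mb_per_min(707.85, 1367.61, 8 * 60)
    print(f"average growth: {rate:.3f} MB/min ({rate * 60:.1f} MB/h)")
    # Hypothetical budget: time until a 6 GB (6144 MB) machine is full at this rate.
    left = minutes_until(1367.61, 6144, rate)
    print(f"projected time to 6144 MB: {left / 60:.1f} h")
```

Over the 8-hour window this works out to roughly 1.37 MB/min, in the same range as the 1.425 MB/min measured during the first hours.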

xpromache commented 2 years ago

Can you run jmap -histo:live when memory is growing to check if there are some specific classes with the number of objects increasing rapidly?

Do you have scripts in displays? I imagine some of those could cause this problem. Maybe you can try closing all displays and opening one by one to see which one is consuming memory.
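One way to act on the jmap -histo:live suggestion is to save two histograms some time apart (for example jmap -histo:live <PID> > histo1.txt, and later histo2.txt) and compare them. The sketch below is a hypothetical helper, not part of Yamcs Studio or the JDK; it assumes the JDK 11 row layout "num: instances bytes class-name ..." and sums rows whose class name appears more than once (for example when loaded by different class loaders). Note that the :live option forces a full GC before counting.

```python
from __future__ import annotations

import re
import sys

# Data rows look like "   1:   123456   7890123  [B (java.base@11.0.2)".
# Assumption: the layout printed by jmap -histo on JDK 11.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text: str) -> dict[str, tuple[int, int]]:
    """Map class name -> (instances, bytes), summing duplicate class names."""
    result: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            instances, size, name = int(m.group(1)), int(m.group(2)), m.group(3)
            prev_i, prev_b = result.get(name, (0, 0))
            result[name] = (prev_i + instances, prev_b + size)
    return result


def diff_histos(before: dict[str, tuple[int, int]],
                after: dict[str, tuple[int, int]],
                top: int = 20) -> list[tuple[str, int, int]]:
    """Return (class, delta_instances, delta_bytes), largest byte growth first."""
    deltas = []
    for name in before.keys() | after.keys():
        i0, b0 = before.get(name, (0, 0))
        i1, b1 = after.get(name, (0, 0))
        deltas.append((name, i1 - i0, b1 - b0))
    deltas.sort(key=lambda d: d[2], reverse=True)
    return deltas[:top]


def main(argv: list[str]) -> None:
    if len(argv) != 3:
        print("usage: histo_diff.py BEFORE.txt AFTER.txt")
        return
    with open(argv[1]) as f1, open(argv[2]) as f2:
        rows = diff_histos(parse_histo(f1.read()), parse_histo(f2.read()))
    for name, d_inst, d_bytes in rows:
        print(f"{d_bytes:>14,} B  {d_inst:>10,} inst  {name}")


if __name__ == "__main__":
    main(sys.argv)
```

Classes whose instance count keeps climbing between snapshots, rather than only their byte size, are usually the first candidates to look at.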

fqqb commented 2 years ago

Wait, your system has only 2 GB available? As per the Xmx setting in the ini file, Yamcs Studio's heap is allowed to grow to 2 GB. Either lower that, or indeed make more memory available to your system.

nmaas87 commented 2 years ago

Yeah @fqqb, I thought Yamcs Studio was normally configured with the Java default (512 MB), and only noticed the 2G line when I added the debugging options. I am currently trying this with 6 GB of RAM, just to be sure, to see whether and when it crashes again. I will not have more time for debugging because the mission is coming close, so "good enough" works if I can make it run long enough. I am only using a single display, but it contains JavaScript and Python scripts.

nmaas87 commented 2 years ago

Short heads-up: I started the run yesterday at 08:10 with 6 GB of RAM. At 13:24:

MiB Mem :   5938,4 total,   3399,6 free,   1587,0 used,    951,8 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.   4098,7 avail Mem
   1261 user      20   0  906956 142684  41532 R  85,4   2,3 267:18.09 ffmpeg
   1329 user      20   0 6512104   1,1g  69480 S   3,3  18,6  38:39.65 java

Today at 08:41, about 24 hours later:

MiB Mem :   5938,4 total,   1752,9 free,   3168,5 used,   1017,0 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.   2512,2 avail Mem
   1261 user      20   0  906956 143168  41760 S  86,7   2,4   1282:22 ffmpeg
   1329 user      20   0 8125932   2,6g  69552 S  27,6  45,3 291:36.02 java

So Yamcs Studio's resident memory has already risen above the 2048 MB limit. I will still let it run and see if and when it crashes (I need to set things up for real testing of the overall ground segment by Saturday at the latest; until then it can keep running and gathering data). 24+ hours is OK for my use case, but I think this data indicates a memory leak.
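Since -Xmx caps only the Java heap, any resident memory beyond that value has to live outside the heap (metaspace, thread stacks, JIT code cache, native libraries such as GTK, direct buffers and so on). The small sketch below turns that into a lower bound, using the top RES readings above (1,1g and 2,6g, which top rounds) and the -Xmx2048m setting from the ini file; the function name is illustrative.

```python
def min_off_heap_mib(rss_mib: float, xmx_mib: float) -> float:
    """Lower bound on resident memory outside the Java heap.

    The heap can never exceed -Xmx, so at least rss - xmx of the resident
    set must be non-heap memory (0 if rss is below the heap limit).
    """
    return max(0.0, rss_mib - xmx_mib)


if __name__ == "__main__":
    xmx = 2048.0  # -Xmx2048m from Yamcs Studio.ini
    # top RES readings from the comment above, converted from GiB to MiB.
    for label, rss_gib in [("13:24", 1.1), ("08:41 next day", 2.6)]:
        print(f"{label}: at least {min_off_heap_mib(rss_gib * 1024, xmx):.0f} MiB off-heap")
```

For the 08:41 reading this gives about 614 MiB that cannot be Java heap, which is why heap-only tools may not show where that memory went.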

nmaas87 commented 2 years ago

OK, in the end it filled up all the memory again (6 GB of RAM) and then hard-crashed, so sadly it is not just a matter of not having enough RAM. Here is some memory logging:


new try with 6 GB RAM
2021-11-17 Wednesday 08:10

2021-11-17 13:24:11
MiB Mem :   5938,4 total,   3399,6 free,   1587,0 used,    951,8 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.   4098,7 avail Mem
   1261 user      20   0  906956 142684  41532 R  85,4   2,3 267:18.09 ffmpeg
   1329 user      20   0 6512104   1,1g  69480 S   3,3  18,6  38:39.65 java

2021-11-18 Thursday 08:41:13
MiB Mem :   5938,4 total,   1752,9 free,   3168,5 used,   1017,0 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.   2512,2 avail Mem
   1261 user      20   0  906956 143168  41760 S  86,7   2,4   1282:22 ffmpeg
   1329 user      20   0 8125932   2,6g  69552 S  27,6  45,3 291:36.02 java

2021-11-18 16:31:02
MiB Mem :   5938,4 total,   1090,1 free,   3823,7 used,   1024,7 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.   1856,8 avail Mem
   1261 user      20   0  906956 143168  41760 R  86,4   2,4   1688:27 ffmpeg
   1329 user      20   0 8822252   3,3g  69552 S  31,2  56,2 432:36.00 java

2021-11-19 Friday 17:00:00
              total        used        free      shared  buff/cache   available
Mem:          5,8Gi       5,6Gi       110Mi        20Mi       106Mi        27Mi
Swap:            0B          0B          0B
user        1329 23.1 87.3 10755564 5312172 ?    Sl   Nov17 788:29 /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.justj.openjdk.hotspot.jre.full.stripped.linux.x86_64_11.0.2.v20200815-0835/jre/bin/java -Xmx2048m -Declipse.p2.unsignedPolicy=allow -Duser.timezone=GMT -Dosgi.requiredJavaVersion=11 -Dorg.eclipse.update.reconcile=false --add-modules=ALL-SYSTEM -jar /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher_1.6.0.v20200915-1508.jar -os linux -ws gtk -arch x86_64 -showsplash -launcher /home/user/yamcs-studio-1.5.9/Yamcs Studio -name Yamcs Studio --launcher.library /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.2.0.v20200915-1442/eclipse_11201.so -startup /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher_1.6.0.v20200915-1508.jar --launcher.overrideVmargs -exitdata 13 -vm /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.justj.openjdk.hotspot.jre.full.stripped.linux.x86_64_11.0.2.v20200815-0835/jre/bin/java -vmargs -Xmx2048m -Declipse.p2.unsignedPolicy=allow -Duser.timezone=GMT -Dosgi.requiredJavaVersion=11 -Dorg.eclipse.update.reconcile=false --add-modules=ALL-SYSTEM -jar /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher_1.6.0.v20200915-1508.jar
user        1261 85.2  1.8 906956 112944 pts/0   RLl+ Nov17 2906:56 ffmpeg

Fr 19. Nov 17:10:02 CET 2021
              total        used        free      shared  buff/cache   available
Mem:          5,8Gi       5,6Gi       116Mi        20Mi        88Mi        24Mi
Swap:            0B          0B          0B
user        1329 23.1 87.5 10755564 5320868 ?    Sl   Nov17 791:24 /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.justj.openjdk.hotspot.jre.full.stripped.linux.x86_64_11.0.2.v20200815-0835/jre/bin/java -Xmx2048m -Declipse.p2.unsignedPolicy=allow -Duser.timezone=GMT -Dosgi.requiredJavaVersion=11 -Dorg.eclipse.update.reconcile=false --add-modules=ALL-SYSTEM -jar /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher_1.6.0.v20200915-1508.jar -os linux -ws gtk -arch x86_64 -showsplash -launcher /home/user/yamcs-studio-1.5.9/Yamcs Studio -name Yamcs Studio --launcher.library /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.2.0.v20200915-1442/eclipse_11201.so -startup /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher_1.6.0.v20200915-1508.jar --launcher.overrideVmargs -exitdata 13 -vm /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.justj.openjdk.hotspot.jre.full.stripped.linux.x86_64_11.0.2.v20200815-0835/jre/bin/java -vmargs -Xmx2048m -Declipse.p2.unsignedPolicy=allow -Duser.timezone=GMT -Dosgi.requiredJavaVersion=11 -Dorg.eclipse.update.reconcile=false --add-modules=ALL-SYSTEM -jar /home/user/yamcs-studio-1.5.9//plugins/org.eclipse.equinox.launcher_1.6.0.v20200915-1508.jar
user        1261 85.2  1.8 906956 112572 pts/0   SLl+ Nov17 2915:07 ffmpeg

Fr 19. Nov 17:22:18 CET 2021
              total        used        free      shared  buff/cache   available
Mem:          5,8Gi       581Mi       5,1Gi        30Mi       129Mi       5,0Gi
Swap:            0B          0B          0B
user        1261 85.0  2.2 906956 136084 pts/0   RLl+ Nov17 2918:11 ffmpeg

I guess for my use case it's OK for now: I will just need to restart Yamcs Studio before each dry run, test countdown and the real launch, to avoid losing the MCS during flight.
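The log above was collected by hand with top, free and ps. A minimal sketch of automating this on Linux, assuming the Yamcs Studio PID is known (for example from ps), reads VmRSS from /proc/<PID>/status at a fixed interval and appends a timestamped line to a CSV file. The script name, output file name and 10-minute default interval are hypothetical choices.

```python
import sys
import time
from datetime import datetime


def parse_vmrss_kib(status_text: str) -> int:
    """Extract the resident set size (kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:   5312172 kB"
    raise ValueError("VmRSS not found")


def sample(pid: int, out_path: str, interval_s: float = 600.0) -> None:
    """Append 'timestamp,rss_kib' lines until the process disappears."""
    while True:
        try:
            with open(f"/proc/{pid}/status") as f:
                rss = parse_vmrss_kib(f.read())
        except (FileNotFoundError, ProcessLookupError):
            print(f"process {pid} has exited")
            return
        with open(out_path, "a") as out:
            out.write(f"{datetime.now().isoformat(timespec='seconds')},{rss}\n")
        time.sleep(interval_s)


if __name__ == "__main__":
    if 2 <= len(sys.argv) <= 3:
        # Usage: python rss_log.py <PID> [interval_seconds]
        interval = float(sys.argv[2]) if len(sys.argv) == 3 else 600.0
        sample(int(sys.argv[1]), "yamcs_studio_rss.csv", interval)
    else:
        print("usage: rss_log.py PID [interval_seconds]")
```

Writing each sample immediately means the file survives a hard kill of Yamcs Studio, since the logger runs as a separate process.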

Spaceless007 commented 1 year ago

Hi,

I'm having a similar issue with Yamcs Studio. Has any work been done on dealing with these memory leaks?

@fqqb

fqqb commented 1 year ago

@Spaceless007 I would like a heap dump so that I can investigate what is occupying memory. Do you have jmap available? It's a tool that comes with any Java JDK. When you notice things are about to go wrong, but before Studio crashes, take a dump referencing the PID of Yamcs Studio:

jmap -dump:format=b,file=heap_dump.hprof <PID>

And please share that hprof file (or in private: fdi AT spaceapplications.com )

In the absence of a dump, I'll do some tests myself next week in an attempt to reproduce any issue (I'm not currently aware of any).
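The dump has to be taken while the process is still alive. If the process is being killed by the Linux OOM killer, as the earlier reports suggest, it receives SIGKILL, so -XX:+HeapDumpOnOutOfMemoryError (which only fires when the JVM itself throws OutOfMemoryError) never gets to write anything. A minimal watchdog sketch, assuming jmap is on the PATH and that a fixed RSS threshold is a reasonable "about to go wrong" signal, runs the jmap command above once that threshold is crossed. The script name, threshold and helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
import time


def read_rss_kib(pid: int) -> int:
    """Resident set size of pid in kB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise ValueError(f"no VmRSS for pid {pid}")


def jmap_dump_command(pid: int, path: str) -> list[str]:
    """The jmap invocation quoted above, as an argument list."""
    return ["jmap", f"-dump:format=b,file={path}", str(pid)]


def watch(pid: int, threshold_kib: int, path: str, interval_s: float = 60.0) -> None:
    """Poll RSS and take a single heap dump once it reaches threshold_kib."""
    while True:
        try:
            rss = read_rss_kib(pid)
        except (FileNotFoundError, ProcessLookupError):
            print(f"process {pid} exited before reaching the threshold")
            return
        if rss >= threshold_kib:
            print(f"RSS {rss} kB >= {threshold_kib} kB, dumping heap to {path}")
            subprocess.run(jmap_dump_command(pid, path), check=True)
            return
        time.sleep(interval_s)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python dump_watch.py <PID> <threshold_MiB>
        watch(int(sys.argv[1]), int(sys.argv[2]) * 1024, "heap_dump.hprof")
    else:
        print("usage: dump_watch.py PID THRESHOLD_MIB")
```

jmap generally has to run as the same user as the Yamcs Studio process. If the growth turns out to be off-heap, the resulting dump may look healthy even though RSS keeps climbing.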

unlikelyzero commented 1 year ago

We're seeing this ourselves when using a large MDB. Is this expected to be fixable through changes on the end-user side, or is it seen as a memory-management problem that can be fixed in Yamcs Studio?

fqqb commented 1 year ago

@unlikelyzero are you able to isolate a cause on your side that eventually causes a crash? I somehow doubt a large MDB is the reason. Maybe a specific platform, display, or widget :-/

Of course I'd like to fix any problems of this kind; however, I have run Yamcs Studio for weeks on end connected to a data source, and everything worked well. I was also given some heap dumps taken just before a crash, and those were healthy too, suggesting that memory is being consumed off-heap for some reason.