I tried a quick diagnostic, but if we do not succeed with that in a timely manner, using LeakAware might be needed!
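For context, a minimal sketch of the LeakAware idea, assuming a hypothetical `LeakAwareTempFile` wrapper (an illustration, not the actual James API): register a `java.lang.ref.Cleaner` action so that a temp file whose owner is garbage-collected without `close()` still gets deleted, and the leak gets logged so we learn which call sites drop their handles.

```java
import java.io.IOException;
import java.lang.ref.Cleaner;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a temp-file handle that registers a Cleaner action,
// so a file whose owner was garbage-collected without close() still gets
// deleted, and the leak gets reported for diagnostics.
public class LeakAwareTempFile implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    private final Path path;
    private final AtomicBoolean closedProperly = new AtomicBoolean(false);
    private final Cleaner.Cleanable cleanable;

    public static LeakAwareTempFile create(String prefix) throws IOException {
        return new LeakAwareTempFile(Files.createTempFile(prefix, ".tmp"));
    }

    private LeakAwareTempFile(Path path) {
        this.path = path;
        // The cleanup action must not capture `this`, or the wrapper would
        // never become unreachable; capture only the path and the flag.
        this.cleanable = CLEANER.register(this, cleanup(path, closedProperly));
    }

    private static Runnable cleanup(Path path, AtomicBoolean closedProperly) {
        return () -> {
            if (!closedProperly.get()) {
                System.err.println("Temp file leaked (never closed): " + path);
            }
            try {
                Files.deleteIfExists(path);
            } catch (IOException e) {
                // Best effort: nothing sensible left to do here.
            }
        };
    }

    public Path path() {
        return path;
    }

    @Override
    public void close() {
        closedProperly.set(true);
        cleanable.clean(); // runs the cleanup action at most once
    }
}
```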
Spotted again today:

```
-rw------- 1 root root 20M Feb 27 14:45 aesencrypt1090356846759433124.tmp
-rw------- 1 root root 1.1M Feb 27 14:45 aesencrypt12150808671220843742.tmp
-rw------- 1 root root 190K Feb 27 15:13 aesencrypt12270433280743716700.tmp
-rw------- 1 root root 308K Feb 27 15:13 aesencrypt14012496583013195642.tmp
-rw------- 1 root root 1.6M Feb 27 14:45 aesencrypt14207565799113878226.tmp
-rw------- 1 root root 819K Feb 27 15:13 aesencrypt16234583170132255200.tmp
-rw------- 1 root root 29M Feb 27 14:45 aesencrypt17148122387552507888.tmp
-rw------- 1 root root 1.2M Feb 27 14:45 aesencrypt18202933902701442902.tmp
-rw------- 1 root root 447K Feb 27 14:45 aesencrypt3445986306924536171.tmp
-rw------- 1 root root 432K Feb 27 15:13 aesencrypt4454941668781397544.tmp
-rw------- 1 root root 844K Feb 27 15:13 aesencrypt7032727805399581833.tmp
-rw------- 1 root root 1.1M Feb 27 14:45 aesencrypt7142968592193439328.tmp
-rw------- 1 root root 1.5M Feb 27 14:45 aesencrypt729192674429865014.tmp
-rw------- 1 root root 7.9M Feb 27 14:45 FileBufferedBodyFactory12196286621053298905.tmp
-rw------- 1 root root 147K Feb 27 14:45 FileBufferedBodyFactory1286078198665713980.tmp
-rw------- 1 root root 1.1M Feb 27 14:45 FileBufferedBodyFactory14551617268047995378.tmp
-rw------- 1 root root 511K Feb 27 14:45 FileBufferedBodyFactory16413942727464907789.tmp
-rw------- 1 root root 27M Feb 27 14:45 FileBufferedBodyFactory1818233457731727059.tmp
-rw------- 1 root root 1.5M Feb 27 14:45 FileBufferedBodyFactory3831952595233933979.tmp
-rw------- 1 root root 446K Feb 27 14:45 FileBufferedBodyFactory5940323976140619987.tmp
-rw------- 1 root root 146K Feb 27 14:45 FileBufferedBodyFactory6010870541582569665.tmp
-rw------- 1 root root 1.2M Feb 27 14:45 FileBufferedBodyFactory660729112906107165.tmp
-rw------- 1 root root 2.7M Feb 27 14:45 FileBufferedBodyFactory8039137499703867154.tmp
-rw------- 1 root root 2.7M Feb 27 14:45 FileBufferedBodyFactory8138595856361894791.tmp
drwxr-xr-x 2 root root 60 Feb 27 14:45 hsperfdata_root
-rw------- 1 root root 845K Feb 27 15:13 imap-literal10068127184381344361.tmp
-rw------- 1 root root 29M Feb 27 14:45 imap-literal1157096732197034832.tmp
-rw------- 1 root root 15M Feb 27 14:45 imap-literal13899523218875244485.tmp
-rw------- 1 root root 310K Feb 27 15:13 imap-literal14011594112211787081.tmp
-rw------- 1 root root 816K Feb 27 15:13 imap-literal14036192792378682392.tmp
-rw------- 1 root root 7.5M Feb 27 15:13 imap-literal14048239475853386882.tmp
-rw------- 1 root root 6.4M Feb 27 14:45 imap-literal18314332679400082689.tmp
-rw------- 1 root root 7.9M Feb 27 14:45 imap-literal18397798873854138783.tmp
-rw------- 1 root root 55M Feb 27 14:45 imap-literal3133932598444182451.tmp
-rw------- 1 root root 431K Feb 27 15:13 imap-literal4162503995144712492.tmp
-rw------- 1 root root 1.1M Feb 27 14:45 imap-literal5724799145470707674.tmp
-rw------- 1 root root 200K Feb 27 15:13 imap-literal592552489419855185.tmp
-rw------- 1 root root 1.5M Feb 27 14:45 imap-literal6784841676601606534.tmp
-rw------- 1 root root 40K Feb 27 15:13 imap-literal703355375567653488.tmp
-rw------- 1 root root 12M Feb 27 14:45 imap-literal7265937497129667509.tmp
```
I was able to correlate the timestamps with a pod restart: the pod was killed by the K8s host for excessive memory consumption. 1 GB for Netty off-heap buffers, JVM memory structures, and tmpfs is too little, so I added another 500 MB. That covers the OOM part.

However, even with a pre-stop hook, K8s did not clean up the files in /tmp, which makes the issue worse. I suspect the pre-stop hook never ran because the OOM kill hard-killed the pod.
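If the OOM kill indeed bypasses the pre-stop hook, one defensive option (a sketch only; `StaleTempFileSweeper` is a hypothetical name, and this is only safe when no other process shares the temp directory) is to sweep the known prefixes at JVM startup, before the server creates any new temp files:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: delete temp files left behind by a previous, hard-killed pod.
// The prefixes match the leaked files listed above.
public class StaleTempFileSweeper {
    private static final String[] PREFIXES = {
        "aesencrypt", "FileBufferedBodyFactory", "imap-literal"
    };

    public static void sweep(Path tmpDir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(tmpDir, "*.tmp")) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                for (String prefix : PREFIXES) {
                    if (name.startsWith(prefix)) {
                        Files.deleteIfExists(entry);
                        break;
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sweep the JVM's configured temp directory once at startup.
        sweep(Path.of(System.getProperty("java.io.tmpdir")));
    }
}
```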
(Screenshots, not reproduced here: the same leftover files on some JMAP pods, and the corresponding errors in the logs.)
Those are corner cases, as this happens very rarely.

As far as I could see, it looks like `DeletedMessageVaultCallback` was the triggering event. The damn thing runs with a 60s timeout (thanks, RabbitMQ), which might actually lead to cancellation. Publisher cancellation might not be handled by some parts of our pipelines.
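To illustrate that last point: if a step writes a temp file and only deletes it on the success or error path, a cancellation (such as one triggered by the 60s timeout) skips the cleanup entirely. A sketch, assuming Reactor and a hypothetical `process` step; `Mono.using` runs its cleanup consumer on completion, error, and cancellation alike:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class CancellationSafeTempFile {
    // Hypothetical pipeline step: buffer a payload to a temp file, process
    // it, and guarantee deletion even when the subscriber cancels.
    static Mono<Void> processWithTempFile(byte[] payload) {
        return Mono.using(
                () -> {
                    // Resource supplier: create and fill the temp file.
                    Path tmp = Files.createTempFile("imap-literal", ".tmp");
                    Files.write(tmp, payload);
                    return tmp;
                },
                CancellationSafeTempFile::process,
                tmp -> {
                    // Cleanup: runs on complete, on error, AND on cancel,
                    // unlike plain doOnSuccess/doOnError callbacks.
                    try {
                        Files.deleteIfExists(tmp);
                    } catch (IOException e) {
                        // Best effort.
                    }
                })
            .subscribeOn(Schedulers.boundedElastic()); // file IO off the event loop
    }

    static Mono<Void> process(Path tmp) {
        return Mono.fromRunnable(() -> {
            // Hypothetical work on the buffered file.
        });
    }
}
```

A caller applying the broker-side deadline, e.g. `processWithTempFile(payload).timeout(Duration.ofSeconds(60))`, cancels upstream on timeout, and the cleanup consumer still deletes the file.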