jlesage / docker-crashplan-pro

Docker container for CrashPlan PRO (aka CrashPlan for Small Business)
MIT License
296 stars 38 forks

Docker container excessive resource usage #82

Closed acseven closed 6 years ago

acseven commented 6 years ago

Hi,

I'm using your package on a DS412+ with 2 GB of RAM. Before that I had my round of issues using patters' SPK package, and running this on Docker has been, at least from the CrashPlan application point of view, much better overall.

However, ever since running the Docker container, I have been having a lot of difficulty reaching my Synology services in general. At the worst moments DSM would be simply unreachable, returning network timeouts. Other services that are normally instantaneous or quite responsive (Domoticz, Plex) were sluggish, if responsive at all. There were two solutions: wait (sometimes a few hours) or shut down the box (a graceful shutdown would take a good half hour).

Using DSM's Resource Monitor, the behaviour is very self-evident:

scrn

The screenshot above shows the moment I turned off the Docker container (took me a good few minutes to actually being able to see Resource Monitor). After that DSM returned to being fully responsive.

Regarding CrashPlan PRO: I'm using it with two file sets for a total of about 2.5 TB and maybe 400k files, both scheduled to run only from 1AM to 8AM. The issue is most evident in the morning, after 8AM, although it persists throughout the day (the screenshot was taken at such a time, when CrashPlan is not backing up). Knowing very well that this kind of issue is most often related to the maximum Java heap size, I have been fiddling with it for the last few weeks. For reference, using patters' SPK with a maximum heap of ~1500 MB would suffice both for backup and re-indexing.

Using this Docker implementation I have tried setting the Java environment variable to 1200, 1600 and 1800 MB, and none changed my original issue of system unresponsiveness. I would of course get warnings within CrashPlan itself of it having reached peak memory (at the lower settings) at some point. My take so far is that even with the Java limit in place, the container's resource usage is excessive in a way I never experienced with the SPK. The SPK would crash - but right now I'm thinking that would probably be preferable to having all the drives working this hard and losing access to the box, something critical during a work day.

From what I've read online, the kernel used by [at least my] Synology is not able to limit resource usage, so there is no point in setting limits in the actual Docker app, because they will not work (they don't; I have tested). Could this be something like Docker using swap memory to make up for not having enough RAM available?

Is there anything I might be missing in my config?
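Regarding limits being ignored: as a sketch, whether the host kernel actually supports Docker's memory limits can be checked over SSH. These are generic Linux checks, not Synology-specific:

```shell
# If the "memory" controller is missing from this list, or its
# "enabled" column is 0, Docker's --memory limits are silently
# ignored by the kernel:
grep memory /proc/cgroups

# The Docker daemon also warns about missing limit support:
docker info 2>&1 | grep -i 'warning\|swap'
```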

jlesage commented 6 years ago

So if I understand correctly, you are saying that the Java memory limit is respected, but you suspect that the container is using additional memory?

Can you provide the output of the following command? It will give an indication of the memory consumed by the processes run by the container.

docker exec crashplan-pro ps -A
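For memory rather than CPU time, a couple of additional commands may help (hypothetical usage; BusyBox's ps inside the container may not support the -o fields shown):

```shell
# Container-level memory/CPU snapshot as seen by the Docker daemon:
docker stats --no-stream crashplan-pro

# Per-process resident set size (RSS) inside the container:
docker exec crashplan-pro ps -o pid,rss,comm
```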
jlesage commented 6 years ago

By the way, since everything in the container runs natively on the host, I'm not expecting the Docker version to have much more overhead compared to patters' package. Sure, there is some overhead due to additional processes running in the container, but the memory consumed by them should not represent a big proportion.

acseven commented 6 years ago

Hi, thanks for the reply.

Here's the requested output:

# docker exec crashplan-pro ps -A
PID   USER     TIME   COMMAND
    1 root       0:00 s6-svscan -s -t0 /var/run/s6/services
   35 root       0:00 s6-supervise s6-fdholderd
  530 root       0:00 s6-supervise statusmonitor
  531 root       0:00 s6-supervise logmonitor
  532 root       0:00 s6-supervise certsmonitor
  533 root       0:00 s6-supervise CrashPlanEngine
  534 root       0:00 s6-supervise x11vnc
  535 root       0:00 s6-supervise app
  536 root       0:00 s6-supervise openbox
  537 root       0:00 s6-supervise xvfb
  538 root       0:00 s6-supervise nginx
  681 daemon     0:00 s6-fdholderd -1 -i rules
  688 root       0:00 /usr/bin/Xvfb :0 -screen 0 1280x1024x24
  705 root       0:00 [s6-notifyonchec]
  721 root       0:00 /usr/bin/openbox
  737 root       0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf -g daemon off;
  748 root       0:00 /usr/bin/x11vnc -display :0 -rfbport 5900 -rfbportv6 -1 -no6 -noipv6 -httpportv6 -1 -forever -desktop CrashPlan for Small Business -cursor arrow -shared -nopw -stunnel /config/certs/vnc-server.pem
  758 root       0:00 sh ./certsmonitor
  768 root       0:00 forstdin -d    -- LINE fdmove 0 3 importas -u LINE LINE pipeline  s6-ls  -0  --  /etc/logmonitor/notifications.d  pipeline  s6-sort  -0  --  forstdin -o 0 -0 -- i importas -u i i foreground  /etc/logmonitor/notifications.d/${i}/filter  ${LINE}  importas -u ? ? if -t  s6-test  ${?}  -eq  0  pipeline  s6-ls  -0  --  /etc/logmonitor/targets.d  pipeline  s6-sort  -0  --  forstdin -o 0 -0 -- j importas -u j j if  if  -t   s6-test   -f   /var/run/logmonitor/states/${j}.${i}   -a   -f   /etc/logmonitor/targets.d/${j}/debouncing    backtick  -n  -i  DEBOUNCING   s6-head   -n1   /etc/logmonitor/targets.d/${j}/debouncing    importas  -u  DEBOUNCING  DEBOUNCING  ifelse   s6-test   ${DEBOUNCING}   -eq   0     s6-false    backtick  -n  -i  CURRENT_TIME   date   +%s    importas  -u  CURRENT_TIME  CURRENT_TIME  backtick  -n  -i  FILE_TIME   date   +%s   -r   /var/run/logmonitor/states/${j}.${i}    importas  -u  FILE_TIME  FILE_TIME  backtick  -n  -i  TIME_DIFF   s6-expr   ${CURRENT_TIME}   -   ${FILE_TIME}    importas  -u  TIME_DIFF  TIME_DIFF  s6-test  !  ${TIME_DIFF}  -lt  ${DEBOUNCING}  backtick -n -D Unknown title TITLE  ifelse   s6-test   -x   /etc/logmonitor/notifications.d/${i}/title     /etc/logmonitor/notifications.d/${i}/title   ${LINE}    s6-head  -n1  /etc/logmonitor/notifications.d/${i}/title  importas -u TITLE TITLE backtick -n -D Unknown description DESC  ifelse   s6-test   -x   /etc/logmonitor/notifications.d/${i}/desc     /etc/logmonitor/notifications.d/${i}/desc   ${LINE}    s6-head  -n1  /etc/logmonitor/notifications.d/${i}/desc  importas -u DESC DESC backtick -n -D ERROR LEVEL  ifelse   s6-test   -x   /etc/logmonitor/notifications.d/${i}/level     /etc/logmonitor/notifications.d/${i}/level   ${LINE}    s6-head  -n1  /etc/logmonitor/notifications.d/${i}/level  importas -u LEVEL LEVEL background  /etc/logmonitor/targets.d/${j}/send  ${TITLE}  ${DESC}  ${LEVEL}  foreground  s6-rmrf  /var/run/logmonitor/states/${j}.${i}  s6-touch /var/run/logmonitor/states/${j}.${i}
  779 root       0:00 forstdin -d    -- LINE fdmove 0 3 importas -u LINE LINE pipeline  s6-ls  -0  --  /etc/logmonitor/notifications.d  pipeline  s6-sort  -0  --  forstdin -o 0 -0 -- i importas -u i i foreground  /etc/logmonitor/notifications.d/${i}/filter  ${LINE}  importas -u ? ? if -t  s6-test  ${?}  -eq  0  pipeline  s6-ls  -0  --  /etc/logmonitor/targets.d  pipeline  s6-sort  -0  --  forstdin -o 0 -0 -- j importas -u j j if  if  -t   s6-test   -f   /var/run/logmonitor/states/${j}.${i}   -a   -f   /etc/logmonitor/targets.d/${j}/debouncing    backtick  -n  -i  DEBOUNCING   s6-head   -n1   /etc/logmonitor/targets.d/${j}/debouncing    importas  -u  DEBOUNCING  DEBOUNCING  ifelse   s6-test   ${DEBOUNCING}   -eq   0     s6-false    backtick  -n  -i  CURRENT_TIME   date   +%s    importas  -u  CURRENT_TIME  CURRENT_TIME  backtick  -n  -i  FILE_TIME   date   +%s   -r   /var/run/logmonitor/states/${j}.${i}    importas  -u  FILE_TIME  FILE_TIME  backtick  -n  -i  TIME_DIFF   s6-expr   ${CURRENT_TIME}   -   ${FILE_TIME}    importas  -u  TIME_DIFF  TIME_DIFF  s6-test  !  ${TIME_DIFF}  -lt  ${DEBOUNCING}  backtick -n -D Unknown title TITLE  ifelse   s6-test   -x   /etc/logmonitor/notifications.d/${i}/title     /etc/logmonitor/notifications.d/${i}/title   ${LINE}    s6-head  -n1  /etc/logmonitor/notifications.d/${i}/title  importas -u TITLE TITLE backtick -n -D Unknown description DESC  ifelse   s6-test   -x   /etc/logmonitor/notifications.d/${i}/desc     /etc/logmonitor/notifications.d/${i}/desc   ${LINE}    s6-head  -n1  /etc/logmonitor/notifications.d/${i}/desc  importas -u DESC DESC backtick -n -D ERROR LEVEL  ifelse   s6-test   -x   /etc/logmonitor/notifications.d/${i}/level     /etc/logmonitor/notifications.d/${i}/level   ${LINE}    s6-head  -n1  /etc/logmonitor/notifications.d/${i}/level  importas -u LEVEL LEVEL background  /etc/logmonitor/targets.d/${j}/send  ${TITLE}  ${DESC}  ${LEVEL}  foreground  s6-rmrf  /var/run/logmonitor/states/${j}.${i}  s6-touch /var/run/logmonitor/states/${j}.${i}
  793 root       1:44 /usr/local/crashplan/jre/bin/java -Dfile.encoding=UTF-8 -Dapp=CrashPlanService -DappBaseName=CrashPlan -Xms20m -Xmx1550M -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 -Dc42.native.md5.enabled=false -classpath /usr/local/crashplan/lib/com.backup42.desktop.jar:/usr/local/crashplan/lang:/usr/local/crashplan com.backup42.service.CPService
  808 root       0:00 /bin/s6-notifyoncheck -n 0 s6-setuidgid 0:0 /usr/local/crashplan/bin/startCrashPlanEngine.sh
  810 root       0:00 s6-ftrigrd
  840 root       0:00 tail -n0 -F /config/log/service.log.0
  841 root       0:00 /usr/bin/stunnel -fd 3
  842 nginx      0:00 nginx: worker process
  843 root       0:00 {tailstatusfile} /bin/sh /usr/bin/tailstatusfile /config/log/app.log
  844 nginx      0:00 nginx: worker process
 6809 root       0:00 sh /startapp.sh
 6881 root       0:08 /usr/local/crashplan/electron/crashplan
 7934 root       0:00 sleep 5
 7948 root       0:00 sleep 5
 7960 root       0:00 ps -A
10300 root       0:00 /usr/local/crashplan/electron/crashplan --type=zygote --no-sandbox
13232 root       0:16 /usr/local/crashplan/electron/crashplan --type=renderer --no-sandbox --primordial-pipe-token=B1299A2EED114510016D550A89D810A6 --lang=en-US --app-path=/usr/local/crashplan/electron/resources/app.asar --node-integration=true --hidden-page --enable-pinch --num-raster-threads=2 --enable-main-frame-before-activation --content-image-texture-target=0,0,3553;0,1,3553;0,2,3553;0,3,3553;0,4,3553;0,5,3553;0,6,3553;0,7,3553;0,8,3553;0,9,3553;0,10,3553;0,11,3553;0,12,3553;0,13,3553;0,14,3553;0,15,3553;1,0,3553;1,1,3553;1,2,3553;1,3,3553;1,4,3553;1,5,3553;1,6,3553;1,7,3553;1,8,3553;1,9,3553;1,10,3553;1,11,3553;1,12,3553;1,13,3553;1,14,3553;1,15,3553;2,0,3553;2,1,3553;2,2,3553;2,3,3553;2,4,3553;2,5,3553;2,6,3553;2,7,3553;2,8,3553;2,9,3553;2,10,3553;2,11,3553;2,12,3553;2,13,3553;2,14,3553;2,15,3553;3,0,3553;3,1,3553;3,2,3553;3,3,3553;3,4,3553;3,

So if I understand correctly, you are saying that the Java memory limit is respected, but you suspect that the container is using additional memory?

I don't think it's using additional RAM; at least that's not what I'm seeing in Resource Monitor. The issue seems to be swap related, given that all the unresponsiveness coincides with high disk usage. That's why I tried pushing the Java -Xmx limit as high as I could, to see if it would improve things, and it didn't (although it might cause DSM to become sluggish on its own).
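To confirm the swap theory from the command line (over SSH, while the container runs), standard Linux tools show swap totals and ongoing paging activity; this is a generic sketch, not Synology-specific:

```shell
# Swap totals in MB; a growing "used" column means the box is paging:
free -m

# Sample swap-in (si) / swap-out (so) once per second, five times;
# sustained non-zero values indicate active thrashing:
vmstat 1 5
```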

acseven commented 6 years ago

Just for reference, I started the container to grab the output and then took a few screenshots of the Resource Monitor while still possible.

scrn

scrn2

scrn3

And right now it's on its own. No services are responding, including access via SSH and DSM.

jlesage commented 6 years ago

Swap usage could explain the issues you are seeing. It seems that there is a way in Synology to see swap usage: https://www.synology.com/en-us/knowledgebase/DSM/help/DSM/ResourceMonitor/rsrcmonitor_performance. You can use that to confirm whether swap is heavily used.

In the task manager, you can sort processes by memory usage. I would use that to see which ones consume the most. But it seems that 2 GB is very little for the amount of data you have to back up. CrashPlan alone would need the whole 2 GB (https://support.code42.com/CrashPlan/6/Troubleshooting/Adjust_Code42_app_settings_for_memory_usage_with_large_backups).

If you set the java memory limit back to 1GB, do you still have the responsiveness issue?

According to your first screenshot, when the container is stopped, about 30% of the memory is consumed. So about 1.4 GB is available to CrashPlan. Having a Java memory limit higher than that would cause the system to start using swap.
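That headroom arithmetic can be sketched in shell; the 2048 MB total and the 30% baseline come from the screenshots, while the non-heap JVM overhead figure is a hypothetical allowance:

```shell
#!/bin/sh
total_mb=2048        # physical RAM in the DS412+
baseline_pct=30      # memory in use with the container stopped
jvm_overhead_mb=100  # hypothetical non-heap JVM overhead

free_mb=$(( total_mb - total_mb * baseline_pct / 100 ))
max_heap_mb=$(( free_mb - jvm_overhead_mb ))

echo "free for container: ${free_mb} MB"     # → 1434 MB
echo "safe -Xmx ceiling:  ${max_heap_mb} MB" # → 1334 MB
```

A Java limit above roughly this ceiling would push the system into swap.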

acseven commented 6 years ago

I'm looking into increasing the RAM to 4 GB, which - if possible - should make the issue disappear.

In the meantime, I'll take a look at your pointers on swap memory; thanks for that. I'm fairly aware of CrashPlan's limitations with RAM (as I mentioned in the OP), but I essentially wrote because the behaviour of the container is different from patters' SPK: the latter would simply crash and the NAS would keep on working, whereas the Docker implementation just keeps on going, making everything stop responding.

From my experience CrashPlan doesn't actually require a lot of RAM while uploading - it bogs down when/after it scans a data set, be it the scheduled scan or a server sync maintenance task. That's when it spikes. Considering the latter is not very frequent, this means CrashPlan would run fine, for the most part, until a scheduled scan begins. At that point the system would either have available resources or the service would crash.

That's the main thing: I would prefer that CrashPlan crash due to insufficient resources at that point, keeping the remaining system functionality fully available. The current behaviour is just unusable, as I literally do not have any access to the NAS while it tries to figure out what to do with CrashPlan.

jlesage commented 6 years ago

Keeping the Java limit low (1 GB, for example) should make the CrashPlan engine crash, like with patters' SPK. Is that the case? Also, with patters' SPK, did you have the UI running all the time? Maybe this is something that can make a difference.
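If the container needs to be recreated to change the limit, a minimal sketch follows, assuming the image's documented CRASHPLAN_SRV_MAX_MEM variable controls the engine's -Xmx; the volume paths and port below are illustrative and must match the existing setup:

```shell
# Stop and remove the old container, then recreate it with a
# 1 GB engine heap (CRASHPLAN_SRV_MAX_MEM sets the JVM -Xmx):
docker stop crashplan-pro && docker rm crashplan-pro
docker run -d \
    --name=crashplan-pro \
    -e CRASHPLAN_SRV_MAX_MEM=1024M \
    -p 5800:5800 \
    -v /volume1/docker/crashplan-pro:/config:rw \
    -v /volume1:/storage:ro \
    jlesage/crashplan-pro
```

The same environment variable should also be settable from Synology's Docker UI when editing the container.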

acseven commented 6 years ago

Keeping the Java limit low (1 GB, for example) should make the CrashPlan engine crash, like with patters' SPK. Is that the case?

I'll try as soon as I can. The thing with 1 GB is that I'm pretty sure that with that amount of available RAM it won't be able to do much, including uploads.

Also, with patters' SPK, did you have the UI running all the time? Maybe this is something that can make a difference.

No, far from it - the UI was open once every few weeks.

bigtfromaz commented 6 years ago

I know of at least 3 DS412+ machines with 2GB that ran CrashPlan just fine, inside and outside a Docker container. How much memory is consumed in steady state when your container is not running?

If you are using <= 700 MB in steady state, give the container 1.3 GB. Do not let it swap. If you are using more than that, then you need more than 2 GB of memory.

Also, my recollection is that a number of users reported performance issues when they pushed the DS412+ to 4 GB. The hardware supports it, but my guess is there are some kernel parameters that DSM is/was setting that require tuning. You may figure that out, but you'll be adjusting them with every upgrade.

When we reached our cap, rather than dealing with kernel tuning, we sold the DS412+ machines on eBay and replaced them with DS918+ units, adding 4 GB to bring them to 8 GB.

The DS412+ is a very limited machine in all respects but storage. It does the storage thing quite well, but if you run too many apps it's going to bog down. The DS918+ is much better, but neither is a replacement for a Xeon workhorse.

acseven commented 6 years ago

Thanks for your input.

If you are using <= 700MB in steady state give the container 1.3 GB.

I'm currently testing various configurations. 1000 MB, 1100 MB and 1200 MB aren't enough; CrashPlan eventually crashes. 1400 MB+ makes the system literally unresponsive to any requests. Right now it's been running with 1300 MB for some time; let's see how that goes.

Do not let it swap.

That's what puzzles me - with the previous SPK I didn't have an issue with setting a moderately high number, such as 1500M/1650M. But with Docker, it starts to swap like crazy, making the system unresponsive. The datasets haven't changed significantly for quite some time, but it could be a borderline situation.

If you are using more than that, then you need more memory than 2GB. (...)

At this point it's evident to me that I have to increase the available RAM, because even if it does work right now, it will only be short term.

Also, my recollection is that a number users reported performance issues when they pushed the DS412+ to 4GB. The hardware supports it, (...)

For future users' reference, that's not exactly true; as you can read in a specific thread on the Synology Forums about it, users have even compiled a datasheet of working setups and a lot of unreliable ones. The best reasoning I've read so far is this one, which makes a strong case about the RAM specifications with which 4 GB DDR should run without issues. I've ordered a 4 GB stick; let's see how that goes.

As for moving to another NAS, that will have to happen eventually. The issue is that I am planning to replace it with an 8-bay DS, and that isn't going to happen any time soon.

bigtfromaz commented 6 years ago

What is your memory usage when the container is not running?

acseven commented 6 years ago

It's about 700 MB, as you can see in the first screenshot of the first post in this thread.

bigtfromaz commented 6 years ago

That's a bit odd. I just dusted off a DS412+ that we haven't sold. Its memory is 2 x 1 GB and its steady state is 14%. Perhaps we can compare service memory without the container running. Here is a screenshot from mine, sorted by memory descending.
capture

It appears my baseline is 14%, or about 280 MB. How does this compare to yours with no containers running?

In any event, I would really like to know how it runs with a single 4 GB stick.

acseven commented 6 years ago

2 X 1GB

It can't be - it must be a single stick, as there's just the one slot (i.e. 1 x 2 GB).

In any event, I would really like to know how it runs with a single 4 GB stick.

Sure thing, I'll update you when installed.

How does this compare to yours with no containers running?

Thanks for going to the trouble. As mentioned in the OP, I have other services running, and those alone justify the 700 MB; there's little to be done on that part. The bottom line is that I will have to have a lot more memory or eventually reduce the dataset. Reducing the dataset is actually something that can be done fairly easily, without losing previous copies in the CrashPlan backup, for data that is in an archival state - I just have to move/rename a folder and create an exception for it; that should work for a while more.

But it looks good so far with the 1300M, you were both right:

scrn1 scrn2

At least it seems to be nearly finished with a backup (the last time that happened was a few weeks back) and all DSM services continue to respond properly.

Update: finished!

bigtfromaz commented 6 years ago

It's an old machine, so my recollection may be wrong. Here is a screen snap from the DS918+ with 8 GB.

capture

It's a weekend, so it's not being used much, but there's 1.7 TB stored in CrashPlan. It looks underutilized but it's not: the Synology is using 5.4 GB for file cache, which is a big help when running SMB, NFS and iSCSI under load. The extra 2 GB you are adding should do the trick if it performs well.

acseven commented 6 years ago

The extra 2 GB you are adding should do the trick if it performs well.

Yeah, I believe it might as well, thanks for the input.

Well, right after it finished, it started acting up again (it did not crash, but DSM was unresponsive for a good 10 minutes). Here are some screenshots for reference (the big straight lines in Resource Monitor correspond to the time it was unresponsive):

scrn scrn1 scrn2

This is probably what was happening every morning after a backup finished - I still can't understand why, though, in comparison to the SPK.

Anyway, it's good enough that it backed up the dataset. I'll have to see how it goes with the 4 GB.

bigtfromaz commented 6 years ago

You are right on the edge. Your CrashPlan memory requirements are barely being met. The machine is paging, which is always bad, and your file system buffers/cache show that you are likely working the drives harder than needed. The two DS412+ machines that I know work are not experiencing this, but their backup sizes are smaller and they are running fewer services than yours. They are working, but they can't be far from the edge either. :)

acseven commented 6 years ago

Yup, pretty much. Let's see if the 4 GB stick works.

bigtfromaz commented 6 years ago

If that 4 GB module works well (the issue I read about appears to manifest itself as excessive CPU demand), please post the part number. I suspect there may be a number of people who will want to know. I know of two.

acseven commented 6 years ago

The new 4 GB stick is now installed and apparently running very well so far.

scrn

scrn2

Again, this is running CrashPlan PRO for an overall backup of around 2.4 TB.

Here's the specs on the DDR I bought:

Samsung 4 GB / 2Rx8 / (256M x 8) x 16, model M471B5273CH0-CH9 http://www.samsung.com/semiconductor/dram/module/M471B5273CH0-CH9/

bigtfromaz commented 6 years ago

Thanks for doing the research on this. I plan on ordering a card today.