Closed: acseven closed this issue 6 years ago.
So if I understand correctly, you are saying that the Java memory limit is respected, but you suspect that the container is using additional memory?
Can you provide the output of the following command? It will give an indication of the memory consumed by processes run by the container.
docker exec crashplan-pro ps -A
By the way, since everything in the container is running natively on the host, I'm not expecting the Docker version to have much more overhead compared to patters's package. Sure, there is some overhead due to the additional processes running in the container, but the memory consumed by them should not represent a big proportion.
Hi, thanks for the reply.
Here's the requested output:
# docker exec crashplan-pro ps -A
PID USER TIME COMMAND
1 root 0:00 s6-svscan -s -t0 /var/run/s6/services
35 root 0:00 s6-supervise s6-fdholderd
530 root 0:00 s6-supervise statusmonitor
531 root 0:00 s6-supervise logmonitor
532 root 0:00 s6-supervise certsmonitor
533 root 0:00 s6-supervise CrashPlanEngine
534 root 0:00 s6-supervise x11vnc
535 root 0:00 s6-supervise app
536 root 0:00 s6-supervise openbox
537 root 0:00 s6-supervise xvfb
538 root 0:00 s6-supervise nginx
681 daemon 0:00 s6-fdholderd -1 -i rules
688 root 0:00 /usr/bin/Xvfb :0 -screen 0 1280x1024x24
705 root 0:00 [s6-notifyonchec]
721 root 0:00 /usr/bin/openbox
737 root 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf -g daemon off;
748 root 0:00 /usr/bin/x11vnc -display :0 -rfbport 5900 -rfbportv6 -1 -no6 -noipv6 -httpportv6 -1 -forever -desktop CrashPlan for Small Business -cursor arrow -shared -nopw -stunnel /config/certs/vnc-server.pem
758 root 0:00 sh ./certsmonitor
768 root 0:00 forstdin -d -- LINE fdmove 0 3 importas -u LINE LINE pipeline s6-ls -0 -- /etc/logmonitor/notifications.d pipeline s6-sort -0 -- forstdin -o 0 -0 -- i importas -u i i foreground /etc/logmonitor/notifications.d/${i}/filter ${LINE} importas -u ? ? if -t s6-test ${?} -eq 0 pipeline s6-ls -0 -- /etc/logmonitor/targets.d pipeline s6-sort -0 -- forstdin -o 0 -0 -- j importas -u j j if if -t s6-test -f /var/run/logmonitor/states/${j}.${i} -a -f /etc/logmonitor/targets.d/${j}/debouncing backtick -n -i DEBOUNCING s6-head -n1 /etc/logmonitor/targets.d/${j}/debouncing importas -u DEBOUNCING DEBOUNCING ifelse s6-test ${DEBOUNCING} -eq 0 s6-false backtick -n -i CURRENT_TIME date +%s importas -u CURRENT_TIME CURRENT_TIME backtick -n -i FILE_TIME date +%s -r /var/run/logmonitor/states/${j}.${i} importas -u FILE_TIME FILE_TIME backtick -n -i TIME_DIFF s6-expr ${CURRENT_TIME} - ${FILE_TIME} importas -u TIME_DIFF TIME_DIFF s6-test ! ${TIME_DIFF} -lt ${DEBOUNCING} backtick -n -D Unknown title TITLE ifelse s6-test -x /etc/logmonitor/notifications.d/${i}/title /etc/logmonitor/notifications.d/${i}/title ${LINE} s6-head -n1 /etc/logmonitor/notifications.d/${i}/title importas -u TITLE TITLE backtick -n -D Unknown description DESC ifelse s6-test -x /etc/logmonitor/notifications.d/${i}/desc /etc/logmonitor/notifications.d/${i}/desc ${LINE} s6-head -n1 /etc/logmonitor/notifications.d/${i}/desc importas -u DESC DESC backtick -n -D ERROR LEVEL ifelse s6-test -x /etc/logmonitor/notifications.d/${i}/level /etc/logmonitor/notifications.d/${i}/level ${LINE} s6-head -n1 /etc/logmonitor/notifications.d/${i}/level importas -u LEVEL LEVEL background /etc/logmonitor/targets.d/${j}/send ${TITLE} ${DESC} ${LEVEL} foreground s6-rmrf /var/run/logmonitor/states/${j}.${i} s6-touch /var/run/logmonitor/states/${j}.${i}
779 root 0:00 forstdin -d -- LINE fdmove 0 3 importas -u LINE LINE pipeline s6-ls -0 -- /etc/logmonitor/notifications.d pipeline s6-sort -0 -- forstdin -o 0 -0 -- i importas -u i i foreground /etc/logmonitor/notifications.d/${i}/filter ${LINE} importas -u ? ? if -t s6-test ${?} -eq 0 pipeline s6-ls -0 -- /etc/logmonitor/targets.d pipeline s6-sort -0 -- forstdin -o 0 -0 -- j importas -u j j if if -t s6-test -f /var/run/logmonitor/states/${j}.${i} -a -f /etc/logmonitor/targets.d/${j}/debouncing backtick -n -i DEBOUNCING s6-head -n1 /etc/logmonitor/targets.d/${j}/debouncing importas -u DEBOUNCING DEBOUNCING ifelse s6-test ${DEBOUNCING} -eq 0 s6-false backtick -n -i CURRENT_TIME date +%s importas -u CURRENT_TIME CURRENT_TIME backtick -n -i FILE_TIME date +%s -r /var/run/logmonitor/states/${j}.${i} importas -u FILE_TIME FILE_TIME backtick -n -i TIME_DIFF s6-expr ${CURRENT_TIME} - ${FILE_TIME} importas -u TIME_DIFF TIME_DIFF s6-test ! ${TIME_DIFF} -lt ${DEBOUNCING} backtick -n -D Unknown title TITLE ifelse s6-test -x /etc/logmonitor/notifications.d/${i}/title /etc/logmonitor/notifications.d/${i}/title ${LINE} s6-head -n1 /etc/logmonitor/notifications.d/${i}/title importas -u TITLE TITLE backtick -n -D Unknown description DESC ifelse s6-test -x /etc/logmonitor/notifications.d/${i}/desc /etc/logmonitor/notifications.d/${i}/desc ${LINE} s6-head -n1 /etc/logmonitor/notifications.d/${i}/desc importas -u DESC DESC backtick -n -D ERROR LEVEL ifelse s6-test -x /etc/logmonitor/notifications.d/${i}/level /etc/logmonitor/notifications.d/${i}/level ${LINE} s6-head -n1 /etc/logmonitor/notifications.d/${i}/level importas -u LEVEL LEVEL background /etc/logmonitor/targets.d/${j}/send ${TITLE} ${DESC} ${LEVEL} foreground s6-rmrf /var/run/logmonitor/states/${j}.${i} s6-touch /var/run/logmonitor/states/${j}.${i}
793 root 1:44 /usr/local/crashplan/jre/bin/java -Dfile.encoding=UTF-8 -Dapp=CrashPlanService -DappBaseName=CrashPlan -Xms20m -Xmx1550M -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 -Dc42.native.md5.enabled=false -classpath /usr/local/crashplan/lib/com.backup42.desktop.jar:/usr/local/crashplan/lang:/usr/local/crashplan com.backup42.service.CPService
808 root 0:00 /bin/s6-notifyoncheck -n 0 s6-setuidgid 0:0 /usr/local/crashplan/bin/startCrashPlanEngine.sh
810 root 0:00 s6-ftrigrd
840 root 0:00 tail -n0 -F /config/log/service.log.0
841 root 0:00 /usr/bin/stunnel -fd 3
842 nginx 0:00 nginx: worker process
843 root 0:00 {tailstatusfile} /bin/sh /usr/bin/tailstatusfile /config/log/app.log
844 nginx 0:00 nginx: worker process
6809 root 0:00 sh /startapp.sh
6881 root 0:08 /usr/local/crashplan/electron/crashplan
7934 root 0:00 sleep 5
7948 root 0:00 sleep 5
7960 root 0:00 ps -A
10300 root 0:00 /usr/local/crashplan/electron/crashplan --type=zygote --no-sandbox
13232 root 0:16 /usr/local/crashplan/electron/crashplan --type=renderer --no-sandbox --primordial-pipe-token=B1299A2EED114510016D550A89D810A6 --lang=en-US --app-path=/usr/local/crashplan/electron/resources/app.asar --node-integration=true --hidden-page --enable-pinch --num-raster-threads=2 --enable-main-frame-before-activation --content-image-texture-target=0,0,3553;0,1,3553;0,2,3553;0,3,3553;0,4,3553;0,5,3553;0,6,3553;0,7,3553;0,8,3553;0,9,3553;0,10,3553;0,11,3553;0,12,3553;0,13,3553;0,14,3553;0,15,3553;1,0,3553;1,1,3553;1,2,3553;1,3,3553;1,4,3553;1,5,3553;1,6,3553;1,7,3553;1,8,3553;1,9,3553;1,10,3553;1,11,3553;1,12,3553;1,13,3553;1,14,3553;1,15,3553;2,0,3553;2,1,3553;2,2,3553;2,3,3553;2,4,3553;2,5,3553;2,6,3553;2,7,3553;2,8,3553;2,9,3553;2,10,3553;2,11,3553;2,12,3553;2,13,3553;2,14,3553;2,15,3553;3,0,3553;3,1,3553;3,2,3553;3,3,3553;3,4,3553;3,
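The BusyBox ps in this image doesn't report memory, so the listing above only shows which processes exist. A rough total can be taken from /proc instead - a sketch, run inside the container via docker exec (shared pages are counted once per process, so this overestimates somewhat):

```shell
# Sum the resident set size (VmRSS) of every process visible in the
# container's PID namespace. BusyBox ps has no RSS column, but /proc
# always does. Values in /proc/<pid>/status are in kB; kernel threads
# have no VmRSS line and contribute 0.
total_kb=0
for d in /proc/[0-9]*; do
  rss_kb=$(awk '/^VmRSS:/ {print $2}' "$d/status" 2>/dev/null)
  total_kb=$(( total_kb + ${rss_kb:-0} ))
done
echo "approx. total RSS: $(( total_kb / 1024 )) MB"
# From the host, 'docker stats --no-stream crashplan-pro' gives the
# cgroup's own accounting for comparison.
```

This is an upper bound rather than an exact figure, but it is enough to see whether the container as a whole goes far beyond the Java heap limit.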
So if I understand correctly, you are saying that the Java memory limit is respected, but you suspect that the container is using additional memory?
I don't think it's using additional RAM; at least, that's not what I'm seeing in the Resource Monitor. The issue seems to be swap related, given that all the unresponsiveness coincides with high disk usage. That's why I tried pushing the Java mx limit as high as I could, to see if it would improve, and it didn't (although it might cause DSM to become sluggish on its own).
Just for reference, I started the container to grab the output and then took a few screenshots of the Resource Monitor while still possible.
And right now it's on its own. No services are responding, including access via SSH and DSM.
Swap usage could explain the issues you are seeing. It seems that there is a way in Synology to see swap usage: https://www.synology.com/en-us/knowledgebase/DSM/help/DSM/ResourceMonitor/rsrcmonitor_performance. You can use that to confirm whether swap is heavily used.
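If SSH is still reachable, swap usage can also be read directly from /proc, without waiting for Resource Monitor to respond - a minimal sketch:

```shell
# Compute used swap from /proc/meminfo (values are in kB). High and
# growing usage here, combined with heavy disk activity, points to
# thrashing.
swap_line=$(awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2}
                 END {printf "swap used: %d MB of %d MB total", (t-f)/1024, t/1024}' /proc/meminfo)
echo "$swap_line"
```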
In the task manager, you can sort processes by memory usage. I would use that to see which ones are consuming the most. But it seems that 2GB is very small for the amount of data you have to backup. CrashPlan alone would need the whole 2GB (https://support.code42.com/CrashPlan/6/Troubleshooting/Adjust_Code42_app_settings_for_memory_usage_with_large_backups).
If you set the java memory limit back to 1GB, do you still have the responsiveness issue?
According to your first screenshot, when the container is stopped, about 30% of the memory is consumed. So about 1.4GB is available to CrashPlan. Having a java memory limit higher than that would cause the system to start using swap.
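The arithmetic above can be turned into a quick headroom check; the 150 MB figure for non-heap JVM overhead is a rough assumption, not a measured value:

```shell
# Estimate a safe -Xmx from total RAM and idle usage, as described
# above: ~30% of 2 GB used at idle leaves ~1.4 GB for CrashPlan.
total_mb=2048
idle_used_pct=30      # from Resource Monitor with the container stopped
jvm_overhead_mb=150   # rough non-heap JVM overhead (assumption)
avail_mb=$(( total_mb - total_mb * idle_used_pct / 100 ))
suggested_xmx_mb=$(( avail_mb - jvm_overhead_mb ))
echo "available: ${avail_mb} MB, suggested -Xmx: ${suggested_xmx_mb} MB"
```

Anything above the available figure pushes the JVM's working set into swap, which matches the disk thrashing being described.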
I'm looking into increasing the RAM to 4Gb, which - if possible - should make the issue disappear.
In the meantime, I'll take a look at your pointers on swap memory, thanks for that. I'm fairly aware of CrashPlan's limitations with RAM (as I mentioned in the OP), but I essentially wrote because the behaviour of the container is different from patters's SPK: the latter would simply crash and the NAS would keep on working, whereas the docker implementation just keeps on going, making everything stop responding.
From my experience CrashPlan doesn't actually require a lot of RAM while uploading - it bogs down when/after it scans a data set, be it the scheduled scan or a server sync maintenance task. That's when it spikes. Considering the latter is not very frequent, this means CrashPlan would run fine, for the most part, until a scheduled scan begins. At that point the system would either have the resources available or the service would crash.
That's the main thing; I would prefer that CrashPlan would crash due to insufficient resources at that point, keeping the remaining system functionality fully available. The current behaviour is just unusable, as I literally do not have any access to the NAS while it tries to figure out what to do with CrashPlan.
Keeping the Java limit low (1GB for example) should make the CrashPlan engine crash, like with patters's SPK. Is that the case? Also, with patters's SPK, did you have the UI running all the time? Maybe this is something that can make a difference.
Keeping the Java limit low (1GB for example) should make the CrashPlan engine crash, like with patters's SPK. Is that the case?
I'll try, as soon as I can. The thing with 1Gb is that I am pretty sure that with that amount of available RAM it won't be able to do much, including uploads.
Also, with patters's SPK, did you have the UI running all the time? Maybe this is something that can make a difference.
No, far from it, the UI was open once every few weeks.
I know of at least 3 DS412+ machines with 2GB that ran CrashPlan just fine, inside and outside a Docker container. How much memory is consumed in steady state when your container is not running?
If you are using <= 700MB in steady state give the container 1.3 GB. Do not let it swap. If you are using more than that, then you need more memory than 2GB.
Also, my recollection is that a number of users reported performance issues when they pushed the DS412+ to 4GB. The hardware supports it, but my guess is there are some kernel parameters that DSM is/was setting that require tuning. You may figure that out, but you'll be adjusting them with every upgrade.
When we reached our cap, rather than dealing with kernel tuning, we sold the DS412+ machines on eBay and replaced them with DS918+ units, adding 4Gb to bring them to 8GB.
The 412+ is a very limited machine, in all respects but storage. It does the storage thing quite well but if you run too many apps it's going to bog down. The 918+ is much better but neither is a replacement for a Xeon workhorse.
Thanks for your input.
If you are using <= 700MB in steady state give the container 1.3 GB.
I'm currently testing various configurations. 1000Mb, 1100Mb and 1200Mb aren't enough; CrashPlan eventually crashes. 1400Mb+ makes the system literally unresponsive to any requests. It has now been running with 1300Mb for some time; let's see how that goes.
Do not let it swap.
That's what puzzles me - with the previous SPK I didn't have an issue with setting a moderately high number, such as 1500M/1650M. But with Docker, it starts to swap like crazy, making the system unresponsive. The datasets haven't changed significantly for quite some time, but it could be a borderline situation.
If you are using more than that, then you need more memory than 2GB. (...)
At this point it's evident to me that I have to increase the available RAM, because even if it does work right now, it will only be for the short term.
Also, my recollection is that a number users reported performance issues when they pushed the DS412+ to 4GB. The hardware supports it, (...)
For the reference of future users: that's not exactly true, as you can read in a specific thread on the Synology Forums about it, where users have even compiled a datasheet of working setups and a lot of unreliable ones. The best reasoning I've read so far is this one, which makes a strong case about the RAM specifications with which 4Gb DDR should run without issues. I've ordered a 4Gb stick; let's see how that goes.
As for moving to another NAS, that will have to happen eventually. The issue is that I am planning to replace it with an 8-bay DS, and that isn't going to happen any time soon.
What is your memory usage when the container is not running?
It's about 700Mb, as you can see in the first screenshot of the first post in this thread.
That's a bit odd. I just dusted off a DS412+ that we haven't sold. Its memory is 2 x 1GB and its steady state is 14%. Perhaps we can compare Service memory without the container running. Here is a screenshot from mine, sorted by memory descending.
It appears my baseline is 14%, or 280 MB. How does this compare to yours with no containers running?
In any event, I would really like to know how it runs with a single 4 GB stick.
2 X 1GB
It can't be - it must be a single stick, as there's just the one slot (i.e. 1 x 2Gb).
In any event, I would really like to know how it runs with a single 4 GB stick.
Sure thing, I'll update you when installed.
How does this compare to yours with no containers running?
Thanks for going to the trouble. As mentioned in the OP, I have other services running, and those alone justify the 700Mb; there's little to be done on that front. The bottom line is that I will have to get a lot more memory or eventually reduce the dataset. Reducing the dataset is actually something that can be done fairly easily without losing previous copies in the CrashPlan backup, for data that is in an archival state - I just have to move/rename a folder and create an exception for it. That should work for a while longer.
But it looks good so far with the 1300M, you were both right:
At least it seems to be nearly finishing up a backup (last time this happened was a few weeks back) and all DSM services continue to be responding properly.
Update: finished!
It's an old machine so my recollection may be wrong. Here is a screen snap from the DS918+ with 8GB.
It's a weekend so it's not being used, but there's 1.7 TB stored in CrashPlan. It looks underutilized, but it's not. The Synology is using 5.4 Gb for file cache, which is a big help when running SMB, NFS and iSCSI under load. The extra 2 GB you are adding should do the trick if it performs well.
The extra 2 GB you are adding should do the trick if it performs well.
Yeah, I believe it might as well, thanks for the input.
Well, right after it finished, it started acting up again (it did not crash, but DSM was unresponsive for a good 10 minutes). Here are some screenshots for reference (the big straight lines in Resource Monitor correspond to the time it was unresponsive):
This is probably what has been happening every morning when a backup finishes - I still can't understand why, though, in comparison to the SPK.
Anyway, it's good enough that it backed up the dataset, I'll have to see how it goes with the 4Gb.
You are right on the edge. Your CrashPlan memory requirements are barely being met. The machine is paging, which is always bad, and your file system buffers/cache show that you are likely working the drives harder than needed. The two DS412+ machines that I know work are not experiencing this, but their backup sizes are smaller and they are running fewer services than you. They are working, but they can't be far from the edge either. :)
Yup, pretty much. Let's see if the 4Gb stick works.
If that 4 GB module works well (the issue I read about appears to manifest itself as excessive CPU demand) please post the part number. I suspect there may be a number of people who will want to know. I know of two.
The new 4Gb stick is now inserted and apparently running very well so far.
Again, this is running CrashPlan Pro for an overall backup of around 2.4Tb.
Here's the specs on the DDR I bought:
Samsung 4Gb / 2Rx8 / (256M x 8) x 16 Model M471B5273CH0-CH9 http://www.samsung.com/semiconductor/dram/module/M471B5273CH0-CH9/
Thanks for doing the research on this. I plan on ordering a card today.
Hi,
I'm using your package on a DS412+ with 2Gb of RAM. Before that I had my round of issues using patters' SPK package, and running this on Docker has been, at least from the CrashPlan application point of view, pretty much better overall.
However, ever since running the Docker container, I have been having a lot of difficulty reaching my Synology services in general. At the worst moments DSM would be simply unreachable, returning network timeouts. Other services which are normally instantaneous or pretty responsive (Domoticz, Plex) were sluggish, if responsive at all. There were two solutions for this: wait (sometimes a few hours) or shut down the box (a graceful shutdown would take a good half an hour).
Using DSM's Resource Monitor, the behaviour is very self-evident:
The screenshot above shows the moment I turned off the Docker container (it took me a good few minutes to actually be able to see Resource Monitor). After that, DSM returned to being fully responsive.
Regarding CrashPlan Pro: I'm using it with two file sets for a total of about 2.5Tb and maybe 400k files, both scheduled to run only from 1AM to 8AM. The issue is most evident in the morning, after 8AM, although it persists throughout the day (the screenshot was taken at such a time, when CrashPlan is not backing up). Knowing very well that these kinds of issues are usually related to the maximum Java heap size, I have been fiddling with it for the last few weeks. For reference, using patters' SPK with a maximum heap of ~1500Mb would suffice both for backup and re-indexing.
Using this Docker implementation I have tried setting the Java environment variable to 1200, 1600 and 1800 Mb, and none of them changed my original issue of system unresponsiveness. I would of course get warnings within CrashPlan itself about having reached peak memory (with the lower values) at some point. My take so far is that even with the Java limit in place, the container's resource usage is excessive in a way I never experienced with the SPK. The SPK would crash - but right now I'm thinking that would probably be preferable to having all the drives working this hard and losing access to the box, something critical during a work day.
From what I've read online, the kernel used by [at least my] Synology is not able to limit resource usage, so there is no point in setting limits in the actual Docker app, because they will not work (they don't; I have tested). Is this something like Docker using swap memory to make up for not having enough RAM available?
Is there anything I might be missing in my config?
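Whether Docker's memory limits can hold at all depends on the kernel having cgroup swap accounting enabled, which many DSM builds of this era did not. A quick check, assuming the cgroup v1 layout typical of those kernels (the docker run line below is a hypothetical sketch with illustrative values; CRASHPLAN_SRV_MAX_MEM is the variable this image documents for the engine's heap):

```shell
# If this cgroup v1 control file is missing, the kernel was built or
# booted without swap accounting, and docker's --memory-swap flag is
# silently ignored - matching the behaviour described above.
if [ -f /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ]; then
  swap_acct=yes
  echo "swap accounting available: --memory/--memory-swap should be enforced"
else
  swap_acct=no
  echo "no swap accounting: limits set in the Docker app will not hold"
fi

# With accounting present, the container could be capped so it gets
# killed rather than allowed to thrash (values are illustrative):
# docker run -d --name crashplan-pro \
#   -e CRASHPLAN_SRV_MAX_MEM=1300M \
#   --memory 1500m --memory-swap 1500m \
#   jlesage/crashplan-pro
```

Setting --memory-swap equal to --memory denies the container any swap at all, which would reproduce the SPK's crash-instead-of-thrash behaviour, but only on kernels where the check above passes.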