haiwen / seafile

High performance file syncing and sharing, with Markdown WYSIWYG editing, Wiki, file labels and other knowledge management features.
http://seafile.com/

High CPU Load problem #1045

Closed: shubhank008 closed this issue 8 years ago

shubhank008 commented 9 years ago

Hey guys, I set up a Seafile server recently and started to test it out in the wild with a few users. However, I am facing a high CPU load problem, due to which Seahub is not able to serve the website content and the site itself.

Server:
EC2 (c4.xlarge) with Xeon E5-2666 v3
8 GB RAM
1 Gbps network
1 TB HDD

Setup:
Normal seafile-server setup with MySQL
Seahub behind Nginx (HTTP, not HTTPS)
HTTP sync
SeafDAV (WebDAV)

Scenario:
I am not sure about the exact stats since my users were syncing files, but they were using BitKinex to upload 12-15 files at once and downloading around 6-8 files from a Seahub shared link. During this time the CPU load went to 100%. The top process taking CPU at the time was:

top -c
/usr/bin/python2.7 -m wsgidav.server.run_server runfcgi --log-file /home/xxxx/logs/seafdav.log --pid /home/xxxx/pids/seafdav.pid --port 8080

Some more logs from the server:

ps -eo pcpu,pid,user,args | sort -k 1 -r | head -6

%CPU   PID USER     COMMAND
 0.5  2413 root     seaf-server -c /home/xxx/ccnet -d /home/xxx/seafile-data -l /home/xxx/logs/seafile.log -P /home/xxx/pids/seaf-server.pid
 0.3  2412 root     /usr/bin/python2.7 -m wsgidav.server.run_server runfcgi --log-file /home/xxx/logs/seafdav.log --pid /home/xxx/pids/seafdav.pid --port 8080
 0.1  2409 root     [flush-202:0]
 0.0     8 root     [migration/1]
 0.0  8896 root     head -6

pgrep -f "manage.py run_gunicorn"

no result

ps aux | grep seahub

root      8818  0.0  0.3  84480 24464 ?        S    01:17   0:00 python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log
root      8820  0.0  0.5 121488 37544 ?        S    01:17   0:00 python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log
root      8821  0.0  0.5 122180 38060 ?        S    01:17   0:00 python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log
root      8822  0.0  0.4 113912 34332 ?        S    01:17   0:00 python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log
root      8823  0.0  0.5 122032 38016 ?        S    01:17   0:00 python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log
root      8824  0.0  0.5 121684 37568 ?        S    01:17   0:00 python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log

To be honest, I haven't tested the setup without WebDAV in our group yet, but it was quite stable in my personal tests, so I think the WebDAV module is somehow causing problems or clogging things up. Also, I am not sure if it is normal for that many manage.py processes to exist?

killing commented 9 years ago

Are your users uploading many large files at once? Can you reproduce the load? The number of manage.py processes is normal. By default 5 fastcgi processes will be started for seahub.
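
A quick way to confirm that count from the shell, matching the ps output above (pgrep -c just counts processes whose full command line matches):

# Count the running seahub fastcgi processes.
pgrep -fc "manage.py runfcgi"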

shubhank008 commented 9 years ago

Excluding me, we were testing with 2 other users; one of them was uploading 100-200 MB files max, the other was uploading 500 MB-1 GB files. I am not sure which user was actually uploading at that time (we use a Windows server to do that); is it possible to find out via any logs? I could work out the file size and user from the uploaded file name, or any other details, if the logs contain them.

killing commented 9 years ago

Uploading many large files to the server via web or WebDAV can produce high CPU load, because the server has to calculate hashes and chunk the files into blocks. I suggest you upload large files with the desktop clients. Right now you can limit the file upload size for the web GUI with the 'max_upload_size' option in seafile.conf (see http://manual.seafile.com/config/seafile-conf.html). Unfortunately there is no option to limit file size for WebDAV.
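
A sketch of that seafile.conf change, assuming the [httpserver] section used by 4.x releases and that seafile.conf lives under seafile-data; the value is in MB, so check the manual page linked above for your exact version:

# Append the upload cap (example: 200 MB) and restart the seafile server;
# section name and file path are assumptions for this release. Merge the
# option into an existing [httpserver] section if one is already present.
cat >> /home/xxxx/seafile-data/seafile.conf <<'EOF'

[httpserver]
max_upload_size = 200
EOF
./seafile.sh restart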

shubhank008 commented 9 years ago
  1. Does using a more powerful server (not sure I can get anything more powerful than this) make any difference?
  2. Does using load balancing or a multi-server (tenancy/multi-instance) setup make any difference?
  3. How large a file are we talking about that can cause this load? Considering the maximum we could have uploaded at once would be ~10 files.
  4. The file size limit in the web GUI, you mean Seahub, right? So I could limit it to something like ~200 MB for Seahub and include a message telling users to upload larger files with the desktop client? The only problem left then would be WebDAV.

shoeper commented 9 years ago

You could limit WebDAV via the webserver. On nginx this means adding client_max_body_size Xm; I'm not sure whether WebDAV clients handle this gracefully, but you could still give it a try by setting the restriction only for the seafdav location.
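
A hypothetical sketch of that restriction, scoped to the seafdav location only; the /seafdav path and config layout are assumptions, adjust them to however the port 8080 backend is actually proxied here:

# In the nginx server block that fronts SeafDAV, cap the request body size
# for that location only (example: 200 MB), then validate and reload nginx:
#
#   location /seafdav {
#       # ...existing fastcgi/proxy settings for the port 8080 backend...
#       client_max_body_size 200m;
#   }
#
nginx -t && service nginx reload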

An idea on how this could be fixed: upload WebDAV and Seahub files to a temporary directory and run a low-priority worker to commit them. Interrupt work on large files every 30 seconds if there are small files in the queue, so that adding very large files does not block or slow down the whole system. Maybe for local networks there could also be a mode that sends the data to selected clients just to compute its blocks: in a company with 300 employees on a gigabit LAN, those local machines could receive a new WebDAV file on the fly, calculate the library information for the server and send the data back.
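
A minimal sketch of the "low-priority commit worker" part of that idea, using standard Linux priority tools; the spool directory and the commit_to_library.sh helper are hypothetical placeholders, not existing Seafile commands:

# Pick up spooled uploads and commit them at low CPU (nice) and I/O
# (ionice idle class) priority so interactive traffic is not starved.
# commit_to_library.sh is a placeholder for the actual import step.
while true; do
    for f in /srv/seafile-spool/*; do
        [ -e "$f" ] || continue
        ionice -c3 nice -n19 ./commit_to_library.sh "$f" && rm -f "$f"
    done
    sleep 30
done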

shubhank008 commented 9 years ago

+1 on controlling WebDAV via the nginx setting; I haven't tried it but it should work. Currently waiting for my users to come online so I can do a controlled retest using 10 files of 200 MB and 1 GB each, via desktop client, WebDAV and browser, to test this properly.

Also, to be honest, I don't really like WebDAV; I was using it as an FTP alternative since my users are more accustomed to FTP. Right now I am building an FTP-based system in PHP as a better solution: it will let users upload files to a temporary directory of theirs via FTP, and then a cron-based PHP script will upload those files to the user's default repo using the web API. Not a clean way, but better than nothing.

Though I wonder, will the same load issue still occur, since the web API presumably uses the same upload path as Seahub's browser upload?
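
For reference, a rough curl sketch of that cron upload step, assuming the api2 endpoints from the Seafile web API documentation (token auth, then an upload link); host, credentials, repo ID and file paths are placeholders:

# One cron iteration: get an auth token, request an upload link for the
# user's default repo, then POST the file to that link.
TOKEN=$(curl -s -d "username=user@example.com&password=secret" \
    http://seafile.example.com/api2/auth-token/ | sed 's/.*"token": *"\([^"]*\)".*/\1/')
UPLOAD_LINK=$(curl -s -H "Authorization: Token $TOKEN" \
    "http://seafile.example.com/api2/repos/REPO-ID/upload-link/" | tr -d '"')
curl -s -H "Authorization: Token $TOKEN" \
    -F file=@/home/user/ftp-temp/bigfile.bin -F parent_dir=/ \
    "$UPLOAD_LINK"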

shubhank008 commented 9 years ago

I am logging a few tests below and their results.

Test 1 - Browser
Used 5 files of 1 GB each, uploaded using the browser. Average upload speed was 4 MBps even though both servers are in the EU (the Seafile server and our Windows RDA, from which we are uploading). Server CPU load remained normal and stable, I could even say minimal.
Problem: Speed is too slow. Both servers are on 1 Gbps networks and in the EU, yet it is 4 MBps per file with the files uploading one by one. With WebDAV the average speed was also 4 MBps per upload, but since you can upload 4-5 files at once, the effective speed was 4x5 = 20 MBps. Why is this so?

Test 2 - WebDAV, single file (previous browser upload still ongoing, at file 2)
Used 1 file of 1 GB, uploaded using BitKinex. Average upload speed was 8 MBps (the browser upload was still running at a constant ~4 MBps). Server CPU load spiked a lot compared to before: it was 0.02 with the browser alone but is now 0.67-0.98, and came back to 0.5x after 70% of the file was uploaded. Even after the upload finished, the CPU load remained a bit high for a while, over 2 minutes from what I tracked.

root@ip-172-31-24-181:/home/xxxx/logs# ps -eo pcpu,pid,user,args | sort -k 1 -r | head -6
%CPU   PID USER     COMMAND
 0.5  2413 root     seaf-server -c /home/xxxx/ccnet -d /home/xxxx/seafile-data -l /home/xxxx/logs/seafile.log -P /home/xxxx/pids/seaf-server.pid
 0.3  2412 root     /usr/bin/python2.7 -m wsgidav.server.run_server runfcgi --log-file /home/xxxx/logs/seafdav.log --pid /home/xxxx/pids/seafdav.pid --port 8080
 0.1  8845 www-data nginx: worker process
 0.1  2409 root     [flush-202:0]
 0.1 11312 root     [flush-202:80]
top -c results
http://i.imgur.com/Xa5KE7S.png

Problem: Speed is fine but the CPU load spikes; the speed difference still doesn't make sense, and even after the WebDAV upload the load remains a bit high.

Test 3 - WebDAV, multiple files (previous browser upload still ongoing, at file 4)
Used 4 files of 1 GB each, uploaded using BitKinex. Average upload speed was 28 MBps (the browser upload was still running at a constant ~4 MBps); all 4 files were uploading at the same time at around 7 MBps each. Strangely, while the files were uploading there was almost no server load, just minimal.

top -c results
http://i.imgur.com/v9Xevgp.png

Once the files were uploaded though (all 4 at 100%), the load started to peak; it was 5.74 after I took the screenshot below, and nginx went down at this point, showing 504 gateway timeout errors. It remained so for a few minutes, around 4-5.

top -c results
http://i.imgur.com/84OAWqb.png
ps -eo pcpu,pid,user,args | sort -k 1 -r | head -6
%CPU   PID USER     COMMAND
 6.0 11528 root     python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log
 1.5 11524 root     python2.7 /home/xxxx/seafile-server-4.0.4/seahub/manage.py runfcgi host=127.0.0.1 port=8000 pidfile=/home/xxxx/seafile-server-4.0.4/runtime/seahub.pid outlog=/home/xxxx/seafile-server-4.0.4/runtime/access.log errlog=/home/xxxx/seafile-server-4.0.4/runtime/error.log
 0.5  2413 root     seaf-server -c /home/xxxx/ccnet -d /home/xxxx/seafile-data -l /home/xxxx/logs/seafile.log -P /home/xxxx/pids/seaf-server.pid
 0.3  8845 www-data nginx: worker process
 0.3  2412 root     /usr/bin/python2.7 -m wsgidav.server.run_server runfcgi --log-file /home/xxxx/logs/seafdav.log --pid /home/xxxx/pids/seafdav.pid --port 8080

After stopping the WebDAV and browser uploads, I killed all Seahub processes, restarted nginx and restarted seahub-fastcgi; this was in the error file:

seafile-server-4.0.4/runtime/error.log
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer
error: [Errno 104] Connection reset by peer

Problem: Speed was great but the CPU load obviously had problems; the speed difference still doesn't make sense and the load is a worry. After a while, nginx crashed when the load reached 30.xx.

Test 4 - Seafile client, single file
Used 1 file of 1 GB, uploaded using the desktop client on Windows. The client "indexed" the file for about a minute. Average upload speed was 3 MBps even though both servers are in the EU (the Seafile server and our Windows RDA, from which we are uploading). Server CPU load remained normal and stable, I could even say minimal, almost like the browser upload.
Problem: Speed is too slow. Both servers are on 1 Gbps networks and in the EU, yet it is around 4 MBps per file with files uploading one by one. This felt almost the same as uploading from the browser; the slow speed on a single file is a problem. Why is this so?

PS: I added 4 more 1 GB files to the local folder of the library created by the desktop client, but it seems they were still synced one by one, so there was no difference in speed or load.

shubhank008 commented 9 years ago

UPDATE: So, I just now used my second server (Windows, same datacenter as the server above).

I was astonished: I was getting 15-30 MBps upload speed in both the browser and the desktop client (with other network resources in use as well). This is weird.

How can this be possible (from follow-up tests)? These tests were done at the same time, with both servers in the same region and datacenter and using the same 600 MB .mkv file. Both times we uploaded through the web browser.

Server 1
Browser: Google Chrome, average upload speed: 4 MBps
Browser: Internet Explorer, average speed: 9-11 MBps

Server 2
Browser: Google Chrome, average speed: 15-30 MBps

This is really weird in Server 1's case: both servers are the same (hardware, software and so on), yet it is not getting the same speed as Server 2, and on top of that the speed varies a lot between the two browsers, 4 MBps versus 11 MBps.

shoeper commented 9 years ago

Install munin or something similar and monitor your server: CPU load, HDD load, available entropy, bandwidth, RAM usage and so on (not sure I've listed all the relevant points). I could imagine an HDD issue. Here are my experiences with Seafile on a 1-gigabit LAN: Seahub and WebDAV are slow (that's not what Seafile was made for, imo), while upload via the client runs at 50 MB/s+ (indexing is a bit slower than uploading). I'm using a low-budget HP MicroServer Gen8 and have throttled the CPU to save power. The point is: I'm using an SSD for the system and for the Seafile data, and I think that makes the difference. My ~80 GB Seafile installation has about 100,000 blocks/files under seafile-data. An HDD can handle about 200 I/O operations per second, an SSD at least 20 times more, and server SSDs can handle 90,000 IOPS and more.
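
If a full munin setup is too much for a quick test, something like this (assuming the sysstat package is available) already shows whether the disk or the CPU is the bottleneck during an upload:

# Ad-hoc monitoring while an upload test is running.
apt-get install -y sysstat
iostat -x 5   # watch %util and await on the seafile-data disk
vmstat 5      # watch r (run queue), wa (I/O wait) and free memory
cat /proc/sys/kernel/random/entropy_avail   # available entropy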

shoeper commented 9 years ago

Btw, try using an SSD cache (read and write). And did you check whether your gigabit LAN is dedicated? Often ports are shared.
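
One way to check the usable bandwidth between the two instances independently of Seafile, assuming iperf is installed on both machines:

# Measure raw TCP throughput between the two servers.
iperf -s                      # run on one server (listens on TCP 5001)
iperf -c <other-server-ip>    # run on the other; reports achievable bandwidth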

shubhank008 commented 9 years ago

I am using Amazon EC2 for my Seafile instance, with magnetic storage for seafile-data and an SSD for the OS (obviously, since a 500 GB SSD would be too costly). However, as I stated in my debug tests above, the actual problem doesn't seem to be related to CPU/HDD/memory load, because even though those varied across the tests, the real issue is that the upload/sync speed differs wildly between two servers with the same configuration and specifications, hosted in the same datacenter.

This cannot be a problem with resource usage or load; I am testing with 1 user only and in different scenarios.

I am able to peak at 125 MBps download speed, so I'm pretty sure it's a dedicated 1 Gbps LAN, but again, that gives no clue as to why the speed varies from 4 MBps to 50 MBps between the two servers.

Furthermore, the speed while using Internet Explorer was 2-3 times higher than with Google Chrome or the client on the same server.

shoeper commented 9 years ago

To find out the reasons for this, as I've said before, you need to monitor your server. Is it a Linux or a Windows server?

shoeper commented 8 years ago

There was no further interaction.