owncloud / client

🖥️ Desktop Syncing Client for ownCloud
GNU General Public License v2.0

Upload of many Files really slow #331

Closed myzinsky closed 10 years ago

myzinsky commented 11 years ago

Expected behaviour

Upload of a lot of MP3 Files (4GB) should be fast.

Actual behaviour

Only 6 files (ca. 40 MB) uploaded in 4 hours!

Server configuration

Operating system: Debian Stable

Web server: Apache2

Database: Mysql

PHP version: PHP Version 5.3.3-7+squeeze14

ownCloud version: 4.5.6

Client configuration

Client version: 1.2

Operating system: Debian Unstable

OS language: German

Installation path of client: /usr/bin/owncloud

Logs

ownclouds://xxx/Blue Moon
02-12 14:18:20:533 oc_module: owncloud_stat ownclouds://xxx/Blue Moon called
02-12 14:18:20:678 oc_module: Simple propfind result code 207.
02-12 14:18:20:678 oc_module: Server Date from HTTP header value: Tue, 12 Feb 2013 13:18:20 GMT
02-12 14:18:20:679 oc_module: Ok: Time delta remained (almost) the same: 0.
02-12 14:18:20:679 oc_module: => Errno after fetch resource list for ownclouds://xxx/Blue Moon: 0
02-12 14:18:20:679 oc_module: Working on file Blue Moon
02-12 14:18:20:679 oc_module: :> Subtracting 0 from modtime 1360675100
02-12 14:18:20:679 oc_module: STAT result from propfind: Blue Moon, mtime: 1360675100
02-12 14:18:20:679 oc_module: Directory of file to open exists.
02-12 14:18:20:679 oc_module: PUT request on //xxx/Blue%20Moon/Robben%20Ford%20-%20My%20Everything.mp3!
02-12 14:18:20:679 oc_module: Sendfile handling request type PUT.
02-12 14:18:20:679 oc_module: Put file size: 6858752, variable sizeof: 8

This is an example, it waits a really loooong time :( (Paths shortened with xxx)

Web server error log

Deprecated: Directive 'register_long_arrays' is deprecated in PHP 5.3 and greater in Unknown on line 0

There is a lot of this stuff, but I'm not sure whether it is related to ownCloud.

ownCloud log (data/owncloud.log)

{"app":"media","message":"error reading title tag in '\/var\/www\/owncloud\/xxx.mp3'","level":2,"time":1360575380}### Expected behaviour Upload of a lot of MP3 Files (4GB) should be fast.

bflorat commented 11 years ago

I halved (or better) the update/delete/insert time with MySQL/InnoDB using this option:

innodb_flush_log_at_trx_commit=0 (default is 1)

Beware that this means the actual I/O flush happens only once per second rather than at each commit (see [1]); you may lose data, so don't use it on critical systems.

I ran some tests using a WebDAV mount (the touch command through the ownCloud WebDAV connector triggers an INSERT for new files); note that you have to change the file names to make sure ownCloud actually performs INSERT queries:

[1] http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html
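
For anyone wanting to try this, the option goes in the `[mysqld]` section of my.cnf; this is just a sketch, adjust the path for your distribution:

```ini
; /etc/mysql/my.cnf -- trades durability for speed: InnoDB flushes the
; log to disk roughly once per second instead of at every commit, so up
; to ~1 second of committed transactions can be lost on a crash.
[mysqld]
innodb_flush_log_at_trx_commit = 0
```

It is also a dynamic variable, so you can test it at runtime with `SET GLOBAL innodb_flush_log_at_trx_commit = 0;` before making it permanent.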

phonovision commented 11 years ago

I guess the best solution would be not to access the db at all as long as the upload is not complete.

Instead, uploads could be written to a temporary file (as Firefox does for downloads) that is renamed on completion. The metadata of incomplete uploads could be preserved in the meantime using a sidecar file. A nice side effect would be that the sidecar file could double as a lock file to avoid concurrent uploads.
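
The temp-file-plus-rename idea could be sketched like this (illustrative only, not ownCloud code; the sidecar naming is invented for the example):

```python
import os
import tempfile

def atomic_receive(target_path: str, chunks) -> None:
    """Write an upload to a temp file in the destination directory,
    then rename it into place only once the transfer is complete.
    Sketch of the idea above, not ownCloud's actual implementation."""
    directory = os.path.dirname(target_path) or "."
    sidecar = target_path + ".part.lock"   # hypothetical sidecar/lock name

    # The sidecar doubles as a lock: O_EXCL fails if an upload is in flight.
    fd = os.open(sidecar, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    try:
        # Temp file on the same filesystem, so the final rename is atomic.
        tmp_fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
        try:
            with os.fdopen(tmp_fd, "wb") as tmp:
                for chunk in chunks:
                    tmp.write(chunk)
            os.replace(tmp_path, target_path)  # atomic on POSIX
        except BaseException:
            os.unlink(tmp_path)  # incomplete upload never becomes visible
            raise
    finally:
        os.unlink(sidecar)
```

An aborted transfer leaves only a `.part` file behind, so the server-side database would never need to track a half-written file at all.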

bence8810 commented 11 years ago

Hello all. Has anyone tested the new version 5.0.9 yet? The release notes say a speed issue has been addressed. http://owncloud.org/changelog/

Thanks, Ben

VirTechSystems commented 11 years ago

I have been running 5.0.10 for a few days and am still seeing slower than expected uploads. So, I'm not sure that that entry takes care of the performance issues.

hamiller commented 11 years ago

I'm currently on 5.0.10 too, and use 1.4.0beta1 as the client: it's MUCH faster now. Not perfect, but much faster!

I've also switched my Apache web server to mod_fcgid, which may also contribute significantly to the better speed.

FYI: uploading ~3GB of data (in ~1300 files) took about 2 hours!

realtek2017 commented 11 years ago

Also, make sure External Storage is switched off.

Mine was extremely slow, and I noticed it was going through my whole Dropbox. I'm not sure whether it's designed to 'sync' external storage or not, but it was still stepping through all of it and was very slow.

I disabled the plugin and it's a LOT faster.

clauderobi commented 11 years ago

I am currently on 5.0.10 and 1.4.0.beta2... and it is still way too slow. The sync processes ONE file every 10 seconds or so.

I did some Wireshark captures and it is definitely the server side. I see large gaps (6-8 seconds) between the server's TCP ACK and the server's reply.

The client always uses the same TCP connection, so I guess the earlier comment about the many connections is resolved. But the main slowness issue still remains.

The server is mostly idle, with spikes at 20-30% on one core (it is a dual core). Memory usage is only 700MB out of 4GB. The disk (only a local SATA disk) is somewhat on the slow side, but only a few tens of KB/s are being written.

My setup: server and client on Ubuntu 12.04, Apache2 2.2.22, PHP 5.3.10, SQLite 2.8.17 or 3.7.9 (not sure whether oC uses sqlite or sqlite3), php-apc 3.1.7

tytgatlieven commented 11 years ago

All

I've concluded (it used to be earlier in this thread but seems to be gone...) that the issue is indeed due to the server.

More specifically, the PHP WebDAV implementation apparently issues a huge number of MySQL SELECTs per file upload (around 600, I believe) and a large number of UPDATE statements per file upload (around 80, I believe), with each UPDATE taking around 40 times longer than a SELECT.

Hence this issue needs to be resolved before the ownCloud WebDAV implementation can be of practical use. I have created a small, dirty hack which removes the duplicate UPDATEs and makes file uploads twice as fast. However, this is still too slow in my opinion.
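
A back-of-the-envelope check of those counts shows that the UPDATEs dominate the total cost, which is consistent with a hack on the UPDATEs alone roughly doubling upload speed (figures as reported above, not re-measured):

```python
# Rough cost model from the counts reported above: ~600 SELECTs and ~80
# UPDATEs per file upload, with each UPDATE ~40x the cost of a SELECT.
selects = 600
updates = 80
update_factor = 40

cost = selects + updates * update_factor   # total, in "SELECT units"
update_share = updates * update_factor / cost

print(cost)          # 3800 SELECT-equivalents per uploaded file
print(update_share)  # ~0.84: the UPDATEs account for most of the time
```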

So the issue to fix is the following: Issue #3118

I'm drowning in work at the moment, so if anybody would be willing to investigate this more deeply, please do so. I can run tests, and I have some xdebug and MySQL logs which are most likely still accurate.

Request them if needed; I can't seem to find the correct way to upload files to my gist...

Polymathronic commented 11 years ago

I've just run a test with ~1000 small files (~14 MB extracted) on both Windows and Mac OS (both clients and server on the same network). The Mac OS client uploaded everything in less than two minutes, whereas its Windows counterpart took 15 minutes to finish.

Clients v1.3.0 Server v5.0.10 (Ubuntu 12.04)

clauderobi commented 11 years ago

I guess all this means that there are two issues: one on the server side and one on the client side.

If I do the math, with the Mac you are getting 8 files per second and 1 per second with Windows.

In my case, with a Linux client, I get 1/10th of a file per second. As I said, the Wireshark trace shows long delays on the server side.

Does anyone have a suggestion of where to look in my server setup? I am not using MySQL, just SQLite, but I cannot believe this is the culprit.

dragnovich commented 11 years ago

I'm drowning in work for now and can't help a lot...

but from a quick check of the sources, the problem resides in both the server sync code and the desktop client (all the clients, Windows, Linux and Mac, are built from the same source code). The main problem is that the two are WAY TOO CHATTY and make too many calls to the database to keep the "upload status" of each file; with many files, the same "chat" is repeated for every one of them and the overhead piles up. And if your server limits the calls/connections per second you are allowed, you can imagine it gets even slower.

The code needs a major rework, and it may be better to drop the "upload status" calls entirely and just wait for the file to be completed.


thibaultcha commented 11 years ago

The Mac OS client uploaded everything in less than two minutes, whereas its Windows counterpart took 15 minutes to finish.

Well, I am running the desktop sync app on my Mac (10.8.5 mid-2010 MacBook Pro) and I can tell you it is very, very slow. 115MB out of 1.5GB in more than 2 hours so far. And the slowness of the upload seems to be unrelated to the speed of my connection.

What I can tell you about the logs is that it's full of those:

22/09/13 19:29:01,038 owncloud[63726]: _GetISImageRefFromIconRefInternal: could not retain image ref 0x2ce6001d (err=-2580)
22/09/13 19:29:01,038 owncloud[63726]: _GetISImageRefFromIconRefInternal: could not retain image ref 0x2ce6001d (err=-2580)
22/09/13 19:29:01,038 owncloud[63726]: PlotIconRef 0x1e InContext 0x109d41290 in {0.000000,0.000000,128.000000,128.000000}, inAlign:0, inTransform:0, inFlags:0 failed. Invalid image ref (err=-2582)

But nothing else.

etiess commented 11 years ago

Maybe the last post of dragotin (http://dragotin.wordpress.com/2013/09/27/dav-torture/) is related to this issue?

Thanks for working on it!

dragotin commented 11 years ago

Yes, sure, we're working on performance improvements. By the way, @etiess, could you do me a favour? You mentioned that carrying a synced directory with the sync journal on a USB stick to a new computer does not work. I have an idea why; could you create a bug report about that, so I won't forget and can probably fix it? Important thing.

etiess commented 11 years ago

@dragotin : Bug report done (my first one, I'm excited but please excuse me if I'm missing something! ;-) ) : https://github.com/owncloud/core/issues/5231

smily03 commented 11 years ago

As a random question, once I get everything uploaded to the server, will it pull down on my other computers fast? I'm about 10 hours in at this point, 2GB out of 35GB worth of data, and want to make sure it's not going to be this bad pulling the data down onto other computers I want to sync with... (OC server 5.0.12, linux client 1.4.0, both oS 12.2 x64 boxes.)

zliden commented 11 years ago

Large files are copied at a rate of 2-5 MB/s, but there is a 4-second pause after each file. With small files, the speed tends to zero. (Same PC, ownCloud in LXC, copied from/to the same disk, or a different one, it makes no difference.) CPU load is 20-80%; 80% on a big file at high speed.

moscicki commented 11 years ago

Hi guys,

I think this issue was (at least partially) understood as a server-side bug already 3 weeks ago:

https://github.com/owncloud/core/issues/5084

It has "Owncloud 6" sticker on it.

kuba

moscicki commented 11 years ago

And there is a related issue: if you try a really big file you will see the same effect:

https://github.com/owncloud/core/issues/5089

kuba

chrismyers81 commented 11 years ago

Something that made a HUGE difference for me (e.g. from uploading 2GB in 10 hours to uploading 30GB in about an hour) was switching from the default SQLite database to MySQL on the server. At the time I was running server 5.0.12 and client 1.4.2; I'm now running server 6.0 b1 and still client 1.4.2.

jancborchardt commented 10 years ago

@dragotin @danimo is this also improved with 1.5? I heard it being a lot faster. :)

docdawning commented 10 years ago

I'm running a fresh install of OC 6 on Ubuntu 12.04, and my internal network is moving data between the OS X client and my server very, very slowly. I have about 1.2GB of data to load; so far it's been going for about 10 minutes and is maybe 10% done. My network is very fast, so it's not the bottleneck. Just chiming in.

racic commented 10 years ago

After upgrading to oC 6.0 / client 1.5.0 and from the SQLite to the MySQL backend: uploading really small files (up to a few KB) is still painful, but at least for files as big as MP3s or pictures (1-5 MB) the time between subsequent transfers is much lower than before, so oC is now usable for me.

mauromi commented 10 years ago

Updated the server to 6.0.0a and the client to 1.5.0, and ownCloud is still unusable with many small files.

dietmaroc commented 10 years ago

Expected behaviour

If a folder is copied into the local ownCloud sync folder, the sync should run through in acceptable time, uploading continuously.

Actual behaviour

Clearly observable pauses occur between the syncing of individual files, which increase the upload time enormously.

Steps to reproduce

  1. Create a folder with files of any size
  2. Copy this folder into your local ownCloud sync folder
  3. Open the sync client and watch the display

Server configuration

Operating system: Ubuntu 13.1

Web server: Apache 2

Database: MySQL 5.5.34

PHP version: PHP 5.3.3

ownCloud version: 6.0.1.r1

Storage backend: none

Client configuration

Client version: 1.5.0.1913

Operating system: Windows 7 64-bit (VM)

dirkgroenen commented 10 years ago

Of course, like others, I'm experiencing the same problem, but after reading this issue I don't think mentioning it again really helps fix it.

Is there any progress on rewriting this way of syncing (especially uploading)? I know my PHP and MySQL, but not well enough to contribute here. That said, a little brainstorming can never hurt, right? So why is every file uploaded and handled right away? Wouldn't it be a solution to first upload all the files and 'sort' them in a background cron job that runs after uploading has finished?

neoscaler commented 10 years ago

Extremely slow transfers here with OC 6.0.2.

I am trying to sync ~2000 small files (1MB total); it has been syncing for 12 hours now and is not finished yet. The upload is via gigabit LAN (which could handle ~80MB/s to the server).

This is really a show stopper for me: a (file) cloud solution that can't sync small files.

dragotin commented 10 years ago

Closing this, as we achieved speed improvements on the client with 1.6.0; now we need to catch up on the server side. Please track in the mentioned server bugs.

modernmediagrp commented 9 years ago

@sargelavoie - hey can you message me on how you got leny running on your BA NAS?

sargelavoie commented 9 years ago

@modernmediagrp - I have followed these instructions. However, I do not use ownCloud anymore; I use Syncthing running on a Raspberry Pi, which is connected to my NAS through NFS.

melroy89 commented 9 years ago

This issue is still not solved!!

knowlecules commented 9 years ago

+1, I'm experiencing the same slowness.

melroy89 commented 9 years ago

So what's the next step :S?

Croydon commented 8 years ago

Still a big issue if one is uploading many small files. I got a remaining time displayed in MONTHS.

scolebrook commented 8 years ago

It's helpful to provide data quantifying what "slow" means in your situation, along with your configuration. For me, a folder of 1000 small files totaling 19MB takes a little under 6 minutes to sync. I've run this test with ownCloud clients 1.8.4 and 2.0.2 against server 8.1.3. The server side is two load-balanced Apache 2.2 boxes, each running a memcache instance. PHP sessions are in memcache, using memcached with session data written to the memcache instance on both servers. The database is a 3-node Percona Cluster behind a load balancer. Storage is a NAS connected to the web servers via SMB mounts in the file system. The network links between all the VMs we use are 10Gbps, the internet pipe is 100Mbps, and my pipe was 20Mbps.

During these tests I saw a db query rate of about 3k/sec. Before and after it was about 2k/sec, indicating that this sync operation resulted in 1k queries/s. I didn't see abnormal CPU load on the db nodes or web servers. There were also no unusual iowait levels on the web servers, and network traffic was nowhere near capacity. I'm not sure if the speed of this transfer is a result of latency at one point or another; given the lack of load anywhere, I suspect so. Perhaps increased parallelism, by transferring more files at once, would make better use of the server and network capacity available.

Since server capabilities vary, and may vary from one hour to the next, parallelism would need to be dynamic. And how should the client determine when to increase the number of simultaneous uploads or decrease them? Not as easy as it sounds.
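
One way to approach that dynamic adjustment is additive increase / multiplicative decrease, as TCP congestion control does: grow the upload window slowly while transfers complete quickly, back off hard when they slow down or fail. A minimal sketch (the thresholds are invented for illustration; nothing like this is claimed to exist in the client):

```python
class AdaptiveConcurrency:
    """AIMD-style controller for the number of simultaneous uploads.

    Illustrative sketch only: `slow_seconds` and the bounds are guesses,
    not values taken from the ownCloud client.
    """

    def __init__(self, lo: int = 1, hi: int = 8, slow_seconds: float = 5.0):
        self.limit = lo                  # current parallel-upload budget
        self.lo, self.hi = lo, hi
        self.slow_seconds = slow_seconds

    def record(self, duration: float, ok: bool) -> int:
        """Feed back one finished transfer; returns the new limit."""
        if ok and duration < self.slow_seconds:
            self.limit = min(self.hi, self.limit + 1)    # additive increase
        else:
            self.limit = max(self.lo, self.limit // 2)   # multiplicative decrease
        return self.limit
```

The scheduler would keep `limit` transfers in flight and call `record()` as each one completes, so a struggling server is quickly relieved while a fast one is gradually loaded up.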

ghost commented 8 years ago

I have that and I don't know how to fix it :( I left it overnight to find that it probably went idle after a while and only completed a small part of the transfer.

moscicki commented 8 years ago

This crippled performance is mainly a server issue, meaning the software running on the server, which produces insane SQL activity (and insane PHP overhead). I guess this is a major milestone for the ownCloud 9 server release, but I would suggest you check with the ownCloud server developers to be sure. If you happen to use SQLite on the server, I would suggest you switch to MySQL, for instance.

Best regards, Kuba

melroy89 commented 8 years ago

@moscicki Can you prove this behavior?

moscicki commented 8 years ago

Yes: with a C++ webdav server things are much faster.

Best regards, Kuba

melroy89 commented 8 years ago

WebDAV is an HTTP-based protocol and has nothing to do with MySQL.

scolebrook commented 8 years ago

@danger89 ownCloud isn't just WebDAV; try running it without a database of some kind. SQLite is a low-performance database backend. If you want speed, or have more than one person or sync client using ownCloud at the same time, use MySQL. SQLite is really only good as a data store for a single application, which is why the sync client uses it for the journal.

You can see the point @moscicki makes from the stats I supplied further up this thread. While I was uploading a folder of about 1000 small files, there was a query-rate increase of approximately 1000 queries a second. This lasted for a 6-minute period. That's an average of about 360 queries to upload each file. Now, there's folder structure in there to consider too: it had to discover that directories didn't exist, make them, and so on. But 300+ queries is a massive amount for a single file. With the network round trip between the web and database servers running at about 0.8ms (there's a load balancer in there too), a very large percentage of the 6 minutes for my test is clearly network latency due to the massive amount of communication with the db.
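
Plugging those numbers in shows how dominant the round trips are (same figures as above, just rearranged):

```python
# Back-of-the-envelope check of the sync test described above.
files = 1000           # small files in the test folder
extra_qps = 1000       # query-rate increase observed during the sync
sync_seconds = 6 * 60  # the sync took a little under 6 minutes
rtt_seconds = 0.0008   # ~0.8 ms web <-> db round trip (incl. load balancer)

queries_per_file = extra_qps * sync_seconds / files
network_time = queries_per_file * rtt_seconds * files  # pure round-trip time

print(queries_per_file)  # 360.0 queries per file
print(network_time)      # ~288 s: most of the ~360 s sync is db round trips
```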

This has been an area of continuous improvement with each major version. But it's clear that there is still room for more.

aronovgj commented 8 years ago

It is still innodb_flush_log_at_trx_commit on the server side. Set it to 0 or 2.

aronovgj commented 8 years ago

Also, nobody could explain to me what this means: "Beware that it means that the actual IO flush is only done every one second and no more at each commit (see [1]), you may lose data so don't use this for critical systems"

What can be lost during this second, and can it corrupt my files if MySQL has a properly working recovery?

I want to add that I set the variable to 2 a month ago and have not lost any data so far.

scolebrook commented 8 years ago

@aronovgj By setting innodb_flush_log_at_trx_commit to 0, you are turning off MySQL's properly working recovery mechanism and relying solely on the capabilities of your storage backend (battery-protected write cache, etc.). Better have some good power management in place.

Setting it to 2 results in the log being written for every write query, but the other write operations happen once a second at most. This is not ACID compliant unless you're in a Galera cluster, where the cluster as a whole is still ACID compliant.

There is only a marginal performance improvement between 2 and 1. With 1, everything is flushed to disk for every write query; it is the only option that is ACID compliant for a standalone MySQL server.

But 0 doesn't change the results of the test I described above. While there is less disk I/O on our database nodes, it takes the same time to sync the same folder. This reinforces my opinion that in my case the time is a function of network latency. The latency is very, very small in our environment, about as small as you can possibly make it, but there are literally hundreds of queries for each file and it really adds up.

I have no idea what the content of all these queries is, but I think it's a pretty safe bet that many of them are SELECTs for exactly the same information, over and over again, by different parts of the code. It would be good if that information could be cached to avoid hitting the db repeatedly.
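
As a sketch of that caching idea, a small per-request memo over the metadata lookup would collapse repeated identical SELECTs into one. (`fetch_row` here is a hypothetical stand-in for the real query; invalidation on writes is the hard part and is glossed over.)

```python
from functools import lru_cache

QUERY_COUNT = {"n": 0}  # counts simulated round trips to the database

def fetch_row(path: str) -> tuple:
    """Hypothetical stand-in for a SELECT of one file's metadata."""
    QUERY_COUNT["n"] += 1
    return (path, 123)  # (path, size) -- dummy data

@lru_cache(maxsize=4096)
def cached_fetch_row(path: str) -> tuple:
    # Identical lookups within the cache's lifetime hit the db only once.
    return fetch_row(path)
```

If 300 identical lookups per file collapse to one, the query count per upload drops accordingly; the cache would have to be scoped to a single request (or invalidated on every write) to stay correct.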

ghost commented 8 years ago

now we need to catch up on server side. Please track in the mentioned server bugs.

from https://github.com/owncloud/client/issues/331#issuecomment-41292972

melroy89 commented 8 years ago

If it's a server-side issue, then we indeed need to look at the 'core' issues instead. Give this issue more attention, for example: https://github.com/owncloud/core/issues/20967

Croydon commented 8 years ago

14,329 files, 557 directories, 91.8 MB, and it ran for multiple hours a day over several days. I can't even say exactly how long. However, it's obviously a problem with small files.

aronovgj commented 8 years ago

@scolebrook Although setting innodb_flush_log_at_trx_commit = 2 may not be ACID compliant, I strongly disagree that the performance improvement over setting it to 1 is marginal: it reduced HDD load from 100% to about 5% and cut the sync time from several months to a few hours. I had about 70,000 files, about 30GB altogether, including a lot of small files that took several seconds each.

scolebrook commented 8 years ago

@aronovgj You aren't seeing the difference between 1 and 2 specifically; you're seeing how the requirements of the two impact storage that is constrained on IOPS. 2 does an append write to the log for every write-type query, a very quick and simple operation that even slow storage can accomplish quickly. It then writes the actual tables once per second; these insert-type writes take significantly longer than an append, as some data needs to be rearranged. Grouping these operations together and writing once a second allows for significant optimization. 1, on the other hand, doesn't allow that optimization because there is no grouping of work.

If you have battery backup for your systems such that you're satisfied the risk of data loss is sufficiently mitigated, 2 will give better performance. The level of performance difference between 2 and 1 is a function of how slow the storage is. Where storage isn't the bottleneck, the difference, while still observable, is not that significant. Changing OS-level settings or other settings in MySQL can have a bigger impact. This is our situation: network latency is the primary constraint for us, with our storage never getting above 10% utilization for any MySQL configuration.

We actually run with 0, which is an order of magnitude faster than 2, because our infrastructure is backed up and protected to the extent that it would require the loss of the data center to lose that one second of data between writes. In that event we'd be falling back to our most recent offsite tape, which would be at least 24 hours old. And that is exactly the same level of risk as using either 1 or 2, because loss of the data center includes loss of the disks.

Each situation is different. Knowing the details about these options is important to make sure your configuration is appropriate for your environment and your needs. Everything in IT is a compromise in one way or another.

Something else that will improve performance is spending some time tuning I/O at the operating-system level. Our MySQL nodes are virtualized, so we use the noop scheduler and let our storage environment optimize write operations to its disks, rather than have Linux spend time optimizing only to have the storage backend re-optimize the MySQL traffic together with the traffic from other systems. For hardware servers it's probably best to measure performance with your workload and try cfq and deadline to find the right one for your situation.

At the end of the day, the issue is the number of queries that are required. 300+ per file must be far more than what is actually needed. If this can be optimized further as it has in previous versions, then it improves network and disk use.

aronovgj commented 8 years ago

Thanks for the clarification @scolebrook. So it seems we are talking about different problems in different environments. I have a home server with two HDDs, one of which is only for backups every six hours, and I can actually live with data loss of up to six hours. However, I'd rather live without the risk and still have proper write speeds; I haven't found another solution for this so far.