owncloud / client

🖥️ Desktop Syncing Client for ownCloud
GNU General Public License v2.0

Optimization: Server to Client Notification System #1075

Open Fmstrat opened 11 years ago

Fmstrat commented 11 years ago

Hi,

I had started writing my own Dropbox replacement in Python until I ran across OwnCloud. First off, what a great find this was!

One thing I've noticed, though, is that since the server is 100% web-based, the client has to poll the server every 30 seconds:

10-04 15:35:51:695 * Polling "ownCloud" for changes. (time since next sync: 30 s) 
10-04 15:35:51:696 Setting up host header:  "***.***.com" 
10-04 15:35:51:804 * Compare etag  with previous etag:  false 
10-04 15:36:21:691 * Polling "ownCloud" for changes. (time since next sync: 60 s) 
10-04 15:36:21:693 Setting up host header:  "***.***.com" 
10-04 15:36:21:802 * Compare etag  with previous etag:  false 
10-04 15:36:51:687 * Polling "ownCloud" for changes. (time since next sync: 90 s) 
10-04 15:36:51:689 Setting up host header:  "***.***.com" 
10-04 15:36:52:392 * Compare etag  with previous etag:  false 
10-04 15:37:21:683 * Polling "ownCloud" for changes. (time since next sync: 120 s) 
10-04 15:37:21:685 Setting up host header:  "***.***.com"

While there appear to have been some great efficiency improvements in 1.4, I still wonder if this could be taken a step further. When I started out, the version I was working on, like OwnCloud, handled most of the backend functionality via PHP. However, I also created a "helper" service in Python that handled outbound notifications by keeping a live TCP connection open with each client. This let the helper service notify clients when they needed to poll the server, instead of having them poll every 30 seconds.

Here's how it could work for OwnCloud:

I should also note, it would be even more efficient to have the Helper Service send the file meta-data that needed to sync so the client knew exactly what to grab, but that may be more difficult given the current structure of OwnCloud.
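A minimal sketch of such a helper service, assuming Python's asyncio and a one-line "CHANGED" message. The class name, the message, and the demo are hypothetical and not part of ownCloud; the only idea taken from the description above is that each client keeps one TCP connection open and is told when to poll.

```python
import asyncio


class NotificationHub:
    """Holds one open TCP connection per client and pushes a short
    'time to poll' line to all of them when something changes."""

    def __init__(self) -> None:
        self._writers = set()

    @property
    def client_count(self) -> int:
        return len(self._writers)

    async def handle_client(self, reader, writer) -> None:
        # Register the connection and keep it open until the client leaves.
        self._writers.add(writer)
        try:
            await reader.read()  # returns b"" once the client disconnects
        except ConnectionError:
            pass
        finally:
            self._writers.discard(writer)
            writer.close()

    async def notify_all(self, message: bytes = b"CHANGED\n") -> int:
        """Send the message to every connected client; return the count reached."""
        delivered = 0
        for writer in list(self._writers):
            try:
                writer.write(message)
                await writer.drain()
                delivered += 1
            except ConnectionError:
                self._writers.discard(writer)
        return delivered


async def demo() -> bytes:
    """Start a hub on a free local port, connect one client, push one notice."""
    hub = NotificationHub()
    server = await asyncio.start_server(hub.handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    while hub.client_count == 0:  # wait until the hub has registered us
        await asyncio.sleep(0.01)
    await hub.notify_all()
    line = await reader.readline()
    writer.close()
    await writer.wait_closed()
    server.close()
    return line
```

In a real deployment the PHP side would have to trigger `notify_all()` somehow (for example through a local socket), which this sketch leaves out.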

I had completed this programmatically; however, I ended up using two TCP connections for communication instead of polling a PHP server, because it was more efficient and I was using RDIFF to compare files at a binary level.

Thanks, Ben

ogoffart commented 11 years ago

We would need support for this in the server. And it is hard to develop in PHP.

danimo commented 11 years ago

@ogoffart It's not hard to develop; it's hard to deploy, given that people are used to just dropping the ownCloud app onto their PHP server. Suddenly, we'd need them to run a daemon in parallel. This could be an optimization though.

Fmstrat commented 11 years ago

@ogoffart I'm not sure it's that difficult. While this is the exact reason I switched my app server to Python instead of PHP, PHP does support socket_listen. A walkthrough here http://devzone.zend.com/209/writing-socket-servers-in-php/ shows a very simple example. In fact, not much more than that example would be required.

@danimo I'm actually curious if we could write the daemon in PHP, and then have the PHP web server fork off the daemon if it's not already running. This would lower the overall management requirements like creating a service that works in Windows/Linux/OSX, or an "install" application that would then be required.

The daemon could actually be optional. For instance, if the client attempts to connect to the daemon and fails, or if the daemon settings in the client are not turned on, the client could fall back to 30 second polling like it does now.
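The optional-daemon fallback could look roughly like this on the client side. This is only a sketch: `choose_sync_mode`, its parameters, and the "push"/"poll" labels are made-up names, and the real client is not written in Python.

```python
import socket

POLL_INTERVAL_SECONDS = 30  # the interval seen in the log above


def choose_sync_mode(host, port, daemon_enabled=True, timeout=2.0):
    """Return ("push", sock) if the notification daemon is reachable,
    otherwise ("poll", None) so the caller keeps polling as it does today."""
    if not daemon_enabled:
        return "poll", None
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Daemon not running, port blocked, or timed out: fall back.
        return "poll", None
    return "push", sock
```

A caller that gets "poll" back would simply keep its existing timer at `POLL_INTERVAL_SECONDS`.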

shoeper commented 11 years ago

A really nice idea, hopefully this will come

jaydcarlson commented 11 years ago

The "right" way to do this would probably be to use WebSockets (so you're still operating on port 80, which avoids firewall problems), but an easy and fast way to get something working would be long polling. The client opens a connection to the server and requests, say, "update.php". The PHP script is invoked and checks whether anything needs to be resynced. If so, it returns the path that needs updating (as a JSON array or whatever). If not, it calls sleep() for a second and checks again. It continues this sleep-check loop until it detects a resync, then sends the directory to the client and closes the connection.

On the client side, whenever it receives data from the connection or the connection is otherwise terminated, it re-opens the connection again immediately. This way, the client doesn't have to continually ask the server for updates.
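The sleep-check loop and the reconnecting client can be sketched as follows, in Python rather than PHP for brevity. `find_changes`, `request`, and `handle` are stand-ins supplied by the caller, and the `timeout` is an added assumption so a request cannot hang forever; the description above loops until a change is found.

```python
import json
import time
from typing import Callable, List


def long_poll(find_changes: Callable[[], List[str]],
              timeout: float = 30.0,
              interval: float = 1.0) -> str:
    """Server side: block until find_changes() reports paths or the
    timeout expires, then return a JSON body for the response."""
    deadline = time.monotonic() + timeout
    while True:
        changed = find_changes()
        if changed:
            return json.dumps({"changed": changed})
        if time.monotonic() >= deadline:
            return json.dumps({"changed": []})  # nothing new, client retries
        time.sleep(interval)


def client_loop(request: Callable[[], str],
                handle: Callable[[List[str]], None],
                rounds: int) -> None:
    """Client side: as soon as one request returns, issue the next one."""
    for _ in range(rounds):
        reply = json.loads(request())
        if reply["changed"]:
            handle(reply["changed"])
```

An empty `changed` list just means the request timed out, so the client reconnects without syncing anything.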

Unfortunately, from casually looking around at the current implementation, this would require the server to take a significantly different role than it is doing right now. Currently, the client is responsible for figuring out what needs updating; that responsibility would now fall to the server.

animalillo commented 10 years ago

Oh, I created a chat client/server application a while back using WebSockets and JSON with PHP. The only issue I found was that, back when I tried that little application, PHP had an issue with SSL processing.

Overall, it won't be that hard to write a PHP daemon server, and it won't use too much memory, provided the code is written properly.

This will be a great feature, and could improve the whole owncloud a lot, giving a wide range of new possibilities to this great application that still needs a lot of improvements.

LukeOwlclaw commented 9 years ago

WebSockets are mainly useful because browsers (thus clients) can use them. The oC client would also be able to use any other network socket (e.g., TCP - on which WebSockets are based, as well). However, using any kind of socket introduces 2 major problems:

@funkathustra's suggestion about long-polling is much better -> https://en.wikipedia.org/wiki/Push_technology#Long_polling It can be implemented without touching any of the current functionalities. It does not require any additional set up. However, scalability issues might arise as the server must be able to handle many open connections.

LukeOwlclaw commented 9 years ago

Wow, it does not seem hard at all!

There is already a filewatcher-plugin: https://github.com/owncloud/web_hooks

Right now, it waits for a cron event to push changes to a url. That is of course not what we want.

But if publisher.php/addNotifications() is just changed a little and forwards its changes to a poll.php like this:

<?php
/*out("Long polling sleep demo...");
$slept = 0;
for($i=1;$i<99999;$i+=$i)
{
  out("Slept for " . ($i) . " seconds. Total sleep time: ".$slept." seconds.");
  sleep($i);
  $slept += $i;
}*/
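// Hold the request open and emit each change as soon as it is signalled.
// magicMethodWaitingForSignalFromPublisherPhp() is a placeholder: it is
// expected to block until publisher.php reports a change.
while(true) {
    $signal = magicMethodWaitingForSignalFromPublisherPhp();
    out($signal);
}

// Write one line and push it to the client immediately.
function out($s) {
  echo($s . "<br>");
  // ob_flush() raises a notice when no output buffer is active.
  if (ob_get_level() > 0) {
    ob_flush();
  }
  flush();
}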
?>

I have no idea how to do the signalling in php but it should not be too hard, should it?

shoeper commented 9 years ago

The script would be stopped quite quickly by most web servers, or it would exceed its execution time limit (and many other timeouts can occur). So this would only work for people who run their own server, and they could then also use sockets or other software with better performance than PHP.

LukeOwlclaw commented 9 years ago

I have a standard Ubuntu system with an Apache server and all default settings. PHP processes live for at least 8000 seconds on my machine. Of course, this is just an example...

Nonetheless, even if the process lived only 30 seconds, the client would simply have to reconnect every 30 seconds. So you would have the same number of connections as now, but instant notifications.

Of course scalability is an issue, so we definitely need the polling mechanism as fallback if the server does not support long polling.

PS: I opened a ticket for polling support here: https://github.com/owncloud/web_hooks/issues/32

LukeOwlclaw commented 9 years ago

@ogoffart @danimo @dragotin @DeepDiver1975 @karlitschek

Below I summarize the available options for instant notifications. I'd like to know what the official oc policy about each is. There is no perfect solution but if we want to introduce instant notifications which option should it be?

From a technical point of view, at the TCP level, instant notifications can be provided in only two ways:

  1. The client keeps a port permanently open and makes sure the server knows how to find it (that is, the client becomes a server).
  2. The client holds a connection to a server open (e.g., TCP socket, WebSocket, long polling).

Option 1 requires a lot of configuration on every client, as clients are often behind NAT and/or a firewall. In my view, this makes things too complicated. In company networks it might be introduced without too much configuration effort. Should this option be pursued? (Q1)

For option 2 there are two possibilities: A) clients connect directly to the oc server or B) they connect to a third-party push server (e.g., pubsubhubbub.appspot.com).

Option A) introduces scalability issues, but no heavy server configuration is required. I wrote an oc plugin for long polling with PHP. Apache by default allows only 150 connections according to [1]. I estimate that this should suffice for 70-100 clients. With a better server configuration, more clients can be served. Is this good enough for this approach to be pursued/tried? (Q2)

Option B) moves the scalability problems to a third-party server. However, it introduces privacy and security concerns. Is this acceptable? (Q3)

[1] http://oxpedia.org/wiki/index.php?title=Tune_apache2_for_more_concurrent_connections

LukeOwlclaw commented 9 years ago

@ogoffart Is there a way to trigger a sync using an external tool?

@icewind1991 I created an owncloud app which allows HTTP long polling for repo changes. Unfortunately it does not scale too well (theoretically, practically not tested). Are there any better plans? Here is the source: https://github.com/LukeOwncloud/long-polling-for-owncloud

animalillo commented 9 years ago

I think that using a third party defeats the whole original purpose of "Own"Cloud. I wouldn't want to need an external service for my private cloud, and if that's implemented it MUST be optional. Opening a port on the client side is old-school and too complicated for most clients, especially if they move around the world and need their files in sync.

I think the best approach is to keep it server-side: whether by opening a port or by any other idea you might come up with, the connection should always be made from the client to the server.

Also, the PHP timeout can be overridden by the script itself on most servers (set_time_limit), so that's not a big issue.

So, in short:

Option 1 is a bad idea that will cause more problems than it solves.

Option 2 A is, I think, the most suitable approach for this project. In my opinion it would be nice to have a WebSocket server, as it would give a lot of flexibility, and the port can be automatically exposed to the clients via a call to the current web server, even if it's opened dynamically. You could perhaps even advertise several different servers for clients to connect to this way, which would make it easier to scale.

Option 2 B goes completely against the spirit of ownCloud, so I would discard it.

Aside from this, it might be nice to keep the current polling approach as a fallback for old clients, and for cases where the server is misconfigured, ports cannot be opened, or the daemon cannot be kept running.

It might also be a good idea to use the daemon only as a real-time notification system and/or interface for small data transactions, and leave the big data transfers to the current working API. The daemon tells the client "you need to update this file", and the client goes and fetches it.
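As a rough illustration of that split, the notification could carry only paths, with the transfer itself going through whatever the client already uses. The JSON shape (`{"update": [...]}`) and the `fetch` callback are invented for this sketch; no such message format is defined anywhere in ownCloud.

```python
import json
from typing import Callable, List


def handle_notification(raw: bytes, fetch: Callable[[str], None]) -> List[str]:
    """Parse one small notification and hand each path to the existing
    download routine; the notification itself carries no file data."""
    message = json.loads(raw.decode("utf-8"))
    paths = message.get("update", [])
    for path in paths:
        fetch(path)  # e.g. the client's current WebDAV download code
    return paths
```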

pascalBokBok commented 9 years ago

Push notifications to the file syncing tool would give a better impression of the performance and much smoother collaboration between people using Owncloud.

animalillo commented 9 years ago

And lots of bandwidth and server processing time saved!

LukeOwlclaw commented 9 years ago

@pascalBokBok and @animalillo, did you try my long polling plugin? https://github.com/LukeOwncloud/long-polling-for-owncloud It is not used by any program and only shows changes in the browser, but it might be a start. It works for me, does it for you?

animalillo commented 9 years ago

@LukeOwncloud That doesn't solve my problem, which is that the desktop sync software wastes resources opening a connection and polling each time, which makes it not real-time and a waste of bandwidth most of the time. I don't mind the browser being outdated until I refresh the page, or resorting to this kind of nasty trick.

But I think that with a WebSocket server all these issues would be solved in a flash, for both desktop and web sync.

LukeOwlclaw commented 9 years ago

@animalillo Of course this is not really a solution but just a first step in the right direction. The plugin is kind of a "websockets server". First, it needs to be tested if it also works for others. And since it is ignored by the oc guys, I'd like to hear your feedback. Eventually there might be a chance that it could be used by clients...

guruz commented 8 years ago

I wonder how feasible it is to overlay the long polling inside the nginx/apache config somehow. E.g., if a WebDAV request has "Depth: 0", we can assume it is the ETag job querying the server to see if something changed, and then send the request to the long polling instead of to oC. The only downside of long polling: if you use multiple accounts and/or multiple sync folders, the client will hang in the polling process and not process other sync folders/accounts.
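The routing rule itself is small. Here is a sketch in Python of the decision a proxy layer would make, assuming the ETag check is a PROPFIND request (the comment above only mentions the "Depth: 0" header, so the method check is an assumption). The function names and the "long-poll"/"owncloud" labels are hypothetical.

```python
from typing import Mapping


def is_etag_poll(method: str, headers: Mapping[str, str]) -> bool:
    """Guess whether a WebDAV request is the client's periodic ETag check."""
    # Header names are case-insensitive, so compare them in lower case.
    depth = next((v for k, v in headers.items() if k.lower() == "depth"), None)
    return (method.upper() == "PROPFIND"
            and depth is not None
            and depth.strip() == "0")


def route(method: str, headers: Mapping[str, str]) -> str:
    """Pick the backend that should answer this request."""
    return "long-poll" if is_etag_poll(method, headers) else "owncloud"
```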

strugee commented 8 years ago

> Only downside for long polling: If you use multiple accounts and/or multiple sync folders, the client will hang in the polling process and not process other sync folders/accounts.

That can be fixed in the client, though.
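One way the client could avoid that hang is to give each sync folder its own pending long-poll request and wait on all of them concurrently. A sketch with asyncio, where `wait_for_change` stands in for a single long-poll request and every name is hypothetical:

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def watch_folder(name: str,
                       wait_for_change: Callable[[str], Awaitable[None]],
                       on_change: Callable[[str], None]) -> None:
    """Wait for one folder's long poll to return, then report the folder."""
    await wait_for_change(name)
    on_change(name)


async def watch_all(folders: Iterable[str],
                    wait_for_change: Callable[[str], Awaitable[None]],
                    on_change: Callable[[str], None]) -> None:
    """Run one long poll per folder at the same time, so a quiet folder
    does not hold up the others."""
    await asyncio.gather(
        *(watch_folder(f, wait_for_change, on_change) for f in folders)
    )
```

With this shape, a folder whose poll returns early is handled right away even if another folder's request is still pending.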

DeepDiver1975 commented 8 years ago

Let's chat about a websocket based add on service.