owncloud / core

:cloud: ownCloud web server core (Files, DAV, etc.)
https://owncloud.com
GNU Affero General Public License v3.0
8.31k stars 2.06k forks source link

Add Webdav property to advise the sync clients to not sync this entry #26311

Closed PVince81 closed 2 years ago

PVince81 commented 7 years ago

There are several tickets that propose different approaches and different criteria for sync exclusions. Regardless of the criteria, I think the basic mechanism should be that the server returns an extra Webdav property <oc:do-not-sync>1</oc:do-not-sync> to tell the client to not sync this item.

It is then the responsibility of apps or whichever code that decides what the exclusion criteria is to set the property on the entries.

This is purely the backend/protocol level.

Some proposed criteria that would need to be implemented separately:

@owncloud/filesystem @guruz @ogoffart @dragotin @davivel @DeepDiver1975 @butonic

PVince81 commented 7 years ago

I'd even go as far as to say that these different criteria should be implemented / implementable as apps, which means that core needs to provide hooks (for sabre plugins?) for apps to be able to decide which entries to exclude.

PVince81 commented 7 years ago

then people can really go crazy with any custom use cases :smile:

PVince81 commented 7 years ago

cc @pmaier1

mmattel commented 7 years ago

Thanks for taking care of ! When going for an implementation, an idea is to (re)use this mechanism with federated sharing where you can forward the webdav property from the source - end to end. Like a fed-sharing checkmark in administration "forward no-sync webdav property". (Sorry full of ideas :smile:)

AnrDaemon commented 7 years ago

Keep It Simple and Stupid. Return HTTP/1.1 403 forbidden for files you deem undesirable for sync. You could add a header explaining the reason, so that a compatible client don't frek out and reduce this specific issue to a mere notice instead of sync error. But the first reaction must be clear that the action attempted is not permitted.

PVince81 commented 7 years ago

@AnrDaemon The problem with 403 is that not only sync clients use Webdav. And I don't think we should do user agent checking to do conditional blocking. Also in some cases a client could want to still get the metadata to be able to display it in some way without syncing, like for the folder selection dialog in the desktop client or like for the mobile client which only syncs whatever was marked as "make available offline".

Another name for this new Webdav property could be "sync permission" for people who prefer that name. :smile:

dragotin commented 7 years ago

We could do that, but my concern is that people will fail to understand why some things are synced and others not. How do we get that reasoning to the clients?

First reaction might be to add a string to the XML explaining that. That is hard to do because of translations. It would mean that clients need to send an accept-language header with propfinds and the server needs to parse it in case. Not sure if that is expensive or not. Probably we can do that, but I am not sure if there aren't better alternatives.

...

PVince81 commented 7 years ago

people

Who ? The sysadmins, the developers from third party clients, or the people using the desktop client ? For the desktop client, there's the sync tab about "Not synced", so the skipped entries could be added there with the reason "because the server told us so!".

pmaier1 commented 7 years ago

https://github.com/owncloud/enterprise/issues/1594

ogoffart commented 7 years ago

So we would check this property only for new folders? and we would still allow the user to check it in the selective sync? So it would be a default off for selective sync?

PVince81 commented 7 years ago

@ogoffart hmm, good point. I actually forgot about selective sync and was thinking of a "hard" way of blocking syncing, basically that the desktop client would refuse syncing any of these folders.

Yeah, but maybe you're right that it should rather be a suggestion that preselects (or pre de-selects) folders using the selective sync feature.

PVince81 commented 7 years ago

added these to the list in the main post:

excluding federated share individually from sync: #14727 excluding individual ext storages from syncing: #12216 (see sync option)

That's already quite a few use cases :smile:

PVince81 commented 7 years ago

Seems there was already a client ticket, linking here https://github.com/owncloud/client/issues/4621 :smile:

PVince81 commented 7 years ago

Added this to main post_

AnrDaemon commented 7 years ago

HTTP protocol is extensible by nature. The very semantics of its messaging system are open to interpretation and alteration within defined constrains. Taking reply code 403 as an example https://tools.ietf.org/html/rfc7231#section-6.5.3 you could set the reason right there in the first line. F.e.

HTTP/1.1 403 Synchronization prohibited

The code (403) tells CLIENT ITSELF that it shouldn't attempt to repeat the request. The text part is intended for the client's operaror and explaining the reason in verbal form.

PVince81 commented 7 years ago

@AnrDaemon yes, but this would need special handling in every client which are not sync client if they want to use the folder. Because most clients interpret 403 as an error and give up, they'd need to parse the text to find out that in fact it's not an error and they can indeed explore the folder and are only advised not to sync. This also means that third party Webdav clients like Dolphin (KDE), cadaver, etc will not dare enter this folder.

I think it's best to still return a success response to allow exploration, and add a hint there about the "please don't sync" info.

AnrDaemon commented 7 years ago

Default handling MUST be sufficient. Or a client just poorly written and needs fixing. Standard explicitly says that code 403 MAY originate from any condition server deemed necessary to raise client's awareness of it. And is not necessary a global failure. Only 5xx errors, or 401/403 AT THE STAGE OF AUTHENTICATION should result in a global error state.

403 in itself IS A success response.

6.5.3. 403 Forbidden The 403 (Forbidden) status code indicates that the server understood the request but refuses to authorize it.

(q) RFC 7231 "Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content"

mmattel commented 7 years ago

@PVince81 @ogoffart Totally disagree on soft = suggested = preselect folders to exclude where the sync client still would be able to overrule the admins decision to not sync a folder. This thread would turn sensless if this is the intention ! There are many clear descriptions in the referenced cases why hard = admin enforced is a must. Please also differenciate between browser / dav access and syncing. Only the latter needs admin driven restrictions.

moscicki commented 7 years ago

Hello,

Please consider and distinguish several cases which are different:

1) If a user does not have access to part of the tree than he can neither synchronize nor otherwise access it because he does not have permissions (https://github.com/owncloud/client/issues/4622). The error 403 seems to be most logical here.

2) Some criteria on the server where applied to advise the synchronization client not to synchronize part of the tree ("syncable" property). This has nothing to do with ability to access that tree by the user as nothing prevents the client to read the tree. So he may be able to navigate the tree in the web browser for example. The criteria may be set by the admin or by the user. If set by a user himself then this is more like a "global" selective sync configuration. I currently do not see a use-case where a user could not locally override such a setting but again -- this is at the discretion of the client to walk such a tree or not. If a user wants to shoot himself in a foot and synchronize something that admin (or himself) excluded -- maybe he should be free to do so (but maybe it should not be the default behaviour).

For efficiency reasons PROPFIND with depth 1 (directory listing) should be sufficient for the client to decide if it makes sense to walk a subdirectory. It should not be required to issue additional PROPFIND per subdirectory to find out 403 or the "syncable" property.

PVince81 commented 7 years ago

1) is doable already with the firewall app to block access completely. I think what @mmattel described is also achievable with the firewall app by additionally specifying which clients need to be hard-blocked, the clients being identified by user agent.

AnrDaemon commented 7 years ago

User-agent is easily spoofed. Please make up your mind, as @mmattel put it. Why do I need some arcane "firewall app" for a function that is apparently a core filesystem control feature.

PVince81 commented 7 years ago

This ticket is about an approach to implement use case 2 as described by @moscicki, basically advising to sync clients what the sync defaults should be, not hard-blocking.

For conditional access-blocking use cases (returning 403), see firewall app https://doc.owncloud.org/server/9.1/admin_manual/enterprise_firewall/file_firewall.html or raise a separate ticket to discuss alternatives.

AnrDaemon commented 7 years ago

Selective sync is an exclusive client feature. It has no place in core.

moscicki commented 7 years ago

Another two examples why currently this is suboptimal and why selective sync should be improved (and this partially relates to this ticket):

I think we should make sure that all three use-cases:

  1. inaccessible directories on the server (403 forbidden) e.g. as a result of firewall or storage backend ACLs
  2. directories which server advises not to sync (globally)
  3. locally excluded local/remote directories in a sync client (with and without file archiving)

are supported in a coherent way and where applicable via the same, simple and agreed protocol between client and server.

Maybe we should create another meta-ticket which would keep track all of that to achieve a globally coeherent system.

AnrDaemon commented 7 years ago

Selective sync is completely unrelated to this ticket, given its location. It is client that should decide if it want to sync entire tree or not at all, or if there's some bounds to it that it can debate back and forth. The server know nothing about client, its operating environment, storage capabilities and shit. More than that, it shouldn't care, ever, about client type/brand/whatever. Sever must not make decisions. Server must just plain work.

Local excludes are even less related to the ticket, as server know not, and will never be able to know, what they are.

davivel commented 7 years ago

IMHO, selective sync is relevant for the server as long as there are server administrators interested in managing it and we want to empower them, as for any other feature. If that's a responsibility for 'core' or for apps in the server is something out of my understanding.

AnrDaemon commented 7 years ago

And next day you get a call from your co-worker "my owncloud is broken!" Which, after a closer inspection, turns out to be selective sync at work. Do you, as a server administrator, want an increase in these calls?

davivel commented 7 years ago

As much as I don't want to be called because "I can't share my files" because sharing is disabled by admin.

jvillafanez commented 7 years ago

From my point of view, webDAV properties should contain only properties, not actions. As a property, you can send to the client: "this is an external storage, SMB (or whatever), estimated size of 50GB, modified on July 5th", and whatever information you want to provide, but a "please, don't sync this".... why are you sending the information then?

The client should decide, based on the properties received, whether sync the folder or not. User decisions (selective sync, for example) should be included, of course.

If the server doesn't want a folder to be synced, then it should hide the folder as it wasn't there.

PVince81 commented 7 years ago

If the server doesn't want a folder to be synced, then it should hide the folder as it wasn't there.

But that would also hide it from the web UI and other clients. Then you'd need the server to decide which client to send which response, based on user-agent (unreliable) or other criteria.

AnrDaemon commented 7 years ago

You're asking an impossible question. You either want this object exposed or not. There's no middle ground. No "web interface vs. desktop client". And the example with "this is an SMB share" is even worse. Why, of all things, you are exposing server internals to client?

jvillafanez commented 7 years ago

Can anyone point me to the use case to implement this, and reasons why sync filtering should be implemented server side (and not client side)?

As a user, if I want to download a file I don't care what the server suggests: if it's there, I want it. The same for files that I don't want to download.

Let's make an example with an iOS / android user:

The user opens the app, browse some directories, and decide to download a folder with photos. What should be the behaviour of the app if the folder has the "do-not-sync" property?

From the desktop side, as someone has already pointed out, this will lead to frustrations since you will see files from the web UI that won't be synced in the client. If you let the user choose to ignore that property, he will because he hasn't asked for it.

I'll remark again: these filters should be done client side. The server should provide enough information for the client to decide whether if it should sync the target folder or not.

And the example with "this is an SMB share" is even worse. Why, of all things, you are exposing server internals to client?

If you want to filter out by storage type, you should send that information. Otherwise you'll have to name all your SMB mounts with the same prefix and tell the client to ignore folders starting with whatever prefix, which will lead to issues with local folders sharing the same prefix. Maybe there is absolutely no interest in filtering by storage type, and just sending "this is an external storage" is enough.

We might want to include a server side flag to include those custom / internal / private properties in the result or not.

davivel commented 7 years ago

sync filtering should be implemented server side (and not client side)?

I don't think this is about server or client side, but about server AND client side.

PVince81 commented 7 years ago

Would be good if people who requested this kind of feature could elaborate on the detailed use real-world case.

So far I had the feeling that it was mostly about providing a way for admins to dictate behavior to the clients, similar to how in some enterprises the system administrator can remote setup (provisioning) or remote wipe clients/devices.

AnrDaemon commented 7 years ago

I don't think this is about server or client side, but about server AND client side.

So, you say that you need to create a proprietary functionality, that already confusing people even before implementation, to achieve… what?

mmattel commented 7 years ago

First of all we should differnciate between soft (the admin suggests, the client can decide) and hard (the admin decides and the client has no choice).

soft helps the client to decide if he wants to sync or not. There is no sync restriction from the admin side, but if you see a suggestion at the client side you may want to follow it first and see if it is really necessary to include it. See the exampe of pictures in hard below. If the space used of the shared resource is big, assume 20GB, you might want to give a exclude suggestion to the client but the client can overrule it if really needed.

The following explanation is on hard:

Why does IT has created methods like Active Directory, LDAP, Kerberos ect? It is to authenticate users or machines and grant them access rights to resources. Someone defines the rules and the admin implements them. Lets look for a detailed example in AD: There is a share containing marketing stuff. Some of that is common, some of that has restricted access. Usually you create groups reflecting the rights, assign the grous to the folder or drive and also assign these groups to users.

Giving the link to our case, remember - hard we have the following need, example: A user should be given access to a resource (eg a mounted drive or a folder) but for some reason (see below), he should not be allowed to sync it with the client. Means he can access it via DAV or browser, but not with the sync client.

Lets have a more detailed look on possible cases: Case 1:

SMB mount \\storage\pictures
Total size: 1.5TB

mount pictures is shared to users or group(s)

pictures 
 - subfolder 1
 - subfolder 2
 - ...

As an admin I need to grant access but I also want to restrict from the server side syncing the complete share (hard, no choice!!). Having ony 10 users who are granted access, I would need to sync 15TB. Having a notebook user, 1.5TB would most likely bust his local drive ... Rename pictures by video and you will see that 1.5TB is a low value ...

Case 2:

SMB mount \\storage\marketing
Total size: 200GB

mount marketing is shared to users or group(s)

marketing 
 - subfolder common
 - subfolder events
 - subfolder pictures   (excluded from sync, hard)
 - subfolder internal   (partial excluded from sync, hard)
 - ...

As an admin I need to grant access but I also want to restrict from the server side syncing to particular parts of the shared resource (hard, no choice!!). In the example we have an hard exclude of subfolder pictures to all users/groups and an partial exclude of subfolder internal to some users/groups.

Hope that this clarifies the stuff and why there is a differentation between hard and soft and the need for both types of sync exclusion methods.

Post note: Statement: if you do soft, there is only a small step to hard or vice versa. Picking up the commonly existing and already implemented methods of using groups to assign rights (owncloud / LDAP ect), I propose to also use groups for hard and soft sync exclution assignments and based on the type implement methods to handle the execution.

As an example, at the admin page add a new part where you can assign groups (plural!) to the sync types of hard and soft, add at sharing listboxes with hard and soft sync exclusion where you can chose the former assigned groups (both can and must exist in parallel). Add methods in core for hard respectively in core and the sync client for soft.

AnrDaemon commented 7 years ago

You're making the life harder for yourself without any gain whatsoever. Server knows nothing about client, all it knows is the protocol between the two. Request-response. How on earth the server would know anything about client, if all information coming from it can be faked? And will be faked, if server behavior leading to inconsistent client behavior, which leads to user's frustration.

mmattel commented 7 years ago

After being +30 years in business, I can tell that this capability whould be VERY useful... The usecases above are real life examples. But this is of course only from the view of professional use.

You are right that you can fake eg the user agent. Even it can be done, no professional would do that and regular users cant do that. If you use ownCloud as playgroud between IT specialists your statement is true. But I expect that the majority of users are regular users and the majority of admins are regular admins. All of them having the same whises and issues eveywhere.

AnrDaemon commented 7 years ago

After being in business for so long, you seems to have grown a habit of making your life complicated, and of your coworkers as well. The size of the storage has nothing to do with synchronization, at all. The permission to upload is an entirely different topic from synchronization. Half of your use cases is solved by returning correct HTTP codes to respective requests, the other half just doesn't stand the test of sanity and consistency.

Let me put it more straightforward.

The term "sync" we were frivolously using in conversation up to this point does not constitute an actual action (request). The actions are

  1. Listing of the remote repository.
  2. Retrieving from the remote repository.
  3. Uploading to the remote repository. One does not necessarily implies the other. The user may have the rights to retrieve an object, but not list its neighbours. The objects may be available read-only, thus an attempt to upload one will result in 403. Et cetera.

Now, where does your "selective sync" comes in play? Edge cases? Make sure they are not on the edge of failure. Monitoring, notification, there's a wide array of tools available, why are you not using them? And of all the things, if you need a restriction on how much a space users may take with their uploads, this is a task for things like "file firewall" or whatever. "Arbitrary restrictions imposed upon imagined criteria."

pmaier1 commented 7 years ago

From my POV we should differentiate between admin and user functionality. If a user does not want to sync content, he can use selective sync/ignore list in the client. This covers all suggested use cases if the user has the intention to prevent certain directories from being synced.

Regarding admin functionality we have the use case that an admin might want to suggest/enforce (soft/hard) certain directories not to sync globally (e.g. external storage/other dirs) or not to sync particular dirs based on user/group.

hard way (blocking)

Afaik these requirements can be satisfied with tagging and file firewall. As we chose this mechanism to handle such issues we should probably use it instead of developing new mechanisms that produce inconsistencies. The FF relies on criteria that may easily be spoofed and that are not secure therefore, but this is not a security feature. It rather is an easy-to-use helper for admins whose users make use of the official client. The main thing in this context that comes up to my mind is that we need to inform users about restrictions.

soft way (suggestion)

It would definitely be nice for an admin to give a suggestion and set syncing defaults for users. Regarding consistency this would then also need to happen within the FF which probably would be inconsistent again since the FF is a tool for hard blocking ultimately. Moreover there is no functionality to inform the client about the reason for restriction. A user should clearly see why some folders are not synced by default.

Wrapping up I think if we want to provide the 'soft way' there's no way around adding a webdav property to let clients know a bit more about imposed restrictions to provide proper user feedback. Maybe there are more possible solutions for implementation. For now I would prefer 3) or 4) but we should pay attention not to confuse admins with FF.

AnrDaemon commented 7 years ago

For solution 1 - there is a place for reason, you can give it along with a denial reply. In the reply itself, in the reply body, or both. The rest, you just repeated what I've been saying earlier: such functionality is inconsistent in its very definition and any sane client would ignore such "suggestions".

jvillafanez commented 7 years ago

If server has to filter out files, it must do it for good. I mean, the file won't be exposed to ANY client. It's pointless to try to control clients from the server side. At most you might be able to control YOUR client (aka, the official ownCloud desktop client) but not any other.

Whatever people do in their own computer is their own business. If I want to download my whole ownCloud account I will, and I don't care if the files are stored outside or if I have to download 10TB. I can go to my ownCloud account and download my root folder, or automate the download with a couple of curl requests.

@mmattel for your case1 there are several approaches that fit better:

mmattel commented 7 years ago

Most of us look to the examples from a very high knowledge and experiene level. Ordianry users do not use curl or any other technology which needs deep knowledge and understanding. They do not even know that these things exist, and just use it and want to use it as a no-brainer. Give them access with the browser, they use it, give them access with a client (sync or dav), they use it. They do not want to spend a lot of brain into it that is not their focus. All users do understand if the admin says, use the browser, use the (dav) client, this is what you can post configure and this is restricted and you can not change for what reason ever. 99.9% of the users do not change or try to fake the standard settings, maybe they ask to be granted access but not more. This causes much less post work to the admin and the users as they just work and do not want to tinker. Just my opinion, based on experience.

AnrDaemon commented 7 years ago

That's how the fall starts. First you separate people into "ordinary" and "some other", then you point to a chosen group and say "look, they are not complaining and very content with features we provide, we should continue in this direction". That's how Windows 10 was born.

guruz commented 7 years ago

FYI The 2.3 client has an option now "Ask for confirmation before synchronizing external storages" which should already help for a lot of scenarios. https://cloud.githubusercontent.com/assets/2644445/22258989/e928fa92-e263-11e6-9c70-575316035231.png

mmattel commented 7 years ago

This is very good and useful !!

From the client side, clients (human) do not know and/or care if the storage behind is external or not, they just see "folders". And client side choices are, by design, not oC admin driven.

guruz commented 7 years ago

an extra Webdav property 1</oc:do-not-sync> to tell the client to not sync this item.

@PVince81

Would it be easier for all implementations to get a new flag inside the oc:permissions?

soft: Something like an "advisory soft don't sync"?

hard: We of course return a 403 for that file/folder. However I think there should also be a flag in permissions being returned so that the clients can avoid even doing those unsucessful network requests?

So two new permissions in https://github.com/owncloud/core/blob/f10d105ea2f2d0822693313ddcd6f9b2eaac6ebe/apps/dav/lib/Connector/Sabre/Node.php#L291

PVince81 commented 7 years ago

New property or not doesn't affect server implementation that much. Not sure about clients...

moscicki commented 7 years ago

Yes. I agree that there are two cases: soft and hard "do-not-sync". This will provide a general framework to fit most of the applications.

Soft is anything that is just advisory and may be overriden by the client settings. One application for soft is just a global exclude list maintained by the server and it may be overriden by the local client settings.

However hard means that the server will return 403 (or equivalent) when the client tries to access the subfolder. Therefore it does not make any sense for the client to try (or retry) to access this.

It makes sense to have this as a webdav property to avoid calls which we know will return 403. The client side logic could be the following:

You may take into account that this property may be modified in the future. So you should not cache forever (if at all) the value of this property on the client side.

If a request fails with 403 then one should see the property of enclosing parent folders. You can imagine that X/Y/Z fails with 403 but also X/Y fails with 403 because X is hard-do-not-sync. This can happen in the middle of the sync run if server property is modified but the subsequent sync run should handle it well.

My particular use-case for this is hard: a part of the tree is not readible by the current user.

ownclouders commented 6 years ago

Hey, this issue has been closed because the label status/STALE is set and there were no updates for 7 days. Feel free to reopen this issue if you deem it appropriate.

(This is an automated comment from GitMate.io.