owncloud / ocis

:atom_symbol: ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

etag, oc-etag, ctag, sync-token and the future #3782

Open butonic opened 2 years ago

butonic commented 2 years ago

When a reverse proxy changes the content encoding, e.g. by compressing a plain text stream via gzip, it also changes the etag: Bug 63932 - Content compression breaks contract of ETag

While the desktop client prefers the OC-ETag, it has learned to strip the -gzip suffix and maybe the W/ prefix from the regular ETag as a fallback: https://github.com/owncloud/client/issues/3946#issuecomment-147985427
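A minimal sketch of that fallback, to make the stripping concrete. The function names are hypothetical and this is not the desktop client's code; it also assumes the proxy appends `-gzip` inside the quotes and that the surrounding quotes should be dropped:

```python
from typing import Dict, Optional


def normalize_etag(raw: str) -> str:
    """Strip the W/ prefix, surrounding quotes and a trailing -gzip (assumed format)."""
    tag = raw.strip()
    if tag.startswith("W/"):
        tag = tag[2:]
    if len(tag) >= 2 and tag.startswith('"') and tag.endswith('"'):
        tag = tag[1:-1]
    if tag.endswith("-gzip"):
        tag = tag[: -len("-gzip")]
    return tag


def effective_etag(headers: Dict[str, str]) -> Optional[str]:
    """Prefer OC-ETag; fall back to the regular ETag if it is missing."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("oc-etag") or lowered.get("etag")
    return normalize_etag(raw) if raw else None
```

With this, `"abc"`, `"abc-gzip"` and `W/"abc"` all normalize to the same value, which is exactly the kind of sharing the RFC passage quoted next calls weak.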

However, https://datatracker.ietf.org/doc/html/rfc7232#section-2.1 states:

[...] Likewise, a validator is weak if it is shared by two or more representations of a given resource at the same time, unless those representations have identical representation data. For example, if the origin server sends the same validator for a representation with a gzip content coding applied as it does for a representation with no content coding, then that validator is weak. [...]
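For reference, the weak and strong comparison functions from RFC 7232 section 2.3.2 can be sketched as follows. The helper names are made up for illustration; only the comparison rules come from the RFC:

```python
from typing import Tuple


def parse_etag(raw: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag including quotes)."""
    raw = raw.strip()
    weak = raw.startswith("W/")
    return weak, raw[2:] if weak else raw


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the opaque tags are identical."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags are identical, weakness is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

So `W/"1"` and `"1"` match under weak comparison but not under strong comparison.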

@JanAckermann noticed that the owncloud-sdk is not yet prepared for this kind of etag handling. Please link your PR here. https://github.com/owncloud/owncloud-sdk/pull/1067 https://github.com/owncloud/web/pull/6952

@michaelstingl I wonder if iOS and android handle this somehow.

owncloud

The core issue that led to OC-ETag is https://github.com/owncloud/core/issues/9005 which explains why we are now using our own OC-ETag header.

I'm still not 100% sure we are using etags correctly. AFAICT we should be using a ctag (content tag) to implement change detection in collections, as google recommends: https://developers.google.com/calendar/caldav/v2/guide

CTag and DAV:sync-token

However, the caldav-ctag-03 RFC was deprecated in 2015:

IMPORTANT: The feature defined by this specification is now deprecated in favor of support for the WebDAV Sync REPORT as defined by RFC6578. Clients MUST NOT rely on this feature to detect changes to collections, instead they MUST support the WebDAV Sync REPORT. Servers MUST support the WebDAV Sync REPORT to allow clients to efficiently synchronize calendar collections. Whilst most modern clients do support the WebDAV Sync REPORT, servers MAY continue to support this specification by simply using the DAV:sync-token property value for the getctag property value, in order to provide backwards compatibility with old clients.

https://sabre.io/dav/building-a-caldav-client/ shows exactly what a PROPFIND with both cs:getctag and DAV:sync-token looks like, especially what form of URL to expect in the `sync-token`.
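A rough sketch of such a request body and of reading both properties from the answer. The sample response, the href and the token URL are invented for illustration; only the `DAV:` and `http://calendarserver.org/ns/` namespaces and the property names are taken from the specs above:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = {"d": "DAV:", "cs": "http://calendarserver.org/ns/"}

# Body of a PROPFIND (Depth: 0) asking for both change indicators.
PROPFIND_BODY = """<d:propfind xmlns:d="DAV:" xmlns:cs="http://calendarserver.org/ns/">
  <d:prop>
    <cs:getctag/>
    <d:sync-token/>
  </d:prop>
</d:propfind>"""

# Hypothetical multistatus answer; the values are made up.
SAMPLE_RESPONSE = """<d:multistatus xmlns:d="DAV:" xmlns:cs="http://calendarserver.org/ns/">
  <d:response>
    <d:href>/dav/collection/</d:href>
    <d:propstat>
      <d:prop>
        <cs:getctag>42</cs:getctag>
        <d:sync-token>http://example.org/ns/sync/42</d:sync-token>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>"""


def read_collection_tags(multistatus_xml: str) -> Dict[str, Optional[str]]:
    """Extract getctag and sync-token from the first response element."""
    root = ET.fromstring(multistatus_xml)
    prop = "d:response/d:propstat/d:prop/"
    return {
        "ctag": root.findtext(prop + "cs:getctag", default=None, namespaces=NS),
        "sync_token": root.findtext(prop + "d:sync-token", default=None, namespaces=NS),
    }
```

A client would store both values and compare them on the next poll; per the deprecation note, a server may simply return the same value for both.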

MS Graph

MS Graph has similar concepts; it just uses JSON:

| Property | Type | Description |
| --- | --- | --- |
| cTag | String | An eTag for the content of the item. This eTag is not changed if only the metadata is changed. Note: This property is not returned if the item is a folder. Read-only. |
| eTag | String | eTag for the entire item (metadata + content). Read-only. |

Note: The eTag and cTag properties work differently on containers (folders). The cTag value is modified when content or metadata of any descendant of the folder is changed. The eTag value is only modified when the folder's properties are changed, except for properties that are derived from descendants (like childCount or lastModifiedDateTime).
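To illustrate why a recursive tag on folders is useful, here is a small sketch that walks two snapshots of a tree and skips every folder whose cTag is unchanged. The dict layout (`id`, `cTag`, `children`) is a hypothetical in-memory model, not the Graph wire format, and it assumes every item, including folders, carries a cTag with the semantics from the note above. Deletions are not handled:

```python
from typing import Dict, Iterator, Optional

Item = Dict[str, object]


def changed_files(old: Optional[Item], new: Item) -> Iterator[str]:
    """Yield ids of new files or files whose cTag changed, pruning unchanged subtrees."""
    if old is not None and old.get("cTag") == new.get("cTag"):
        return  # same cTag: nothing below this item changed
    children = new.get("children")
    if children is None:
        yield str(new["id"])  # a file with new or changed content
        return
    old_children = {}
    if old is not None:
        old_children = {c["id"]: c for c in old.get("children", [])}
    for child in children:
        yield from changed_files(old_children.get(child["id"]), child)
```

A folder whose cTag changed only because of a descendant's metadata yields nothing here, since no file cTag below it differs.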

But similar to the WebDAV sync, it has a /delta endpoint for token-based sync: https://docs.microsoft.com/en-us/onedrive/developer/rest-api/api/driveitem_delta?view=odsp-graph-online
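The client side of such a token-based sync is roughly a loop that follows `@odata.nextLink` until the server hands out an `@odata.deltaLink`. In this sketch `fetch` is a hypothetical callable that returns the parsed JSON page for a URL, so no HTTP library or real endpoint is assumed:

```python
from typing import Callable, Dict, List, Tuple

Page = Dict[str, object]


def sync_delta(fetch: Callable[[str], Page], start_url: str) -> Tuple[List[object], str]:
    """Collect all changed items and return them with the deltaLink for the next run."""
    items: List[object] = []
    url = start_url
    while True:
        page = fetch(url)
        items.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            url = str(page["@odata.nextLink"])  # more pages in this round
        else:
            # Last page: keep the deltaLink and start from it next time.
            return items, str(page["@odata.deltaLink"])
```

The returned delta link plays the same role as the `DAV:sync-token`: the client stores it and presents it on the next sync to get only the changes since then.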

Future

Both protocols indicate that having a dedicated property to detect recursive changes makes sense. IMO we should

Related:

https://github.com/cernbox/smashbox/issues/46

labkode commented 2 years ago

@butonic another pointer: https://github.com/cernbox/smashbox/blob/master/protocol/protocol.md#restrictions-and-limitations

micbar commented 2 years ago

after GA issue

tbsbdr commented 1 year ago

Discussion with @dragotin @micbar @michaelstingl @felix-schwarz Discussion status:

@dragotin @felix-schwarz please continue the discussion

butonic commented 1 year ago

We will have cTag and eTag on the graph api anyway as described in the MS Graph section above. No need to invent an m-tag. eTag is for the entire item (metadata + content). cTag would be new and would be content only.

michaelstingl commented 1 year ago

eTag is for the entire item (metadata + content).

Perfect! 😻

cTag would be new and would be content only.

@TheOneRing @felix-schwarz do we have a requirement for propagation of content-only changes? I don't think so…

TheOneRing commented 1 year ago

I don't think so.

felix-schwarz commented 1 year ago

Currently, the ETag is used to signal:

If the ETag of files also changes for metadata changes, old clients get a signal that the file contents changed and to re-download the file (unnecessarily - in the case of metadata-only changes).

Therefore a CTag is essential for clients to be able to put an ETag-change into context - and to distinguish between a mere metadata change and an actual file content change to determine if a file should be re-downloaded.

Client updates will be needed for them to take advantage of the CTag to avoid unnecessary transfers.
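A sketch of the decision an updated client could make, assuming the semantics discussed in this thread (eTag covers metadata and content, cTag covers content only). The function name and the dict layout are hypothetical:

```python
from typing import Dict


def classify_change(known: Dict[str, str], remote: Dict[str, str]) -> str:
    """Compare the locally stored tags with the server's tags for one file."""
    if known["eTag"] == remote["eTag"]:
        return "unchanged"
    if known["cTag"] == remote["cTag"]:
        return "metadata"  # update local metadata only, no download
    return "content"  # file contents changed, download again
```

A client that only knows the ETag cannot make this distinction and would treat both of the last two cases as a content change.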

The idea for a metadata-only MTag came from trying to find a backward-compatible way to signal metadata changes, where existing clients continue to work as expected - and updated clients can take advantage of metadata change propagation via the MTag.

Regarding other WebDAV-based clients or sync solutions, I have no insight about how they interpret an ETag change. But if they interpret an ETag change as "file contents changed", using the ETag to propagate both metadata and content changes might cause unwanted effects there.

michaelstingl commented 1 year ago

Current file checksum could be used the same way as the cTag. In oC10 world, not all files have a checksum.