remotestorage / spec

remoteStorage Protocol Specification
https://tools.ietf.org/html/draft-dejong-remotestorage

Case sensitivity of path/document names #181

Open. raucao opened this issue 4 years ago.

raucao commented 4 years ago

The question came up in https://github.com/remotestorage/remotestorage.js/pull/1179 and I haven't found any mention of it in the spec. Has this been discussed before?

michielbdejong commented 4 years ago

We should definitely mention it, because there is an expectation that URLs might be case-insensitive.

For example, on GitHub they partially are.

As a programmer, I lean towards saying the URL should be case-sensitive, because in most programming languages string literals are case-sensitive. But I think I could be persuaded either way.

kevincox commented 4 years ago

If we want to work with providers who use various filesystems, there is a bit of a problem.

| Backend (rows) / remoteStorage choice (columns) | case-sensitive | case-preserving | case-folding |
|---|---|---|---|
| case-sensitive backend | trivial | hard (1) | easy (2) |
| case-preserving backend | hard (3) | trivial | easy (2) |
| case-folding backend | hard (3) | possible | trivial |
  1. When looking up files, you need to check for collisions. This can be significantly more expensive than a case-sensitive implementation would be.
  2. When making changes, you need to downcase every filename.
  3. You need to find a way to store files that vary only by case. This can be done by encoding the filename (for example, a base32 encoding that uses only lowercase letters).
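Notes 1 and 3 can be illustrated with a minimal sketch. It assumes a hypothetical Python server that maps each path segment directly to one file name on the backend. `encode_name` and `decode_name` show the lowercase-only base32 idea from note 3. Standard base32 uses only A-Z and 2-7, so lowercasing it loses nothing, and names that differ only by case still get distinct on-disk names. `find_case_insensitive` shows why note 1 is expensive: a case-preserving lookup on a case-sensitive backend has to compare against every entry in the directory. All three helper names are made up for illustration and are not taken from any remoteStorage implementation.

```python
import base64
from typing import List, Optional


def encode_name(name: str) -> str:
    """Note 3 (hypothetical helper): map a name to a lowercase-only on-disk name.

    Base32 output only contains A-Z and 2-7, so lowercasing it is lossless.
    The '=' padding is dropped because it can be recomputed from the length.
    """
    raw = base64.b32encode(name.encode("utf-8")).decode("ascii")
    return raw.rstrip("=").lower()


def decode_name(stored: str) -> str:
    """Invert encode_name: restore uppercase and the '=' padding, then decode."""
    padded = stored.upper() + "=" * (-len(stored) % 8)
    return base64.b32decode(padded).decode("utf-8")


def find_case_insensitive(requested: str, existing: List[str]) -> Optional[str]:
    """Note 1 (hypothetical helper): case-preserving lookup on a case-sensitive backend.

    Every directory entry must be compared. More than one match means the
    backend already holds names that collide when case is ignored.
    """
    key = requested.casefold()
    matches = [n for n in existing if n.casefold() == key]
    if len(matches) > 1:
        raise ValueError(f"ambiguous names: {matches}")
    return matches[0] if matches else None


# "Notes" and "notes" get different lowercase-only encodings.
print(encode_name("Notes") != encode_name("notes"))
print(decode_name(encode_name("Notes")))
print(find_case_insensitive("readme", ["README", "todo.txt"]))
```

Base32 was chosen here only because note 3 mentions it. Any encoding whose alphabet contains a single letter case would work the same way, at the cost of longer file names.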

Looking at it this way, the optimal solution for implementers is case-folding. However, this causes several problems for applications and users.

  1. Users may expect to be able to store things that vary only by case (especially in languages with non-bijective folding rules).
  2. At this point you are basically required to do full Unicode folding[citation needed], which makes everything more difficult. (But you probably need this anyway, unless you are treating paths as byte strings.)
  3. Many applications will now need to store the case some other way (if they want to use the human names in the storage path).
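Points 1 and 2 can be made concrete with a small illustration. Python's `str.casefold()` implements Unicode full case folding, and the German "ß" shows that folding is not bijective. `fold` is a hypothetical stand-in for whatever folding a case-folding server would apply.

```python
def fold(name: str) -> str:
    """Hypothetical server-side key: Unicode full case folding."""
    return name.casefold()


# Simple lowercasing leaves "ß" alone, but full folding expands it to "ss",
# so names a user considers distinct end up with the same storage key.
print("straße".lower())                   # straße
print(fold("straße") == fold("STRASSE"))  # True

# Folding cannot be inverted: from "strasse" alone the server cannot tell
# which of these the user originally wrote (point 3: the app must keep it).
originals = ["Straße", "STRASSE", "Strasse"]
print({fold(n) for n in originals})       # {'strasse'}
```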

With those things considered, I think we should treat paths as byte strings. This makes it easy for servers to build fast, accurate implementations of remoteStorage. It does pass folding and normalization on to app developers, but I think a couple of good libraries can handle that. This will be far less painful than tracking down remoteStorage implementations that do folding wrong (or just use an older Unicode standard).
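The following sketch shows what one of those client-side libraries might do. It assumes the server compares paths as raw bytes. In that case, a client helper can pick a single Unicode normalization form before building the request path, so the same visible name always maps to the same bytes. `canonical_path` is a hypothetical name, and choosing NFC is an assumption of this sketch. This comment does not propose any particular form.

```python
import unicodedata
from urllib.parse import quote


def canonical_path(path: str) -> str:
    """Hypothetical client helper: normalize to NFC, then percent-encode as UTF-8.

    Case is left untouched, so the server still sees "Notes" and "notes"
    as different byte strings.
    """
    nfc = unicodedata.normalize("NFC", path)
    # quote() encodes to UTF-8 and percent-escapes everything except '/'
    # and unreserved characters, giving the server a stable byte sequence.
    return quote(nfc, safe="/")


composed = "/notes/caf\u00e9"     # "é" as a single code point
decomposed = "/notes/cafe\u0301"  # "e" followed by a combining acute accent

print(composed == decomposed)                                    # False
print(canonical_path(composed) == canonical_path(decomposed))    # True
print(canonical_path(composed))                                  # /notes/caf%C3%A9
print(canonical_path("/Notes/x") == canonical_path("/notes/x"))  # False
```

Any folding would be the application's choice, applied before this step, rather than something every server has to get right.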

There are downsides though:

We should probably also check with major remotestorage providers to see what they do and if it would be hard for them to migrate/support the standardized way.