owncloud / client

🖥️ Desktop Syncing Client for ownCloud
GNU General Public License v2.0

Files and directories containing : or | are ignored #854

Closed rikvdh closed 9 years ago

rikvdh commented 11 years ago

Expected behaviour

Files getting synced to my ownCloud server

Actual behaviour

I see this occurring in the Linux client. When I create files or directories containing | : > < ? or a few other characters, they are automatically ignored.

I already did some research and found this commit in csync: http://git.csync.org/users/freitag/csync.git/commit/?id=e0807cba1b164fcf409cb71e3da2f8f7b88637a9

I don't know why this commit is there, creating files on Linux with | : < > * ? is perfectly valid. :)

total 16
drwxr-xr-x  4 rik rik 4096 Aug 11 08:32 .
drwxr-xr-x 10 rik rik 4096 Aug 10 16:11 ..
-rw-r--r--  1 rik rik    0 Aug 11 08:22 abc*def
-rw-r--r--  1 rik rik    0 Aug 10 16:02 abc:def
-rw-r--r--  1 rik rik    0 Aug 11 08:21 abc>def
-rw-r--r--  1 rik rik    0 Aug 11 08:22 abc?def
drwxr-xr-x  2 rik rik 4096 Aug 11 08:22 abc|def
drwxr-xr-x  2 rik rik 4096 Aug 10 16:08 a|c

Steps to reproduce

  1. Install Linux client
  2. Sync ownCloud
  3. Create a file with: touch 'abc|def' and see it getting ignored :)

Server configuration

not applicable, normal Debian Wheezy with apache2

Client configuration

Client version: latest from git (but also occurs with 1.3.0)
Operating system: Debian Wheezy
OS language: en_US
Installation path of client: /usr/local/bin/owncloud

Log

08-10 16:10:59:661 csync_ftw: Uniq ID from Database: test/a|c ->
08-10 16:10:59:661 csync_walker: directory: /home/rik/ownCloud/test/a|c
08-10 16:10:59:661 _csync_detect_update: test/a|c excluded (1)
08-10 16:10:59:661 _csync_detect_update: ==> file: test/a|c - hash 6527849599191992178, mtime: 1376143708
08-10 16:10:59:662 _csync_detect_update: file: test/a|c, instruction: INSTRUCTION_IGNORE <<=

rikvdh commented 11 years ago

When I apply this patch to csync, everything seems to work normally and the files are synced correctly:

diff --git a/src/csync_exclude.c b/src/csync_exclude.c
index 81f828d..42c4f0f 100644
--- a/src/csync_exclude.c
+++ b/src/csync_exclude.c
@@ -147,13 +147,6 @@ int csync_excluded(CSYNC *ctx, const char *path) {
     for (p = path; *p; p++) {
       switch (*p) {
         case '\\':
-        case ':':
-        case '?':
-        case '*':
-        case '"':
-        case '>':
-        case '<':
-        case '|':
           return 1;
         default:
           break;

ghost commented 11 years ago

Hi,

AFAIK some of these chars aren't supported on other platforms like Windows. Maybe that's just the reason why they are ignored. Keep in mind that it doesn't help you if you can sync those files on Linux but not on Windows in a mixed setup.

rikvdh commented 11 years ago

Hi,

I see what you mean, I've booted a Windows VM and checked and I'm indeed not allowed to create files with : or |, sorry.

MarcelWaldvogel commented 11 years ago

I nevertheless think this should be changed on Linux/OSX clients; if a directory is only ever used by those clients or the web interface, there is no reason to silently ignore these files.

My expected behavior would be (in order of decreasing priority):

  1. All files are synced with the server; if a client receives a particular file from the server which contains an illegal character, it is escaped on the client side (i.e., the server file '1:0.png' becomes '1%{3A}0.png' when downloaded to a Windows box only, unchanged in all other views). There are probably better choices for the escape char…
  2. (Fallback) Warn the user on the first sync where this file is found (in either download or upload case), that it will not be synced.

BTW: If you want real Windows file name safety, files with a basename of NUL, CON, AUX, PRN, … should also be treated specially.
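
A minimal sketch of the "Windows file name safety" check mentioned above, written in Python for illustration only (the client itself is not written in Python). The function name and the exact lists are assumptions, not ownCloud code; the function looks at a single file name component, not a full path.

```python
import re

# Characters Windows does not allow in a file name (control characters omitted).
WINDOWS_FORBIDDEN_CHARS = set('\\/:*?"<>|')

# Device names Windows reserves regardless of the extension (e.g. "NUL.txt").
WINDOWS_RESERVED_BASENAMES = re.compile(
    r"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$", re.IGNORECASE
)


def windows_name_problems(name):
    """Return a list of reasons why `name` is unsafe on Windows (empty if none)."""
    problems = []
    bad = sorted(set(name) & WINDOWS_FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    # The reserved-name rule applies to the part before the first dot.
    basename = name.split(".", 1)[0]
    if WINDOWS_RESERVED_BASENAMES.match(basename):
        problems.append("reserved device name: " + basename)
    return problems
```

Returning the reasons rather than a plain yes/no would also cover the fallback in point 2: the client could show the user why a file is not synced.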

a-schild commented 11 years ago

I also just stumbled over this issue. A client (with OS-X) has many folders and files with a timestamp in the filename, like: "Rechnung 2013.09.12 08:12.docx"

I think this exclude list should be the list of forbidden characters of the OS on which the client is running. The client then has to correctly handle cases where there are files on the server which it is not able to download.

When a file with the name "Rechnung 2013.09.12 08:12.docx" is put on the server via WebDAV, the Mac client should be able to sync it. The Linux client should be able to sync it. The Windows client has to handle it (in what way still has to be defined).

André

rikvdh commented 11 years ago

I also think this can be fixed as @a-schild mentioned. Dropbox does it the same way.

dragotin commented 11 years ago

@a-schild did not mention a solution but a feature request. What is the solution? Please explain what Dropbox does and why that is good if you want us to implement that ;-)

MarcelWaldvogel commented 11 years ago

I would suggest moving this thread to a different ticket, as it has nothing to do with the title anymore. I do not know how Dropbox does it, but a possible method would be similar to the one used in PeerStore:

  1. To make the method cope with inserts, deletes, and moves in the file, they use Rabin fingerprints to split the file in anchor-based blocks of many kiB each. (E.g. by waiting for the least significant 16 bits to become 0, the file is split into blocks of about 64kiB each; resynchronizing in the first or second block after the change.)
  2. When synchronizing the file, the sequences of the hashes of the updated file are sent to the other side, together with the changed blocks. The remote side can then reconstruct the file from this information.

Of course, there are some options to this basic scheme which can improve worst-case behavior (e.g., storing the hashes locally or bounding the blocks into a given range of lengths).

a-schild commented 11 years ago

@dragotin Yes, it was not a solution, but it's definitely an unexpected behaviour of the sync client. One good showcase is for example the sync of a music library. Here we often have : and ? characters under OS-X & Linux.

I would suggest that the client syncs all files whose names are legal on the platform it is running on. So Linux and OS-X clients can sync files with *, ? and : characters in file and folder names from/to the server.

On the other hand, a Windows sync client will then have to make a decision on what to do with such invalid filenames. As @MarcelWaldvogel mentioned in the second last post, it could for example escape all "invalid" characters to something like %xx or just replace them with a _
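
The policy suggested here (upload whatever is legal locally, and let each client decide what it can store on download) could be sketched roughly as follows. This is an illustrative Python sketch, not client code; the `FORBIDDEN` table, the platform keys and `split_downloads` are all made-up names.

```python
# Forbidden file name characters per client platform (illustrative table only).
FORBIDDEN = {
    "windows": set('\\/:*?"<>|'),
    "unix": set("/"),  # the NUL byte is also forbidden; omitted for brevity
}


def split_downloads(remote_names, platform):
    """Split server file names into storable ones and skipped ones with a reason."""
    storable, skipped = [], {}
    for name in remote_names:
        bad = sorted(set(name) & FORBIDDEN[platform])
        if bad:
            skipped[name] = "cannot be stored on %s: contains %s" % (
                platform,
                " ".join(bad),
            )
        else:
            storable.append(name)
    return storable, skipped
```

With the file name from the earlier example, a "unix" client would keep "Rechnung 2013.09.12 08:12.docx", while a "windows" client would list it as skipped together with the reason, which it could then show to the user or use as the trigger for escaping.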

I don't see what the last comment of @MarcelWaldvogel has to do with the concept of invalid file name characters...?

MarcelWaldvogel commented 11 years ago

Sorry, had confused the thread. Please ignore.

Best regards, -Marcel Waldvogel (short and to the point, as I'm on mobile)


ghost commented 11 years ago

Hi,

just my two cents:

Having different behaviors in the client on different platforms will probably lead to a big mess:

I don't think that it is that easy to just replace special chars or escape them, but maybe I'm wrong.

a-schild commented 11 years ago

@RealRancor Files with öäüéàè etc. should not cause problems (at least for me they work)

a-schild commented 11 years ago

I know it's not a simple thing to do the sync right, but to NOT sync them is definitely the wrong thing.

Using an escape-style encoding of the special characters should be possible... (Not saying it's easy)

ghost commented 11 years ago

Files with öäüéàè etc. should not cause problems (at least for me they work)

This is just a reference from the other issue, where I pointed a user to this issue about the ignored : and | chars in filenames.

but to NOT sync them is definitely the wrong thing.

I would rather write:

but to NOT sync them is definitely the wrong thing in my opinion.

:)

rikvdh commented 11 years ago

@RealRancor It is the wrong thing. If the tool doesn't notify users about not syncing files the whole tool isn't reliable and will not be adopted by companies or non-technical people.

I think escaping on the server side is a solution; although I would not choose Windows as a server, it is probably still something you want to support.

On the client side you can replace characters if the OS does not support them. Replacing : and | with _ on Windows will still keep filenames readable for non-geeks.

In my opinion this is the best solution, and Dropbox also does it this way. (Not sure about the server side, but they probably don't run Windows because their transfer system is based on rsync.)

ghost commented 11 years ago

It is the wrong thing.

It's still your opinion. I think the devs have their reasons why they are ignoring those files. Let them decide how the client is handling this stuff. :)

On the client side you can replace characters if the OS does not support them. Replacing : and | with _ on Windows will still keep filenames readable for non-geeks.

Renaming is probably the worst solution for this because if you rename the file it will become a new file. If you're doing a rename you probably need to rename the file in all clients (Mac, Linux, Windows)?

dragotin commented 11 years ago

@RealRancor is absolutely right in saying that it is not easy at all. If we change a filename by escaping, the next bug report will be that we overwrite files with escaped ones...

It is doable. But it is not "just escape and be happy". The day will come when we open that keg... OTOH, nice that we are now in a state where we discuss this as a real problem, we had other times with other probs ;-)

BTW, the "ignored chars" problem should not be confused with UTF8 problems (à,ò,è,ì and friends) - these often originate from misconfiguration on server- or client side.

ghost commented 11 years ago

Hi,

BTW, the "ignored chars" problem should not be confused with UTF8 problems (à,ò,è,ì and friends)

this problem is not confused with the UTF8 problem. :) A user in the referenced issue (not the creator of the issue) had exactly the question why files with : are ignored:

https://github.com/owncloud/mirall/issues/1048#issuecomment-25295641

muetze-online commented 11 years ago

I see that it is hard to find a good way to handle these files under Windows when downloading them.

But the situation NOW is that any Mac or Linux user will only see that their files are NOT uploaded when they look into the sync details. The ownCloud client is a sync tool that doesn't sync my files - not the expected behaviour. I have had these files for years and I don't think a program should tell me how my files are named - only the operating system should do such things.

Do it the Dropbox way: You can delegate the problem to the Windows users: A Windows user will see in the sync details that some files are not downloaded. If they use Windows and Mac/Linux they probably know about the filename inconsistencies. I tried this with Dropbox. Files with " or ? in their names will be uploaded from the Mac and NOT downloaded on a Windows system. This is not a perfect way, but I want to sync my files using ownCloud - not Dropbox.

Please make it possible to sync files with " and ? etc.

muetze

KasumiNinja commented 11 years ago

I also think that ignoring files is the wrong thing to do. This leads to unexpected behaviour and could lead to users losing files. I think it's best to rename files with an underscore. Is there any ETA for when this problem will be addressed?

muetze-online commented 11 years ago

Renaming is not a good idea, because it duplicates all files with these characters: the renamed file will be synced as well. Instead, the renaming has to be done in a way that the new file will not be synced - like filename_renamed_by_ownCloud.ext. But then there is the next problem - how will changes in this file be sent back to the original with the ignored characters? You can do it with a journal of all renamings - but I don't know if this is a solution.

muetze

MarcelWaldvogel commented 11 years ago

The solutions in order of increasing implementation complexity:

  1. Do not sync files with characters which cannot be represented by the local file system, but add it to the list of "ignored files" in the sync log (1)
  2. Use escape sequences on the local file system only. Instead of (ab)using an ASCII character again, which is likely used (e.g. percent-encoding), use a Unicode character that is unlikely to be used in file names, such as U+2707 (tape drive). Then replace ":" with (TAPE DRIVE)COLON(TAPE DRIVE) etc., which still allows the user to use the tape drive symbol in file names, if she really wants to.
  3. Also record this translation in the csyncdb; however, this makes it hard to unencode the translated filename on copied/moved files.

(1) It would be good to list a reason for the file being ignored during sync: filename in ignore pattern list, illegal character, or hard link (even though I strongly believe that a hard link should not be a reason for ignoring the synchronization, but it currently is the case)
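
Option 2 could be sketched like this in Python (for illustration only). The function names and the `UNSAFE` set are assumptions; the sketch escapes the TAPE DRIVE character itself too, so that the mapping stays reversible when a user really has that symbol in a file name.

```python
import re
import unicodedata

ESC = "\u2707"  # TAPE DRIVE, the marker character suggested above
# Characters to escape locally; ESC is included so decoding stays unambiguous.
UNSAFE = set('\\:*?"<>|') | {ESC}


def encode_local(name):
    """Replace each unsafe character with ESC + its Unicode name + ESC."""
    return "".join(
        ESC + unicodedata.name(ch) + ESC if ch in UNSAFE else ch for ch in name
    )


def decode_local(name):
    """Turn every ESC<UNICODE NAME>ESC sequence back into the character."""
    pattern = re.compile(ESC + "([A-Z -]+)" + ESC)
    return pattern.sub(lambda m: unicodedata.lookup(m.group(1)), name)
```

Because every marker in an encoded name is part of a pair, `decode_local(encode_local(name))` gives back the original name. Two caveats of this sketch: encoded names get noticeably longer, and `decode_local` raises `KeyError` on a hand-made name whose marker pair does not contain a valid Unicode character name.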

muetze-online commented 11 years ago

Thank you for this clear and evident post.

muetze

a-schild commented 10 years ago

MS SkyDrive has similar problems, and this is not well accepted by the users

http://community.office365.com/en-us/forums/154/t/165638.aspx

danimo commented 10 years ago

Just an update on this: We can't decide this on the client side. The problem is that we need to ensure that the server can always handle the characters that we allow to upload. I have created https://github.com/owncloud/core/issues/6102 to track the server-part discussion.

muetze-online commented 10 years ago

Is it not possible to let the user decide which characters are synced? I am a fan of transparency. It took me a while to find the problem of ignored characters in filenames. There could be an option for every synced folder like: "sync folder only on Mac, forbidden chars: ':'" and "folder sync with windows, forbidden chars: ":,<>/|?'" etc.

This lets the user decide which characters will be synced and tells him so. A help text can explain the problems with some chars under Windows - Mac OS X can use all characters except ":". I chose this operating system because of its wide possibilities, and the ownCloud client prevents me from using these chars - this su***.

I think, transparency AND user decision can be a solution, until a server-sided solution is found, which will be difficult.

p-bro commented 10 years ago

Just to weigh in: Any limitations on the file names imposed by ownCloud (and not by the OS of the client) will reduce the user acceptance of the program. I don't want to rethink my file naming because of my cloud service, and actually renaming files is error-prone and an (in the user's perspective) unnecessary hassle.

I'd be fully satisfied with not downloading files with unsupported characters to a client and giving a notice to that effect. The file could then be retrieved from the web interface, if need be. But if the local file system supports a file name, it should be included in the sync.

I am not familiar with how the files are stored on the server. If it's really unfeasible to deposit files there with characters not allowed on the server file system, a first workaround would be to also exclude those file names. But IMHO that would still only be a kludge: From a user perspective all legal file names should be synced, if the user wants to.

matiasbarone commented 10 years ago

In my case I wrote a little script that makes a copy of files with : and ? in their names, replacing those characters with _, so that the copy can be synced. Put it in a cronjob, and problem "solved".

# Load the files that describe the synced folders
FILES=/home/myhome/.local/share/data/ownCloud/folders/*
# For each folder
for f in $FILES
do
  # Obtain localPath by sourcing the config file
  . "$f" >/dev/null 2>&1
  path=$(echo -e "${localPath/\x/\u00}")
  echo "Processing $path Folder..."

  # Search the path for files with special characters in their names
  find "$path" -type f -name '*[:?"]*' |
    while IFS= read -r; do
      # Make the copy, or overwrite it if the original is newer than the copy
      cp --verbose -p -u "$REPLY" "${REPLY//[:?\"]/_}"
    done
done

xopxe commented 10 years ago

I agree that the client should sync up everything as is, and only download what it can. This is simple, while limiting the damage to the minimum. This can be easily explained to the end user (this OS cannot store files with the name you chose), and seems legitimate. But saying "we won't handle this file because there is a system out there (whether you use it or not) that cannot handle it" is just... Meh. I'm syncing Linux to Linux, don't even have a Windows installation available, and this behavior is annoying as hell. I certainly will not rename my files just for this, especially knowing that there are other options that don't force me to.

Bazon commented 10 years ago

Another reason why this is bad: If you save a page with Firefox, it automatically includes a "|", at least on Linux: The name of the saved file and folder is "pagetitle | filename[.html]" by default. So each and every saved webpage causes this trouble with ownCloud. Works with Dropbox, by the way.

scott-ainsworth commented 10 years ago

I agree with @xopxe, synchronize whenever possible. And definitely never, ever fail silently (see issue 1940 for more detailed thoughts on this). As for filename conflicts on the server, my solution in a similar circumstance was to percent encode (RFC 3986, Section 2.1) filenames before storing them on disk. The server can then store all files on all operating systems and still easily provide the actual filename to client software and the web user interface.

dhjdhj commented 10 years ago

I just ran into this issue myself the other day, reported it on the regular forums and was redirected here.

For what it's worth, we have other applications where we have seen this issue and our solution has always been to have our app just replace all invalid characters with underscores, a pretty trivial change that just eliminates the issue.

Bazon commented 10 years ago

@dhjdhj: underscore is not a solution, as this mapping is not bijective. E.g. both test|1 and test:1 would be mapped to test_1 and you can imagine what sync issues are caused by that...
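
The collision is easy to reproduce. A small Python illustration (the helper name and the character list are made up for this example):

```python
def underscore(name, forbidden=':|?*"<>'):
    """Replace every forbidden character with an underscore."""
    return "".join("_" if ch in forbidden else ch for ch in name)


names = ["test|1", "test:1"]
mapped = [underscore(n) for n in names]

# Both names end up as the same string, so the original name cannot be
# recovered from the mapped one and one file would overwrite the other.
collision = len(set(mapped)) < len(names)
```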

dhjdhj commented 10 years ago

It is trivial to test whether there is already a file with that name such that you can extend it. Mac OS X does this by default, for example, it will append (1) or (2) or .... as needed to a filename to make it unique.

So the desktop cloud app, during a rename, could just check for an existing file and then append extra characters.

Or you could just replace all illegal characters with an underscore (say) and then append a unique GUID to the end (or just a simple timestamp yymmddhhmmss) or you could replace the illegal characters with their spellings or HTML encode them.
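
A rough Python sketch of the "check for an existing file and append extra characters" idea; `unique_name` and its `existing` argument are hypothetical names, and a real client would check the file system rather than a set.

```python
import os


def unique_name(candidate, existing):
    """Return `candidate`, or "stem (n).ext" with the first n not in `existing`."""
    if candidate not in existing:
        return candidate
    stem, ext = os.path.splitext(candidate)
    n = 1
    while "%s (%d)%s" % (stem, n, ext) in existing:
        n += 1
    return "%s (%d)%s" % (stem, n, ext)
```

This avoids overwriting a file after a rename, but it does not by itself record which original name a renamed file belongs to; that would still have to be stored somewhere for later syncs.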

The point is, there are lots of ways to address this and any of them would be better than sitting around for days before discovering that a file you put in one place with the expectation of it being available to you elsewhere wasn't!

scott-ainsworth commented 10 years ago

Bijective is a nice, succinct description of a set of possible solutions to this mapping problem. I can quickly think of many solutions for the simple filesystem-to-server mapping, but when several types of filesystems are involved, the problem is clearly much more complex. Is there a good problem description documented (I can't find one)? I would like to take a crack at the solution space.

Bazon commented 10 years ago

@galsondor: What about the URI-HTML conversion dhjdhj mentioned? It's an existing solution, so there is no need to re-invent the wheel. See e.g. this online conversion tool: http://www.url-encode-decode.com/ E.g. the name "Test:1 Test|2" would be converted to "Test%3A1+Test%7C2".

So this is how the syncing client could work:

Downloading files: Check whether the filename from the server includes characters forbidden on the client filesystem. If yes: encode to URI; else: just keep the filename.

Uploading files: Always decode the filename. If it was never encoded (all characters allowed on the client filesystem), it will just stay as it is ("Test:1 Test|2" will be decoded to "Test:1 Test|2"); if it has been encoded before, it will get back the name it has on the server ("Test%3A1+Test%7C2" will be decoded to "Test:1 Test|2").

That could be a way to have unique filenames on the server while staying flexible regarding the restrictions of the local client filesystem.
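
The scheme described above can be tried out with Python's standard `urllib.parse` module. This is only a sketch of the idea: `to_local`, `to_server` and the forbidden set are illustrative names, not part of any ownCloud client.

```python
from urllib.parse import quote_plus, unquote_plus

# Characters a Windows client cannot store in a file name.
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')


def to_local(server_name, forbidden=WINDOWS_FORBIDDEN):
    """Download direction: encode only if the name has forbidden characters."""
    if set(server_name) & forbidden:
        return quote_plus(server_name)
    return server_name


def to_server(local_name):
    """Upload direction: always decode, as proposed above."""
    return unquote_plus(local_name)
```

`to_local("Test:1 Test|2")` gives "Test%3A1+Test%7C2", and `to_server` maps it back. One thing the sketch makes visible: because the upload direction always decodes, a local file that legitimately contains "+" or a "%xx" sequence would be changed too (`to_server("a+b.txt")` gives "a b.txt"), so the scheme would probably need a way to tell encoded names from untouched ones.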

ghost commented 10 years ago

Hi,

anyone noticed this:

https://github.com/owncloud/core/wiki/Cross-Platform-File-Handling

sarusso commented 10 years ago

Can we please set a high priority on this ticket somehow?

Until today, ownCloud DOES NOT WORK for Linux/Unix/OSX.

This is a FACT as already stated above, by several people and from various points of view. My personal one is that I cannot sync some projects, some mobile apps and some saved documents. OwnCloud is just not reliable if one assumes that it does what it claims to do (sync stuff) and does not dive in the doc.

Cheers, Stefano

ghost commented 10 years ago

@sarusso See the comment here: https://github.com/owncloud/mirall/issues/854#issuecomment-29466993

If this is high priority for you, feel free to contribute code to the core issue linked in the comment above.

moscicki commented 10 years ago

There is lots of noise about this issue already, but let me say that I completely agree with the point of view of @p-bro (https://github.com/owncloud/mirall/issues/854#issuecomment-34640242).

We have a server running on unix and users who only use unix and the issue of not syncing ":" character is completely irrelevant for them (and not appreciated).

For ownCloud servers running on Unix, is this a problem? Or is the problem only caused by ownCloud servers running on Windows (as the server's filesystem cannot support this)? If this is the only limitation, then it may be overcome very easily: you query the server with status.php already, so you can know whether it supports a restricted character set or not. With this knowledge you either restrict the character set on the client (as it is done now = Windows server) or lift any restrictions and implement the simple logic proposed by @p-bro.

No renaming of filenames please or any such mess.

ghost commented 10 years ago

@moscicki

As the client can't handle this decision at the moment, according to the linked comment above, you really should discuss this in the core issue: https://github.com/owncloud/core/issues/6102

danimo commented 10 years ago

https://github.com/owncloud/mirall/pull/2288 is a potential fix. Things to consider before merging:

Note: We will most probably not merge this into 1.7(.0), even though this merge request is against 1.7.

sarusso commented 10 years ago

I would love to contribute, but I am not a php developer and I cannot.

Please do not get me wrong, I am not asking for someone to start working on this "for me". I am just pointing out an inconsistency: this software just does not do what it claims to do - sync your files. So either you guys fix this or you change the homepage from "With ownCloud you can sync & share your files" to "With ownCloud you can sync & share SOME of your files".

Come on, this behaviour is just ridiculous. People using this software are also relying on it for keeping a backup copy of their files, and then one discovers that it almost silently ignored some of them.

It has been discussed several times, also involving @danimo: the server should accept any file name allowed on the clients' native filesystems. Then maybe it will not send them to a client not capable of handling them, but the server should always accept.

If this cannot be done for technical reasons, and the best thing possible is to accept only file names supported on the server filesystem, I think that it should be stated crystal clear.

Something like:

My 2 cents, Stefano.

danimo commented 10 years ago

After checking with http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words and verifying the results on Linux and OS X, I've changed the patch to allow all these characters.

As I said, the server should be accepting all file names already (pending: check on Windows). That said, we have to be very sure not to break anything. That's why we have been rather conservative about the topic.

FuturePilot commented 10 years ago

If this isn't going to be fixed any time soon, the client needs to give some kind of error message that not all files have been synced. Yes I know you can check the Activity tab in the client but unless you're specifically looking, there is no way to tell something wasn't synced. (almost) Silently ignoring files is unacceptable as the user thinks everything is working correctly.

On a side note, my entire Owncloud environment is Linux so I'm irked that I am being restricted by Windows' limitations.

uvesten commented 10 years ago

Agree with futurePilot, this needs to be more visible for the user!

dragotin commented 10 years ago

We had a more visible notification of that earlier and people complained, so we changed it to the way it is now, which shows the list in the activity tab in the setup page.

muetze-online commented 10 years ago

I turned my back on Windows years ago - and now it's got me by the balls :-). I don't want my system to be restricted by the problems of Windows and its filesystem.

axlblue commented 10 years ago

I'm not a programmer. My issue is now 'closed' but how do I fix my problem? Can't the Mac client deal with this?

dragotin commented 9 years ago

Closing this with a link to https://github.com/owncloud/core/wiki/Cross-Platform-File-Handling which describes how we will tackle this problem once we're at it.