Owncloud deletes files & tries to sync again (with little success)

Phlos commented 6 years ago

I have a 250 Gb storage allowance through a university system which I use to sync my files between several computers that I work on. The university system supplies a tailored client for windows, but for linux I use OwnCloud. I noticed in the last few days that OwnCloud is deleting files from a local computer and then tries to redownload them. This is an issue I have had several times in the past and somehow keeps popping up. [edit: That it started happening again is possibly related to the fact that I updated to 2.4.2 on 19 July).] I have only experienced this on my linux machines, never with Windows (although it should be noted that I sync only a rather small subset of my stuff to the Windows OS).

Expected behaviour

I have the OwnCloud client on my laptop. It syncs my files. If I am offline for a while, it syncs everything as soon as I'm back online.

Actual behaviour

I have the OwnCloud client on my laptop. It deletes files that I had on the local machine and on the server from the local machine, then tries to redownload them.

The sync for redownloading is not unproblematic:

at some point, it stops downloading and loses the connection, and then sits stuck -- as far as I can see, until I forcefully do something about it. The connection issue is not solved by clicking log out and log in alone (as far as I can see); I have to restart owncloud and then it connects.
at each of the retries, it tells me for a different set of directories that "a new directory > 500MB has been discovered" which I should approve the download of (500 MB is the prompt limit in my client).
This is especially problematic because in general (also when OC is not deleting files) it takes ages to initialise OC after a restart - it takes absolutely forever to work through all the directories and files to check whether any changes have taken place.
Owncloud has, up until now, not realised the new folders that I have created during my time offline.

Consequently, I wait for hours to get a sync that maybe 30 minutes, after which it stops working, I restart everything, etc.

Steps to reproduce

Since this is not a very straightforward issue, I'm afraid there are no straightforward steps either. I did the following, though:

Have owncloud on my laptop (Ubuntu 16.04; currently I have OC 2.4.2)
Work on it without internet for a while.
Connect back to the internet
Discover the file & directory deletion mayhem.

Server configuration

Unfortunately, I do not know what specs the university system has.

Client configuration

Client version: 2.4.2

Operating system: Ubuntu 16.04

OS language: English (United States)

Qt version used by client package (Linux only, see also Settings dialog): Not sure, but the settings dialog says "Built from Git revision d6e975 on Jul 19 10:54:56 using Qt 5.6.2 OpenSSL 1.0.2g 1 Mar 2016"

Client package (From ownCloud or distro) (Linux only): I'm not sure what is meant by this question.

Installation path of client: /opt/ownCloud/

Logs

I had no logs up until now -- using the instructions below I have now started logging what is going on. I'll leave my laptop at work and connected overnight, and come back tomorrow to see what the dawn brings...

Other comments:

This issue might be related to #3102, #6282 and #6322, but since my issue seems to be subtly different (and since I don't understand half of those conversations), I hesitantly file this as a new issue. Apologies if this is not the way forward.

ogoffart commented 6 years ago

It seems that the owncloud client does not see that the file are on the server and thus believe they were removed. This may happen if a bug in the server make the file disapear (for example because of some external storage temporarily not available, and no file is reported to the client instead of an error.) And what is the client you are using on windows? And it does not have this problem?

The blocked connection is another wierd problem. As we normally have timeout to avoid this kind of problem.

Please provide the log as it may contains information about both issues.

Phlos commented 6 years ago

Hi @ogoffart - thanks for the answer. As to your questions:

Client on Windows: binary from https://www.surfdrive.nl/ (by the looks of it it's an owncloud adaptation, but that's as much as I know)
I don't remember ever experiencing any issues there, but as I said, I only have a small subset of my data synced on that OS.
as to the log -- by now it is 14 GB. Oops. I will try to send it to you via a filesender.

Other info

I currently have three systems that are syncing my data: an Ubuntu 16.04 laptop (the problematic system), a Win 10 laptop (actually the same, dualboot -- no problems), and a workstation at my work (Ubuntu 18.04 with owncloud 2.4.1+dfsg-1 -- also no problems).
Observation: since restarting owncloud the last time, no syncing whatsoever has taken place.
Other observation: In the log I grepped "ERROR" and the most peculiar result is the following, with errors at almost-exactly-one-hour intervals since I started logging (which was yesterday at 18:54):

07-30 19:35:49:960 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-30 20:35:52:161 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-30 21:35:55:285 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-30 22:35:57:566 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-30 23:35:57:202 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 00:35:59:614 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 01:36:03:443 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 02:36:03:175 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 03:36:03:032 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 04:36:04:578 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 05:36:08:303 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 06:36:08:371 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 07:36:08:096 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 08:36:09:183 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 09:36:10:146 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"

ogoffart commented 6 years ago

I only downloaded the first 3.8 GB before it aborted.

Every hour, the sync is aborting because of a 401 error. This is most likely caused by the expiration of the session cookie. And since it appears that the discovery phase taks more than one hours, it never progress.

I recommand you uncheck the new folders that have too many sub-folder so the discovery phase will be quicker and the sync will proceed.

In the log, there are no evidence of files being removed. So the logs can't answer this question.

My suspission is that the server failed to show some files. And following that the client thought it was removed on the server, so it removed the files locally. Then the files re-appears, but there are so many directories that it can't finish the discovery phase without timing out.

What is strange is that the surfdrive client should have had exactly the same behavior.

Also, there should normally not be a 401 as we authenticate every requests.

Phlos commented 6 years ago

Hi @ogoffart - thanks for your analysis. A couple of points:

as to files being deleted - I guess this took place before I started logging...
as to your suggestion, the thing is: there are no "new folders" -- everything was already there, only it was subsequently deleted. In order to uncheck the folders that are causing trouble, I would first have to know which these folders are -- which I don't.
as to the surfdrive client not showing the same behaviour: this may be because on Windows I only sync some directories with documents and no deep trees -- most significantly not my raw data, and it is the raw data that has so many subdirectories and files.

Is there any way to make the discovery phase less painful (i.e. slow)? This is, in general, an issue that might be worth looking into. I am often waiting super long for a sync to show up.

Two more thoughts:

the discovery phase only seems to be causing trouble if the data are already (partially) on the machine. Every time I have connected a new system, it would have no trouble finding the whole directory tree and subsequently downloading it.
I don't think I mentioned this quite clearly, but the trouble only started after I ran a software update (including an update of OC) on 19 July (I now added this to the initial report). While it is an issue that I've had before, I don't remember it from the past months. Could it be an issue that only touches certain versions of OC, or cases where the tree was out of sync before an update, then the update takes place, then the sync? (I'm just speculating here, but this might definitely have been the case)

ogoffart commented 6 years ago

as to your suggestion, the thing is: there are no "new folders" -- everything was already there, only it was subsequently deleted. In order to uncheck the folders that are causing trouble, I would first have to know which these folders are -- which I don't.

They got removed, now they are considered new :-)

Is there any way to make the discovery phase less painful (i.e. slow)? This is, in general, an issue that might be worth looking into. I am often waiting super long for a sync to show up.

Yes, we are working on that. But it won't be there before 2.6 or later.

the discovery phase only seems to be causing trouble if the data are already (partially) on the machine. Every time I have connected a new system, it would have no trouble finding the whole directory tree and subsequently downloading it.

That's interesting. There should normally be no differences.

Do you have selected the option to ask for permission to download new folders of a given size? Can you try to disable it? Maybe this slows a bit the sync as it needs to query the size.

Phlos commented 6 years ago

the discovery phase only seems to be causing trouble if the data are already (partially) on the machine. Every time I have connected a new system, it would have no trouble finding the whole directory tree and subsequently downloading it.

That's interesting. There should normally be no differences.

It could also be that every time I connected a system, it just so happened to be a version that wasn't causing any trouble...? Too little data to tell, really, could be either.

As to your suggestions, I will try again tonight when I get home.

guruz commented 6 years ago

FYI @tomneedham from ownCloud will get in contact with the SURF guys to try to track this down from server side. Meanwhile feel free to investigate more with @ogoffart

Phlos commented 6 years ago

Ah @guruz @tomneedham good to know, I was planning to drop the SURFers a line as well, but then I'll leave that to you.

michaelstingl commented 6 years ago

@Phlos You can ping them as well. Point them to this issue…

Phlos commented 6 years ago

@michaelstingl I e-mailed my institute which seems to be the only way for mere mortals to get in touch with the SURFdrive ppl. Let's see.... ;)

T0mWz commented 6 years ago

Hi @Phlos , Strange problem, you are experiencing this issue since the latest 2.4.2 client version if I understand you correct? Thank you for register this issue, will check on our side what is going on.

Phlos commented 6 years ago

Hi @T0mWz Yes, although the problem has happened before. Not in the past couple of months though. As said above, sth with the server connection is going wrong, it seems (at hourly intervals)...

Phlos commented 6 years ago

Update:

After unchecking the "new" folders and clicking Apply, it froze for a while, but then started looking.
I unchecked the max folder size permission option.
Same behaviour as previously. Every hour, the process terminates, interestingly now around :38 (which seems to be roughly a continuation of the times I saw on 31 July)... @T0mWz could that be the server?

Around every ERROR line, it looks like this:

08-03 00:38:45:587 [ info sync.csync.updater ]: file: work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4, instruction: INSTRUCTION_NEW <<=
08-03 00:38:45:587 [ info sync.accessmanager ]: 6 "PROPFIND" "https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4" has X-Request-ID "318d9941-feae-421f-b2b0-a192b2dfa603"
08-03 00:38:45:587 [ info sync.networkjob ]:    OCC::LsColJob created for "https://surfdrive.surf.nl/files" + "/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4" "OCC::DiscoverySingleDirectoryJob"
08-03 00:38:45:642 [ warning sync.credentials.http ]:   Stop request: Authentication failed for  "https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4"
08-03 00:38:45:642 [ warning sync.networkjob ]: QNetworkReply::NetworkError(OperationCanceledError) "Operation canceled" QVariant(Invalid)
08-03 00:38:45:642 [ info gui.account.state ]:  Invalid credentials for "https://surfdrive.surf.nl/files" asking user
08-03 00:38:45:642 [ info gui.account.state ]:  AccountState state change:  "Connected" -> "Asking Credentials"
08-03 00:38:45:678 [ info gui.folder.manager ]: Account "[deleted by Phlos]@soliscom.uu.nl@surfdrive.surf.nl" disconnected or paused, terminating or descheduling sync folders
08-03 00:38:45:678 [ info gui.folder ]: folder  "1"  Terminating!
08-03 00:38:45:678 [ info gui.account.manager ]:        Saving account "https://surfdrive.surf.nl/files"
08-03 00:38:45:678 [ info gui.account.manager ]:        Saving  1  unknown certs.
08-03 00:38:45:687 [ info gui.account.manager ]:        Saving cookies. "/home/Phlos/.local/share/data//ownCloud//cookies0.db"
08-03 00:38:45:687 [ info gui.account.manager ]:        Saved account settings, status: 0
08-03 00:38:45:687 [ info sync.accessmanager ]: 4 "" "https://surfdrive.surf.nl/files/index.php/apps/oauth2/api/v1/token" has X-Request-ID "bb42e3d7-715b-4c2d-954e-a56ad4a900f2"
08-03 00:38:45:687 [ info sync.networkjob ]:    OCC::SimpleNetworkJob created for "https://surfdrive.surf.nl/files" + "" "OCC::Account"
08-03 00:38:45:687 [ info sync.networkjob.lscol ]:      LSCOL of QUrl("https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4") FINISHED WITH STATUS QNetworkReply::NetworkError(OperationCanceledError) "Operation canceled"
08-03 00:38:45:687 [ warning sync.discovery ]:  LSCOL job error "Operation canceled" 0 QNetworkReply::NetworkError(OperationCanceledError)
08-03 00:38:45:687 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-03 00:38:45:687 [ info sync.engine ]:        Update phase was aborted by user!
08-03 00:38:49:047 [ info sync.database ]:      Closing DB "/home/Phlos/SURFdrive/._sync_8249cb86cd45.db"
08-03 00:38:50:107 [ info sync.engine ]:        CSync run took  3603659 ms
08-03 00:38:50:108 [ info gui.folder ]: Client version 2.4.2 (build 9883)  Qt 5.6.2  SSL  OpenSSL 1.0.2g  1 Mar 2016
08-03 00:38:50:108 [ info gui.folder ]: SyncEngine finished without problem.
08-03 00:38:50:109 [ info gui.folder ]: Folder sync result:  4
08-03 00:38:50:109 [ info gui.application ]:    Sync state changed for folder  "https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/" :  "Success"

Within my directory tree, only directories within my bigger directory "SYNTHETICS" were deleted, while a similarly large directory "DATA" was untouched. Hypothesising upon the cause of the deletion, could it be that if the connection breaks during the discovery phase, everything that has not yet been discovered is somehow branded as "doesn't exist on the server", upon which it is deleted?

T0mWz commented 6 years ago

Hi All,

I checked the logs at our side, to see whats happening here.

At this time there were two clients online, this Linux 2.4.2 client and an another 2.4.1 Linux client which did not do much. You see below that around that time a 401 error occurs for that specific folder. Will check @Phlos her account to see if there is a mismatch here between OC metadata and data on disk. On disk it looks good, but double check is always a good idea.

` - - [03/Aug/2018:00:38:43 +0200] "PROPFIND /files/remote.php/dav/files//work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_19_4 HTTP/1.1" 207 339496 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:358666 response_size:339496 bytes_received:1078 bytes_sent:340626 W2OH84rL4pSvj14aHf699AAAABI

- - [03/Aug/2018:00:38:44 +0200] "PROPFIND /files/remote.php/dav/files//work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_19_final HTTP/1.1" 207 661856 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:440227 response_size:661856 bytes_received:1082 bytes_sent:663299 W2OH9IrL4pSvj14aHf699QAAABI - - [03/Aug/2018:00:38:44 +0200] "PROPFIND /files/remote.php/dav/files//work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_19_final_init_rho HTTP/1.1" 207 669155 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:430606 response_size:669155 bytes_received:1091 bytes_sent:670606 W2OH9IrL4pSvj14aHf699gAAABI - - [03/Aug/2018:00:38:45 +0200] "PROPFIND /files/remote.php/dav/files//work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4 HTTP/1.1" 401 476 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:30911 response_size:476 bytes_received:1077 bytes_sent:1052 W2OH9YrL4pSvj14aHf699wAAABI - - [03/Aug/2018:00:38:50 +0200] "PROPFIND /files/remote.php/dav/files// HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:108387 response_size:501 bytes_received:795 bytes_sent:1284 W2OH@mz3SSYPEGckPL3W2wAAAD8 - - [03/Aug/2018:00:38:50 +0200] "GET /files/remote.php/dav/avatars//128.png HTTP/1.1" 404 181 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:96575 response_size:181 bytes_received:615 bytes_sent:667 W2OH@mz3SSYPEGckPL3W3gAAAD8 - - [03/Aug/2018:00:39:01 +0200] "PROPFIND /files/remote.php/dav/files// HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:115889 response_size:501 bytes_received:795 bytes_sent:1284 W2OIBWgOj45xXDZgu-KULAAAADg - - [03/Aug/2018:00:39:33 +0200] "PROPFIND /files/remote.php/dav/files// HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:113601 response_size:501 bytes_received:795 bytes_sent:1284 W2OIJWO22vbteyYulfWu5gAAABs` After this 401 error, the client logs in again around 00:38:50 o'clock.

Phlos commented 6 years ago

Hi @T0mWz should I do or check something? Not quite sure from what you write. You write

around that time a 401 error occurs for that specific folder

but it is important to note that it is not the specific folder, but the specific time (every hour) that is causing the trouble. Around the 31st of July it was at every ~[hour]:36 (see my message above), now on the 3rd of August it is at every ~[hour]:39. It seems to shift by 0-5s every time. Whether it is SYNTHETICS/event_77 or any other folder seems to be of no importance. To me, this points to the fact that there is not something wrong with the data, but with the connection. The question is: why is it happening and can we do something about it?

ps, if you want to insert multi-line output you can put it between triple back quotes ``` above and below

T0mWz commented 6 years ago

Hi @Phlos ,

Today I have made a comparison of your real data and the metadata. OwnCloud works on the base of a database containing the metadata of your files, which does not seem to be completely in sync.

I start a command to bring them in sync again, but because you have a lot of files, this will takes a while. Tomorrow, I will check if everything is back in sync, from then we have to see if your client reacts normally again.

T0mWz commented 6 years ago

Hi @Phlos ,

The metadata of your files is back in sync. Would you be able to see if the problems were solved for you?

Phlos commented 6 years ago

@T0mWz No, unfortunately the problem persists :( Hourly errors as always.

@ogoffart also unchecking loads of directories didn't bring any solution.

Now, if I grep for "ERROR" in the log, I get some new error type ("Premature end of document") and the hourly schedule is a bit distorted every time this happens. See below.

08-05 01:58:26:190 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 02:58:30:700 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 03:58:31:047 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 04:58:33:472 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 05:55:02:799 [ warning sync.networkjob.lscol ]:   ERROR "Premature end of document." ""
08-05 05:55:02:801 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 05:58:46:414 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 06:13:10:797 [ warning sync.networkjob.lscol ]:   ERROR "Premature end of document." ""
08-05 06:13:10:798 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 06:14:46:798 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user"
08-05 06:41:58:800 [ warning sync.networkjob.lscol ]:   ERROR "Premature end of document." ""
08-05 06:41:58:802 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 08:03:56:880 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 09:03:57:821 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 10:03:58:520 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 11:04:01:048 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 12:04:01:179 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 13:04:02:385 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 14:04:02:243 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 15:04:03:498 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"

ogoffart commented 6 years ago

"Aborted by the user" is caused by the 401 error. (When we get a 401 we abort the request. We are not supposed to get 401 unless the password is wrong, so we'll let the UI retry a connection in order to show the password dialog)

"Premature end of document." Means that the PROPFIND reply was not a valid XML document, likely the server truncated the reply.

T0mWz commented 6 years ago

Sorry @Phlos , you're right. It happened by coincidence that I saw twice the same file / folder where the 401 error occurred.

@ogoffart , It seems that the client receives a 401 error where the oAuth token may have expired, after which it does a POST for a new oAuth token refresh. See Apache log below;

<IP-user> - - [06/Aug/2018:08:04:18 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_58/ITERATION_19_final HTTP/1.1" 207 183548 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:1928315 response_size:183548 bytes_received:1111 bytes_sent:184526 W2fk4iTKOKp6PEB@gKkymgAAABU
<IP-user> - - [06/Aug/2018:08:04:20 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_58/ITERATION_19_final_init_rho HTTP/1.1" 401 476 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:65873 response_size:476 bytes_received:1120 bytes_sent:1052 W2fk5CTKOKp6PEB@gKkymwAAABU
<IP-user> - <TOKEN> [06/Aug/2018:09:04:22 +0200] "POST /files/index.php/apps/oauth2/api/v1/token HTTP/1.1" 200 343 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:111611 response_size:343 bytes_received:665 bytes_sent:1352 W2fy9p1fV5OlBGjdBHgpQAAAACU
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "GET /files/status.php HTTP/1.1" 200 162 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:43304 response_size:162 bytes_received:576 bytes_sent:812 W2fy96H@ebmAEXQDGXoeyAAAABg
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:190154 response_size:501 bytes_received:799 bytes_sent:1284 W2fy96H@ebmAEXQDGXoeyQAAABg
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "GET /files/ocs/v1.php/cloud/capabilities?format=json HTTP/1.1" 200 1030 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:120993 response_size:1030 bytes_received:629 bytes_sent:1664 W2fy96H@ebmAEXQDGXoeygAAABg
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "GET /files/ocs/v1.php/config?format=json HTTP/1.1" 200 164 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:135195 response_size:164 bytes_received:617 bytes_sent:797 W2fy92dCXu8zmEmaujv1PAAAABU

@Phlos , I see that you have also a sync-client version 2.4.1 running on an another location. Shows that client version also this behavior?

Phlos commented 6 years ago

The 2.4.1 client (which is installed on my main workstation at work) has no issues.

ogoffart commented 6 years ago

So the problem is that the initial discovery is taking more than one hour which is more than the oauth token expiration.

And the sync client can't cope with that, as it has to cancel the sync and refresh the oauth token.

In theory we should refresh the oauth token without restarting the sync. But this is a bit tricky with the current client architecture.

The new discovery algorithm (2.6) will also allow to do the propagation and the discovery at the same time. Which will allow the next sync to restart where we left, without the need to spend one hour re-discovering again.

@ogoffart also unchecking loads of directories didn't bring any solution.

I was hoping it would. Maybe you need to uncheck more. There must be really big directory in your tree that could maybe be unchecked.

The 2.4.1 client (which is installed on my main workstation at work) has no issues.

There are almost no difference with 2.4.2. It could be that the 2.4.1 configuration is not using oauth because it was configured with a password before? It could be that t started syncing when there was less directory on the server so that the discovery did not take more than one hour.

Phlos commented 6 years ago

Hi @ogoffart --

Maybe you need to uncheck more. There must be really big directory in your tree that could maybe be unchecked.

I will try and update you with the results... Now unchecked a 51 GB dir, if that doesn't do the trick I don't know what will :-P -- let's hope the deletion is not futile!

There are almost no difference with 2.4.2.

Hm, interesting.

It could be that the 2.4.1 configuration is not using oauth because it was configured with a password before?

What do you mean by this? Not sure what oauth is.

It could be that t started syncing when there was less directory on the server so that the discovery did not take more than one hour.

There were definitely always the same amount of directories. The only significant difference is that the laptop (the problematic machine w/ 2.4.1) is only used intermittently (i.e. there may be several weeks where I don't use it a lot) whereas the workstation is only turned off briefly every now and then, and is therefore kept mostly in sync. Also, the laptop was probably not in sync at the time that its client was updated to 2.4.2. Can this be related?

T0mWz commented 6 years ago

Hi @ogoffart ,

Thanks for your reply.

For SURFdrive, we have configured multiple back-ends. OwnCloud sync clients starting from 2.4.0 and newer will authenticate through oAuth against SURFdrive. Older clients still using the Shibboleth Single single sign-on method to login.

That is why I would expect the same behavior with the 2.4.1 client. @Phlos, do you synchronize on your other system with the 2.4.1 client also all files or just a selection of folders/files? @ogoffart, could the local sync-client database still play a role?

Phlos commented 6 years ago

@T0mWz

do you synchronize on your other system with the 2.4.1 client also all files or just a selection of folders/files?

My workstation (with OC 2.4.1) syncs everything.

T0mWz commented 6 years ago

Hi @Phlos ,

Oke, good to hear! Maybe a weird idea, but could you try a downgrade from the 2.4.2 client to the older version 2.4.1 to see of the issue is related to this specific client version?

With your 2.4.1 client I see on our side the same reaction, but can not see how the client reacts by a token refresh. But this could help to if the issue is specific related to this newer client version or that we may have to search further..

Thank you very much for your help..! 👍

ogoffart commented 6 years ago

I wouldn't think downgrading to 2.4.1 solves the issue.

The discovery phase is the phase that goes recursively calling PROPFIND on every modified folder on the server. When the client has already synchronized things, this is usually very fast, because most information is cached in the local database. In this case, there are lots of folder with many subfolder which are not cached in the database because they are not synchronized. The client will then do a PROPFIND in each of them before starting the synchronization. But this is taking too long so the synchronization don't even start.

Once the synchronization start (the GET), the files and folder which are synchronized get cached in the database, and so the next sync will be faster.

So the 2.4.1 client must have probably been synchronized before the switch to oauth, or at a time where there was less folder. Allowing the sync to finish and cache everything.

For an unkown reason, but maybe a server problem, large folders got removed client-side on the 2.4.2 client, and it is now unable to recover as it is stuck in the discovery phase.

@Phlos: Can you try to uncheck all the folders which are not locally on your machine, and see if the sync finishes then, after you click apply? Then you can try to check them, one by one. waiting that the sync finishes before checking the next one.

Phlos commented 6 years ago

Yesterday morning, I unchecked the whole 51 GB "SYNTHETICS" directory. The trouble often arose somewhere here in the tree, so I hoped that by removing it, the discovery would reach its end before the hourly cut-off. This removed many of my files, but did not solve the issue. Still, no syncing whatsoever takes place and at hourly intervals, the whole process restarts. The weird thing is that now it gets stuck at a much earlier directory (i.e. one that starts with an A, an assuming it goes through the tree alphabetically). I will uncheck further directories tonight...

Meanwhile: is there any way to get by this hourly cut-off? What is it good for @T0mWz @ogoffart ?

Also, @ogoffart you write:

When the client has already synchronized things, this is usually very fast, because most information is cached in the local database.

I have to say that I very often experience this to be extremely slow, with the search apparently going through all sub-sub-[sub-]*directories, whether altered or not. Could it be that certain directories are simply not cached even though they should be? Where is this local database?

Other question - you say there will be a new discovery algorithm in 2.6. When do you aim to release this version?

michaelstingl commented 6 years ago

there will be a new discovery algorithm in 2.6. When do you aim to release this version?

We don't have release date. We first need to release the 2.4.3 and the 2.5.0. Here you can find the 2.6.0 milestone where you can follow the process: https://github.com/owncloud/client/milestone/47

ogoffart commented 6 years ago

This removed many of my files, but did not solve the issue.

I'm confused... If many files were removed, this means that the sync finished, so the issue was resolved... not? Or you mean that after that, checking it again did not solve the problem? But you need to check directories one by one and wait for them to be sync'ed so the sync can finish in one hour. (Only the newly checked directories should take time)

Meanwhile: is there any way to get by this hourly cut-off?

The one hour cur off is set on the server in AccessToken::EXPIRATION_TIME which is hardcoded to 3600 seconds in https://github.com/owncloud/oauth2/blob/391d81f85ac1c778109f49ba944170d17135cdf9/lib/Db/AccessToken.php#L36

Could it be that certain directories are simply not cached even though they should be?

When the directories are modified on the server, the client need to do PROPFIND. So this happens when there are lots of changes in different directory in the server.

Where is this local database?

In your local directory, there is a hidden file called something like ._sync_25a976ce7d79.db (the number is going to be different)

Other question - you say there will be a new discovery algorithm in 2.6. When do you aim to release this version?

Not 100% sure it will be in 2.6, but quite likely. I'd say 2.6 is about a year ahead.

T0mWz commented 6 years ago

Hi @ogoffart,

Thank you for your time so far!

Currently I'm testing what happens if we increase the oAuth AccessToken lifetime to 4 hours, I hope with this change that we can solve this issue for @Phlos, where the client will get more time to make a file listing of all files. Maybe we should still enable folder by folder, but this will help the hourly timeout.

If my client can handle this normally without problems, I will connect your client (@Phlos) to this specific web server where the timeout is increased.

To be continued.. 😄

Phlos commented 6 years ago

I'm confused... If many files were removed, this means that the sync finished, so the issue was resolved... not?

@ogoffart I meant the unchecking should have caused the client to ignore that whole directory, clearing up much of the mess. I meant that actually much of this directory was on the machine (but it didn't actually remove anything because it never reached the end of the sync run).

The local database, by the way, is 1.8 GB....

Currently I'm testing what happens if we increase the oAuth AccessToken lifetime to 4 hours

@T0mWz Because it's now even slower than it was (getting stuck at an A.../A.... directory) I am actually doubtful that this will solve my problem.

Will it help if I completely disconnect the folder and then reinstate the surfdrive connection? I'm hesitant to do this because I don't want to have to download >200GB worth of data....

T0mWz commented 6 years ago

@Phlos , you're client is now connected to the webserver with a token lifetime of 4 hours. Your client can do a token refresh at any time, from then on we have to see what will happens. I'm hopeful! 😄

Phlos commented 6 years ago

Hi @T0mWz I saw:

08-09 14:55:12:525 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-09 15:55:15:039 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-09 19:55:15:840 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"

but somewhere during the evening/night after this, it logged itself off and wouldn't reconnect automatically. Is this expected behaviour @ogoffart @T0mWz ?

Is there anything else I can do from my side?

T0mWz commented 6 years ago

@Phlos , I saw it already in our logs too. This is a known bug from the 2.4.2 client and fixed in a newer version of the oAuth app (https://github.com/owncloud/oauth2/pull/145). So, see my comment earlier for a downgrade of the OwnCloud client. 😉

But unfortunately this newer version of the oAuth app has another bug in combination with some other apps and our current version of OwnCloud that we are running on SURFdrive, which are fixed in a newer release. However, a full upgrade of the environment is not just an easy think. It's planned for less than two months, but would be nice if we can fix this even faster..

I will think hard in a solution, which can help us now..

guruz commented 6 years ago

This is a known bug from the 2.4.2 client and fixed in a newer version of the oAuth app (owncloud/oauth2#145). So, see my comment earlier for a downgrade of the OwnCloud client. 😉

Note that the oauth2 hotfix is not needed from >= 2.4.3 rc1 .. https://github.com/owncloud/client/commit/235e314da707b1cbdb6af093d260885f424ffd07 https://download.owncloud.com/desktop/testing/

However, a full upgrade of the environment is not just an easy think. It's planned for less than two months, but would be nice if we can fix this even faster..

Please also try to contact ownCloud support in this case.

In general though: What @ogoffart said: The token time needs to be big enough (currently) to account for a whole client discovery (PROPFIND) run of a fresh sync.

Phlos commented 6 years ago

@T0mWz did you decrease the oauth time again? It's ERROR'ing at hourly intervals again. I unchecked my whole directory tree that was causing trouble, then tried re-downloading one of my subdirectories, but it was too big again (I thought it would be OK with the 4 hrs but it cut at 1 hr). [by "big" I mean "many files", actually... i.e. the discovery took >1hr]

Phlos commented 6 years ago

My client has started deleting stuff again. As reported earlier, I had removed the one big directory that was causing trouble, and am in the process of slowly adding back dirs (exactly as @ogoffart had suggested) but still not fully back to normal yet. Now, it removed everything that I'd managed to get back.

@T0mWz can you check the logs what happens around 7:38 CET? At that particular moment, I was not writing to a log file so I can't see what's going on there...

The only thing I have is the .owncloudsync.log file, which says:

# timestamp | duration | file | instruction | dir | modtime | etag | size | fileId | status | errorString | http result code | other size | other modtime | other etag | other fileId | other instruction
#=#=#=# Syncrun started 2018-08-15T04:37:06Z
#=#=#=#=# Propagation starts 2018-08-15T04:38:36Z (last step: 89941 msec, total: 89941 msec)
#=#=#=# Syncrun finished 2018-08-15T04:38:46Z (last step: 9868 msec, total: 99810 msec)
#=#=#=# Syncrun started 2018-08-15T05:37:06Z
#=#=#=#=# Propagation starts 2018-08-15T05:38:45Z (last step: 99364 msec, total: 99364 msec)
||work/website|INST_REMOVE|Down|1526761460||4096||4||0|0|0||||
||work/reviews|INST_REMOVE|Down|1534163017||4096||4||0|0|0||||
||work/manuscripts/manuscript_IC|INST_REMOVE|Down|1534163017||4096||4||0|0|0||||
||work/manuscripts/Agnieszka_paper_2016/version revision|INST_REMOVE|Down|1534163015||4096||4||0|0|0||||
||work/manuscripts/Agnieszka_paper_2016/version 4|INST_REMOVE|Down|1534163014||4096||4||0|0|0||||
[...]

I'm getting pretty desparate here. Is there anything I can do?

T0mWz commented 6 years ago

@Phlos , exactly the same as the previous times. Your oAuth token has expired, then your client will be up and running until a new token succeeds, then a sync starts again. Unfortunately, I can not install a newer version of the oAuth app, otherwise things will go broken then.

You have a few option, where the issue is related to the client version you use.

Downgrade to client versie 2.4.1
Upgrade to latest client version 2.4.3, which available sinds 13/08/2018
Skip you sync client & mount your storage by webdav.

Phlos commented 6 years ago

@T0mWz So to get this straight: at the exact time that the oAuth token expired, it deleted my files? I'm just really puzzled what causes the client to delete files that should be there. And relevantly: why is an oauth expiry sometimes causing this behaviour?

I now did the following:

disconnect my client & then reconnect
upgrade to 2.4.3 (it didn't have an upgrade button in the client itself, so hadn't seen it was out already)

Finally -- what do you mean by "mount your storage by webdav"?

T0mWz commented 6 years ago

Hi @Phlos ,

I see that you have upgraded your client to 2.4.3, the error message from the client is now also gone. I will enroll now the change for a longer oAuth token lifetime, I hope with this change that your client will be able to go better through the large folders.

Do you still experience the same issue now with the newer client version?

Webdav is a direct network connection from your workstation to your storage on SURFdrive, so without the data being synchronized between them. See the dutch explanation on our website; https://www.surfdrive.nl/tutorials#webdav

Phlos commented 6 years ago

Hi @T0mWz @ogoffart the new client 2.4.3 is showing the same behaviour. Everything was deleted this morning at 05:26:06 (after having finished the selective sync that I had set up at 5:22). It only deleted stuff in my directory "work" within my owncloud tree, nothing else. But there it deleted everything.

@T0mWz The error message may have not appeared simply because I was only adding directories in batches such that it would never overrun the oauth token time.

T0mWz commented 6 years ago

Hi @Phlos ,

Annoying that the issue is still going on. I see in the logs on our side, that your client was very busy the whole night with downloading files. Between 03:00 and 05:22 o'clock your client downloaded 135928 files.

However, around 05:26:00: I do not see anything strange in the logs what can explain this behavior. After all files are downloaded, I see that your your client is connecting normal & checks the status every so often.

<CLIENT IP> - - [17/Aug/2018:05:22:13 +0200] "GET /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_99/ITERATION_mod_59_smsm_band_100_150s_wins_9f/SJ.BBLS..BHE_2012-07-09T14:02:42.515586Z_2012-07-09T14:08:23.957497Z_cosine_0.05_TimeFrequencyPhaseMisfitFichtner2008 HTTP/1.1" 200 19515 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:236393 response_size:19515 bytes_received:845 bytes_sent:20498 W3Y-Zb5ouGEuxvPE1R-mNgAAAB0
<CLIENT IP> - - [17/Aug/2018:05:22:13 +0200] "GET /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_99/ITERATION_mod_59_smsm_band_100_150s_wins_9f/TT.THTN.00.BHE_2012-07-09T14:05:51.398345Z_2012-07-09T14:10:17.241710Z_cosine_0.05_TimeFrequencyPhaseMisfitFichtner2008 HTTP/1.1" 200 19515 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:247391 response_size:19515 bytes_received:847 bytes_sent:20502 W3Y-ZWFo2Xfmuq-bsJu3qwAAAAM
<CLIENT IP> - - [17/Aug/2018:05:22:13 +0200] "GET /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_99/ITERATION_mod_59_smsm_band_100_150s_wins_9f/TT.THTN.00.BHN_2012-07-09T14:04:15.735299Z_2012-07-09T14:12:38.663668Z_cosine_0.05_TimeFrequencyPhaseMisfitFichtner2008 HTTP/1.1" 200 19515 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:294741 response_size:19515 bytes_received:847 bytes_sent:20502 W3Y-ZW1EPfjLJnF2ba1alQAAAC4
<CLIENT IP> - - [17/Aug/2018:05:22:13 +0200] "GET /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_99/ITERATION_mod_59_smsm_band_100_150s_wins_9f/TT.THTN.00.BHZ_2012-07-09T14:01:05.320223Z_2012-07-09T14:14:01.516029Z_cosine_0.05_TimeFrequencyPhaseMisfitFichtner2008 HTTP/1.1" 200 19515 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:252259 response_size:19515 bytes_received:847 bytes_sent:20502 W3Y-Zak5UsJqdqLITompPQAAACI
<CLIENT IP> - - [17/Aug/2018:05:24:20 +0200] "GET /files/ocs/v2.php/apps/notifications/api/v1/notifications?format=json HTTP/1.1" 200 108 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:163692 response_size:108 bytes_received:651 bytes_sent:781 W3Y-5EgYqb1OjO7vrts8pAAAADg
<CLIENT IP> - - [17/Aug/2018:05:24:20 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 548 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:235643 response_size:548 bytes_received:833 bytes_sent:1331 W3Y-5NdfSEFoutzWWQ7ogwAAABw
<CLIENT IP> - - [17/Aug/2018:05:24:20 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:276900 response_size:501 bytes_received:800 bytes_sent:1284 W3Y-5FHyHwQdPnjhy9YCiAAAACE
<CLIENT IP> - - [17/Aug/2018:05:24:20 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 500 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:228429 response_size:500 bytes_received:791 bytes_sent:1283 W3Y-5CF@GeorQfMwgmO6rAAAABs
<CLIENT IP> - - [17/Aug/2018:05:24:50 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 548 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:190129 response_size:548 bytes_received:833 bytes_sent:1331 W3ZAAkgYqb1OjO7vrts8pQAAADg
<CLIENT IP> - - [17/Aug/2018:05:24:51 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:131551 response_size:501 bytes_received:800 bytes_sent:1284 W3ZAA0gYqb1OjO7vrts8pgAAADg
<CLIENT IP> - - [17/Aug/2018:05:25:20 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 548 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:189234 response_size:548 bytes_received:833 bytes_sent:1331 W3ZAIFHyHwQdPnjhy9YCigAAACE
<CLIENT IP> - - [17/Aug/2018:05:25:23 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:129403 response_size:501 bytes_received:800 bytes_sent:1284 W3ZAI1HyHwQdPnjhy9YCiwAAACE
<CLIENT IP> - - [17/Aug/2018:05:25:34 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 14829 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:396194 response_size:14829 bytes_received:1049 bytes_sent:15640 W3ZALtbL2ItCwCIiNDF4bQAAAAc
<CLIENT IP> - - [17/Aug/2018:05:27:02 +0200] "GET /files/ocs/v2.php/apps/notifications/api/v1/notifications?format=json HTTP/1.1" 200 108 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:148573 response_size:108 bytes_received:651 bytes_sent:781 W3ZAhlaQnr27JCbvvK0N9gAAAA0
<CLIENT IP> - - [17/Aug/2018:05:27:02 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 548 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:186956 response_size:548 bytes_received:833 bytes_sent:1331 W3ZAhlP8RfFLg7qStF8RwwAAADQ
<CLIENT IP> - - [17/Aug/2018:05:27:34 +0200] "GET /files/status.php HTTP/1.1" 200 162 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:41157 response_size:162 bytes_received:577 bytes_sent:812 W3ZApjuU19Yaxy-7kxf8GwAAABQ
<CLIENT IP> - - [17/Aug/2018:05:27:34 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:160416 response_size:501 bytes_received:800 bytes_sent:1284 W3ZApjuU19Yaxy-7kxf8HAAAABQ
<CLIENT IP> - - [17/Aug/2018:05:27:35 +0200] "GET /files/ocs/v1.php/cloud/capabilities?format=json HTTP/1.1" 200 1030 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:93976 response_size:1030 bytes_received:630 bytes_sent:1664 W3ZApzuU19Yaxy-7kxf8HQAAABQ
<CLIENT IP> - - [17/Aug/2018:05:27:35 +0200] "GET /files/ocs/v1.php/config?format=json HTTP/1.1" 200 164 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:76731 response_size:164 bytes_received:618 bytes_sent:797 W3ZAp@xdSmcOM0JU88o5EwAAACQ
<CLIENT IP> - - [17/Aug/2018:05:27:35 +0200] "GET /files/ocs/v1.php/cloud/user?format=json HTTP/1.1" 200 199 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:86898 response_size:199 bytes_received:622 bytes_sent:832 W3ZApzuU19Yaxy-7kxf8HgAAABQ
<CLIENT IP> - - [17/Aug/2018:05:27:35 +0200] "GET /files/remote.php/dav/avatars/<USERNAME>/128.png HTTP/1.1" 404 181 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:144223 response_size:181 bytes_received:620 bytes_sent:667 W3ZApzuU19Yaxy-7kxf8HwAAABQ
<CLIENT IP> - - [17/Aug/2018:05:27:35 +0200] "GET /files/ocs/v2.php/apps/notifications/api/v1/notifications?format=json HTTP/1.1" 200 108 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:75032 response_size:108 bytes_received:651 bytes_sent:781 W3ZApzuU19Yaxy-7kxf8IAAAABQ
<CLIENT IP> - - [17/Aug/2018:05:27:35 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 548 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:172698 response_size:548 bytes_received:833 bytes_sent:1331 W3ZAp@xdSmcOM0JU88o5FAAAACQ

@Phlos Do you have happen any logging of these remove actions? I hope @ogoffart has an idea what undermines this problem and what we can do to solve this?

ogoffart commented 6 years ago

Unless there is a bug in the client, the client deletes files if they are no longer on the server. If the PROPFIND reply is missing files, for example.
We have detection in the client for many things that could go wrong such as incomplete or turncated PROPFIND. So this can only happen if the server really don't report these files. Perhaps they are mounted in some external storage on the server and it is temporarily unavailable?

michaelstingl commented 6 years ago

@Phlos I'd recommend to run the client with the following parameters to be sure you'll catch the next deletion:

--logdir <dir> (each sync run creates a new file)
--logexpire <hours> (client automatically erases saved log data in the directory that is older than the specified number of hours)
maybe --logdebug (increases the verbosity of the generated log files)

Here you can find more information: https://doc.owncloud.com/desktop/latest/troubleshooting.html#saving-files-directly

guruz commented 6 years ago

@T0mWz And you should try 2.5.x not 2.4.3. There are certain sanity checks in 2.5.x that could prevent deletion if server sends semi-bogus PROPFIND reply. https://software.opensuse.org//download.html?project=isv:ownCloud:desktop:daily:2.5&package=owncloud-client thank you :)

Phlos commented 6 years ago

Hi all, Last night, after finally having gotten to the point where I had everything again, a whole directory got deleted and I finally managed to catch it in the log. It seems indeed, like @ogoffart is suggesting, that something that the server is sending is causing my client to think that a whole directory is now empty. Below, I've taken excerpts of that which I think is relevant from the logfile. In addition, I'll send the whole shebang to @ogoffart via a filesender, since the whole log is several gigabytes.

First it seems like a check takes place every 30 seconds:

08-19 00:28:00:239 [ info gui.folder ]: Trying to check "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" for changes via ETag check. (time since last sync: 7142 s)
08-19 00:28:00:240 [ info sync.accessmanager ]: 6 "PROPFIND" "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" has X-Request-ID "49edc559-4d46-4cef-af10-d0544a53e9bf"
08-19 00:28:00:240 [ info sync.networkjob ]:    OCC::RequestEtagJob created for "https://surfdrive.surf.nl/files" + "/" "OCC::Folder"
08-19 00:28:00:430 [ info sync.networkjob.etag ]:       Request Etag of QUrl("https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/") FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) ""
08-19 00:28:30:239 [ info gui.folder ]: Trying to check "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" for changes via ETag check. (time since last sync: 7172 s)
08-19 00:28:30:239 [ info sync.accessmanager ]: 6 "PROPFIND" "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" has X-Request-ID "aeacfd02-4f51-4cdf-b0cd-693cbba97aea"
08-19 00:28:30:240 [ info sync.networkjob ]:    OCC::RequestEtagJob created for "https://surfdrive.surf.nl/files" + "/" "OCC::Folder"
08-19 00:28:30:521 [ info sync.networkjob.etag ]:       Request Etag of QUrl("https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/") FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) ""

Then, at 00:29:00, a larger check seems to take place, for whatever reason:

08-19 00:29:00:239 [ info gui.folder.manager ]: Scheduling folder "1" because it has been 7202401 ms  since the last sync
08-19 00:29:00:239 [ info gui.folder.manager ]: Schedule folder  "1"  to sync!
08-19 00:29:00:240 [ info gui.application ]:    Sync state changed for folder  "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" :  "Not yet Started"
08-19 00:29:00:241 [ info sync.accessmanager ]: 2 "" "https://surfdrive.surf.nl/files/ocs/v2.php/apps/notifications/api/v1/notifications?format=json" has X-Request-ID "9b4514d9-45bb-420a-bf5c-f445f6ea28c5"
08-19 00:29:00:241 [ info sync.networkjob ]:    OCC::JsonApiJob created for "https://surfdrive.surf.nl/files" + "ocs/v2.php/apps/notifications/api/v1/notifications" "OCC::ServerNotificationHandler"
08-19 00:29:00:241 [ info gui.folder.manager ]: Starting the next scheduled sync in 0 seconds
08-19 00:29:00:241 [ info gui.folder ]: Trying to check "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" for changes via ETag check. (time since last sync: 7202 s)
08-19 00:29:00:242 [ info sync.accessmanager ]: 6 "PROPFIND" "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" has X-Request-ID "300a5a3a-54d8-4127-a3fc-4f03bb2e05c6"

I don't know what 'folder "1"' is, because I don't have directory of that name. Anyway, after this, the discovery starts:

08-19 00:29:00:244 [ info gui.application ]:    Sync state changed for folder  "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" :  "SyncPrepare"
08-19 00:29:00:244 [ info gui.folder ]: *** Start syncing  "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/"  - client version 2.4.3 (build 10035)
08-19 00:29:00:245 [ info gui.folder ]: Adding system ignore list to csync: "/etc/ownCloud/sync-exclude.lst"
08-19 00:29:00:246 [ info gui.folder.manager ]: >========== Sync started for folder [SURFdrive] of account [[username]@soliscom.uu.nl@surfdrive.surf.nl] with remote [https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/]
08-19 00:29:00:247 [ info sync.engine ]:        There are 53296218112 bytes available at "/home/phlos/SURFdrive/"
08-19 00:29:00:247 [ info sync.engine ]:        Sync with existing sync journal
08-19 00:29:00:247 [ info sync.engine ]:        "Using Qt 5.6.2 SSL library OpenSSL 1.0.2g  1 Mar 2016 on Ubuntu 16.04.5 LTS"
08-19 00:29:00:248 [ info sync.engine ]:        Using Selective Sync
08-19 00:29:00:248 [ info sync.engine ]:        #### Discovery start ####################################################
08-19 00:29:00:248 [ info sync.engine ]:        Server "10.0.3.3" 
08-19 00:29:00:271 [ info sync.csync ]: Memory: 2584868K total size, 1321132K resident, 33232K shared
08-19 00:29:00:271 [ info sync.csync ]: ## Starting local discovery ##
08-19 00:29:00:271 [ info sync.csync.updater ]: Database entry found, compare: 1462549256 <-> 1462549256, etag:  <-> 994fe6a721decb974dcbafadf02fe62f, inode: 10355104 <-> 10355104, size: 11737 <-> 11737, perms: 0 <-> 9f, checksum:  <-> SHA1:a019b067e61392472021225064bc52ea6bfbfdc0 , ignore: 0
08-19 00:29:00:271 [ info sync.csync.updater ]: file: Colorscale.KennethMoreland-CoolWarmFloat257.csv, instruction: INSTRUCTION_NONE <<=
08-19 00:29:00:271 [ info sync.csync.updater ]: Database entry found, compare: 1534329231 <-> 1533287372, etag:  <-> 5b641c1e3ffb6, inode: 10354771 <-> 10354771, size: 20480 <-> 0, perms: 0 <-> fd, checksum:  <->  , ignore: 0
08-19 00:29:00:271 [ info sync.csync.updater ]: file: Literature, instruction: INSTRUCTION_EVAL <<=
   [...]

This goes through the whole directory tree, ending with

08-19 00:29:25:523 [ info sync.csync.updater ]:  <= Closing walk for /home/phlos/SURFdrive with read_from_db 0
08-19 00:29:25:523 [ info sync.csync ]: Memory: 2715940K total size, 1458936K resident, 33232K shared

Then a remote discovery, where it looks like all the directories that are directly in the root directory of the surfdrive are checked on how many files/dirs they contain:

08-19 00:29:25:523 [ info sync.csync ]: ## Starting remote discovery ##
08-19 00:29:25:523 [ info sync.accessmanager ]: 6 "PROPFIND" "https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/" has X-Request-ID "f11cbaac-c7b2-4d00-9e81-a6f15a41272d"
08-19 00:29:25:523 [ info sync.networkjob ]:    OCC::LsColJob created for "https://surfdrive.surf.nl/files" + "" "OCC::DiscoverySingleDirectoryJob"
08-19 00:29:26:009 [ info sync.networkjob.lscol ]:      LSCOL of QUrl("https://surfdrive.surf.nl/files/remote.php/dav/files/[username]@soliscom.uu.nl/") FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) ""
   [...]
08-19 00:29:26:031 [ info sync.csync.updater ]: Database entry found, compare: 1533287356 <-> 1533287356, etag: 89bbeda4ca4d7a7704715bd055549659 <-> 89bbeda4ca4d7a7704715bd055549659, inode: 0 <-> 11927721, size: 0 <-> 0, perms: fd <-> fd, checksum:  <->  , ignore: 0
08-19 00:29:26:031 [ info sync.csync.updater ]: Reading from database: Shared
08-19 00:29:26:031 [ info sync.csync.updater ]: file: Shared, instruction: INSTRUCTION_NONE <<=
08-19 00:29:26:032 [ info sync.csync.updater ]: 192 entries read below path Shared from db.
   [...]

At the directory "work", everything goes haywire, because "0 entries read below path work" are read:

08-19 00:29:26:039 [ info sync.csync.updater ]: Reading from database: work
08-19 00:29:26:039 [ info sync.csync.updater ]: file: work, instruction: INSTRUCTION_NONE <<=
08-19 00:29:26:876 [ info sync.csync.updater ]: 0 entries read below path work from db.
08-19 00:29:26:876 [ info sync.csync.updater ]:  <= Closing walk for  with read_from_db 0
08-19 00:29:26:876 [ info sync.csync ]: Memory: 2715940K total size, 1462644K resident, 33232K shared
08-19 00:29:26:876 [ info sync.engine ]:        #### Discovery end ####################################################  26628 ms

After this, everything within "work" is deleted, i.e. millions of files and >100GB of data:

08-19 00:29:26:877 [ info sync.csync.reconciler ]:      INSTRUCTION_REMOVE             client dir:  work/website/doc/_templates
08-19 00:29:26:877 [ info sync.csync.reconciler ]:      INSTRUCTION_REMOVE             client file: work/website/doc/Makefile
   [...]

Sorry if this is all a bit wordy, but I'm not 100% sure which parts of the log are relevant. If indeed the server is sending "semi-bogus", does any of you have any idea what might cause the directory "work" to be listed as "zero files"?

Finally, thank you all for being so willing to help. I really appreciate this.

ckamm commented 6 years ago

@Phlos That log, particularly if it includes the previous sync run and the time in between them would likely be highly useful. It seems the entries below "work/" are missing from the client database - or not read from it for some reason. We definitely want to know why.

And I agree that it sounds similar to #6322 . This comment on disentangling different db-entry-hiding schemes is still applicable: https://github.com/owncloud/client/issues/6322#issuecomment-374885133

EDIT: Since 2.4.2 has info logs on "_invalid_" etags and on file exclusion it very much looks to me like there are no entries below that "work/" path in the database - it's not that they are hidden for some reason.

I wonder what the "Sync Journal cleanup for" entries look like in the preceding log, do they have thousands of entries?

Phlos commented 6 years ago

@ckamm It does indeed include the previous (succesful) sync (I think at 18-08 16:25:36). I already sent the log to @ogoffart and now tried forwarding it to you, but I'm not sure if the e-mail address is correct, so let me know if you received the link.

ckamm commented 6 years ago

@Phlos I got the link and am now downloading it. Thank you!

owncloud / client