owncloud / core

:cloud: ownCloud web server core (Files, DAV, etc.)
https://owncloud.com
GNU Affero General Public License v3.0
8.39k stars 2.05k forks source link

Upgrade error owncloud 9.1.6.2 -> 10.0.3.3 (broken local shares) #29209

Closed pep29 closed 6 years ago

pep29 commented 7 years ago
### Steps to reproduce 1. Dowload and unzip owncloud 10.0.3 2. Copy data and config from owncloud 9.1.6 to ownlouc 10.0.3 3. Upgrade as explained here https://doc.owncloud.org/server/latest/admin_manual/maintenance/upgrade.html ### Expected behaviour should upgrade smoothless ### Actual behaviour It all went well, but then i am stuck at the moving avatars step (keep showing 0%) ### Server configuration **Operating system**: opensuse 13.1 **Web server:** apache2 **Database:** mysql 5.6.12 **PHP version:** 5.6.9 (cli) **ownCloud version:** 10.0.3.3 **Updated from an older ownCloud or fresh install:** 9.1.6.2 **Where did you install ownCloud from:** /srv/www/owncloud **Signing status (ownCloud 9.0 and above):** **List of activated apps:** - comments: 0.3.0 - configreport: 0.1.1 - dav: 0.3.0 - federatedfilesharing: 0.3.1 - federation: 0.1.0 - files: 1.5.1 - files_external: 0.7.1 - files_trashbin: 0.9.1 - files_versions: 1.3.0 - files_videoplayer: 0.9.8 - firstrunwizard: 1.1 - market: 0.2.2 - notifications: 0.3.1 - provisioning_api: 0.5.0 - systemtags: 0.3.0 - updatenotification: 0.2.1 Disabled: - encryption - external - files_antivirus - files_sharing - templateeditor - theme-example - user_external **Are you using external storage, if yes which one:** no **Are you using encryption:** no **Are you using an external user-backend, if yes which one:** no
ghost commented 7 years ago

So closing as a duplicate of #28763 which you have already found.

PVince81 commented 7 years ago

Reopening as there are no federated shares involved, it seems to be related to broken local shares.

PVince81 commented 7 years ago

@pep29 please pick one of the shares that are known to be broken and do a select * from oc_share where file_target like '%NAME%' and replace "NAME" with the name of the share as seen duplicated. So if you see "test" and "test (2)" then replace "NAME" with "test" so we can see both in the results.

pep29 commented 7 years ago

First i will tell you more about the duplicate case:

A creates a directory xxx and share it with B. So B sees the directory xxx in his shared folder, but due to unknow issues to us (maybe during the synchronisation with owncloud client), a copy of xxx is created in the local shared directory of B, at the end B got xxx and xxx(2), only xxx(2) is shows as a shared folder while xxx is not

PVince81 commented 7 years ago

@pep29 is only this one user affected or is there more ?

pep29 commented 7 years ago

@PVince81 There are more and most of them use owncloud client on Mac for the sync

PVince81 commented 7 years ago

@pep29 please run the SQL queries listed in the first post in https://github.com/owncloud/core/pull/28253 and see if any of the affected users appear in the list.

pep29 commented 7 years ago

@PVince81 you mean these one:

select fc.storage,fc.fileid,fc.parent as "wrongparent",fc3.fileid "correctparent",fc.path,fc.etag from oc_filecache fc, oc_filecache fc3 where fc.parent <> -1 and fc.parent != fc3.fileid and fc3.storage=fc.storage and fc3.path = substring(fc.path, 1, length(fc.path) - length(substring_index(fc.path, '/', -1)) - 1) and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent) order by path;
SELECT fc.`storage`, fc.`fileid`, fc.`path` as `path`, fc.`name`, fcp.`storage` as `parentstorage`, fcp.`path` as `parentpath`     FROM `oc_filecache` fc, `oc_filecache` fcp     WHERE (fc.`parent` = fcp.`fileid`)     AND (         (CONCAT(fcp.`path`, '/', fc.`name`) <> fc.`path`)         OR (fc.`storage` <> fcp.`storage`)     )     AND (fcp.`path` <> '')     AND (fc.`fileid` <> fcp.`fileid`);
select fc.storage,fc.fileid,fc.parent as "wrongparent",fc.path,fc.etag from oc_filecache fc where fc.parent <> -1 and not exists (select 1 from oc_filecache fc2 where fc2.fileid = fc.parent) order by path;
select * from oc_filecache where parent=fileid;
pep29 commented 7 years ago

@PVince81 Query: 1,3,4 return empty.

However query 2 SELECT fc.storage, fc.fileid, fc.pathaspath, fc.name, fcp.storageasparentstorage, fcp.pathasparentpathFROMoc_filecachefc,oc_filecachefcp WHERE (fc.parent= fcp.fileid) AND ( (CONCAT(fcp.path, '/', fc.name) <> fc.path) OR (fc.storage<> fcp.storage) ) AND (fcp.path<> '') AND (fc.fileid<> fcp.fileid);

return some sets, the files with whom we have been having issues as well are show, these folder, files are wrongly coded (we suspected the user created it on their mac and uploaded it using the owncloud client. The special characters on the mac have a different code)

e.g.: 151123_Liste_tats�?chlich_anwesende_TN_JT 16.11.15.pdf but it should be 151123_Liste_tatsächlich_anwesende_TN_JT 16.11.15.pdf

However not all files with those special char are affected, that why we suppected it must be the way mac works

PVince81 commented 7 years ago

@pep29

About encoding:

If using the desktop client, uploading files that are encoded with NFD will be normalized to NFC, so I don't see how it could happen that you have these encoding issues. The weird chars "�?" look more like they are in ISO format, not NFD. NFD umlaut looks like this: https://github.com/owncloud/core/issues/21365#issuecomment-167086634

Now I have no idea why some files would be ISO, unless maybe your server/apache/PHP was not configured properly to use en_US.UTF-8.

That said, I'm not sure whether the encoding is causing the upgrade issue.

About the SQL result:

Since you got results it means there is likely some broken filecache entries that need repairing. The code for this is currently on the stable10 branch and not released yet: https://github.com/owncloud/core/pull/29074. The problem I see here is that you're stuck at the update process, which means the repair should already be done on the previous version, 9.1, which is currently still in the works.

Now there is also no guarantee that fixing these entries would unstuck the upgrade as the problem could still be related to broken shares.

pep29 commented 7 years ago

@PVince81:

i tried to test the change from #29074

But when i tried to repair with the command: occ files:scan --all --repair

i get the error:

PHP Fatal error: Class 'OC\Files\Utils\Scanner' not found in /srv/www/owncloud/apps/files/lib/Command/Scan.php on line 154

PVince81 commented 7 years ago

@pep29 maybe something went wrong with the patching ? Does the file "lib/private/Files/Utils/Scanner.php" still exist ?

For the encoding issue, please check that you have the PHP module "intl" enabled. (ref https://github.com/owncloud/core/issues/29247)

pep29 commented 7 years ago

@PVince81 yes, the file "lib/private/Files/Utils/Scanner.php" still exist

pep29 commented 7 years ago

@PVince81 the "intl" module is enabled

pep29 commented 7 years ago

@PVince81 I also noticed there is an improvment with the above mentioned SQL result, before the upgrade attempt, the query gives 188 rows and after applying the patch, the query gives now 33 rows but then i am stuck using the command occ files:scan --all --repair

PVince81 commented 7 years ago

I'm a bit out of ideas. I don't see why it couldn't find a class that clearly exists, unless you did something wrong when patching. How did you patch ?

pep29 commented 7 years ago

@PVince81 you were right, i indeed did something wrong with the patching. The scan now works as it should, however i still have to disable the files_sharing, in order for the upgrade to success. Otherwise i am still stuck in the above mentioned avatar step

PVince81 commented 7 years ago

How long have you let the upgrade running ? It is likely that if you let it run long enough, the avatar step will finally exit at some point.

A similar issue about long running upgrade was reported before and is still being investigated. It seemed that setting up the FS for a specific user who received many shares somehow run in a lot of unnecessary loops.

PVince81 commented 7 years ago

@pep29 I have built another patch that will add more debug information while running the avatar step, it's here: https://github.com/owncloud/core/pull/29280

Are you able to rerun the upgrade on a test system with this patch applied ? (no need for other patches)

At the time where it hangs, please interrupt the upgrade and have a look at owncloud.log. There should be a lot of message marked with "DEBUG".

The patch logs the "initMountPoint" calls for the current user and also any user who shares with the user currently being processed. It also logs some of the share information. The goal is to find out about potentially broken shares and also get more details about the specific sharing scenario.

I suspect that something is running needlessly into a long loop, setting up the mount points over and over again for some yet unknown reason.

pep29 commented 7 years ago

Ok, i will enable the files_sharing and let the upgrade running and let you know ...

Otherwise, i will ty it again with the patch #29280 and let you know

pep29 commented 7 years ago

So i let it run with the activated FS for 20 minutes, then i canceled

Using the patch #29280 i reran again and found out that i am already stuck with the first user and only found these DEBUG infos:

{"reqId":"w393Rc6g2yt3l5v9mRBU","level":0,"time":"2017-10-18T10:25:52+00:00","remoteAddr":"","user":"--","app":"DEBUG","method":"--","url":"--","message":"moving avatars for user: Bach"} {"reqId":"w393Rc6g2yt3l5v9mRBU","level":0,"time":"2017-10-18T10:25:52+00:00","remoteAddr":"","user":"--","app":"DEBUG","method":"--","url":"--","message":"initMountPonts for user Bach"}

So there is nothing about the share information and any user who shares with the user currently being processed like you mentioned above.

PVince81 commented 7 years ago

Weird... is the user Bach receiving any shares ?

Try select * from oc_share where share_with='Bach'. Then if this user is in any groups, do the same with select * from oc_share where share_with in ('group1', 'group2', ...) replace accordingly. Last time I remember you said there are no external shares at all, so no need to check the oc_share_external table.

Since disabling files_sharing seems to solve the problem for you I expect this one user to have received shares.

pep29 commented 7 years ago

According to the oc_share, the user Bach indeed his receiving somes shares

Here the entry of the first row:

id: xxx share_type: 0 share_with: Bach uid_owner: Yilmes parent: NULL item_type: folder item_source: 18619 item_target: /Shared/18619 file_source: 18619 file_target: /Shared/Year 2014 permissions: 31 stime: 1415011044 accepted: 0 expiration: NULL token: NULL mail_send: 0 uid_initiator: Yilmes share_name: NULL

the user is in a group as well

PVince81 commented 7 years ago

Ok thanks. It is rather strange that despite the fact that this user receives shares, the logging code from the SharedStorage is not called. So this means it hangs at an earlier point, it could be while retrieving the list of received shares. Need to dig deeper into the code...

pep29 commented 7 years ago

The best i could do is to move the DEBUG line in Filesystem.php progressively:

if (self::$debug) { \OCP\Util::writeLog('DEBUG', 'initMountPonts for user ' . $user, \OCP\Util::DEBUG); }

I could still see the debug line in the log file till this part:

if ($userObject) { $mounts = $mountConfigManager->getMountsForUser($userObject); array_walk($mounts, [self::$mounts, 'addMount']); $mounts[] = $homeMount; $mountConfigManager->registerMounts($userObject, $mounts); }

But right after that, not anymore, which indeed means that the last part: self::listenForNewMountProviders($mountConfigManager, $userManager); \OC_Hook::emit('OC_Filesystem', 'post_initMountPoints', ['user' => $user]); is for some reasons never called

pep29 commented 7 years ago

So doing the same in the mentioned if part, it appears, that i am indeed stuck here

$mounts = $mountConfigManager->getMountsForUser($userObject);

PVince81 commented 7 years ago

Let's try this: select * from oc_mounts where user_id="Bach".

Normally this function would gather all mount points of the user, which could be:

So either there is something wrong happening in the SharedMount or SharedStorage, maybe some infinite loop, or something else is wrong in oc_mounts.

If you don't mind redoing the debug dance like you did, you could put some log statements here:

So far I can't think of other locations. Maybe this will provide more clues.

Thanks a lot!

pep29 commented 7 years ago

With the query i got one row:

         id: xxxx
 storage_id: xx
    root_id: xxxxx
    user_id: Bach
mount_point: /Bach/

As for the log statements, i reach all the mentioned three lines.

PVince81 commented 7 years ago

Ok, so far this tells us the following:

The stuckness is likely somewhere in the MountProviderCollection, between the gather of mounts from the share app and the populating of the oc_mounts table.

PVince81 commented 7 years ago

Hmm no, slightly wrong: the population of oc_mounts happens a few lines later in Filesystem::initMountPoints when calling ->registerMounts.

Ok, so it hangs somewhere in the MountProvider then.

Try the following lines now:

Please also tell me how many times the log entries appear each time. In some cases the number needs to match the number of shares the user has. If the number is smaller, it could mean it's stuck on a specific share.

PVince81 commented 7 years ago

@pep29 additionally to the above, I suggest you also install the PHP "Xdebug" module. It usually provides stack traces in the log in case of errors, which can be useful.

If your bug is about an infinite recursion or deep recursion, Xdebug will also put a maximum limit and will crash when it's reached. So if Xdebug crashes the PHP process, it would tell us that there's a recursion and also tell us where. If it does not crash, it means it's an infinite loop, not a recursion.

pep29 commented 7 years ago
pep29 commented 7 years ago

I also noticed that i have a very long list in the 3rd part.

where i ran the scan and repair this morning, i got this at the end: +---------+--------+--------------+ | Folders | Files | Elapsed time | +---------+--------+--------------+ | 32674 | 182231 | 00:57:17 | +---------+--------+--------------+

but with the while-loop in SharedMount checking _fileexists($path), i already have i = 1065562

I guess this is already a problem, right?

pep29 commented 7 years ago

So the problem might be in the ShareMount with the infinite while?

PVince81 commented 7 years ago

but with the while-loop in SharedMount checking file_exists($path), i already have i = 1065562

Ouch... It means you likely already have folder "/Bach (1065562)" somewhere.

Somehow this issue is similar to something we had a long while ago: https://github.com/owncloud/core/issues/25718. From what I remember it was also adding a "(2)". But instead of staying in the loop, it would simply increase the value to "(3)", "(4)" etc every time the FS of this user was setup.

Maybe there is another condition in your env that makes it infinite here.

Can you check the entries received by Bach in oc_share, you said that there were many ? Look at the "file_target" value and see if there are parenthesis. You might also be able to find these by doing select * from oc_share where file_target like '%(%)%';

Once found, do a select * from oc_filecache where name like '%Bach%'. Replace "Bach" with whatever the previous query found. So if the query found "/Bach (2)" then look for "%Bach%" in filecache.

pep29 commented 7 years ago

I am not sure, if i understand, what you mean.

i = 1065562 and increasing, but it does not means that i have that much files, this would also mean, i have more than a million files, which is definitely wrong. Also see the result of the scan above. I can say for sure, the is not folder "/Bach (1065562)".

id: 1529
   share_type: 2
   share_with: Wend
    uid_owner: Griess
       parent: 945
    item_type: folder
  item_source: 82680
  item_target: NULL
  file_source: 82680
  file_target: /Shared/Monitoring (2)
  permissions: 31
        stime: 1444314402
     accepted: 0
   expiration: NULL
        token: NULL
    mail_send: 0
uid_initiator: Griess
   share_name: NULL

I am however not sure if it is alright for item_target to be NULL

pep29 commented 7 years ago
               $i = 2;
        while ($view->file_exists($path) || $mountpointExists($path)) {
            $path = Filesystem::normalizePath($dir . '/' . $name . ' ('.$i.')' . $ext);
            $i++;
        }

Is there another way to write the $path value in the log like i did with $i? Because, when i do the same with $path, i only got an emtpy line in the log.

PVince81 commented 7 years ago

What?? Maybe it means that the $path of this share is empty !

If the path is empty, then it would explain why $view->file_exists($path) always returns true, because that would point to the root/home folder, which always exists.

Ok, it is likely that there is a broken share somewhere that has no target: select * from oc_share where file_target = '' or file_target = '/'. Or someone managed to share a folder that has no name at all which, when computed, results in an empty file target. Let's check: select * from oc_share s, oc_filecache f where s.file_source = f.fileid and f.name = ''.

PVince81 commented 7 years ago

I did a local test where I manually set the "file_target" to "" or "/", and the symptoms are different: the recipient user symply receives a folder called "(2)" in both cases.

PVince81 commented 7 years ago

@pep29 are you using the "share_folder" value in config.php ? If yes, what is it set to ? Asking because SharedMount uses it to compute the target name.

pep29 commented 7 years ago

That is strange somehow, since $i in the while loop increase to a million starting from 2, at some point it ends, because when i log the $path value right before return $path, i indeed get something, which mean i actually get out of the while loop, but why would $i increase that much?

Yes i use "share_folder" value in config.php, it is set to "/Shared"

PVince81 commented 7 years ago

a88

PVince81 commented 7 years ago

The value of "$i" will not increase the number of files. It will only rename the name of the mount point.

To illustrate why this code exists. See the following scenario:

  1. user1 create a local folder "something"
  2. user2 shares a folder called "something" with user1
  3. when user1 logs in, the code in question notices that there is already a folder called "something" in user1's space, so it calles the received folder "something (2)"
  4. user1 now sees two folders: "something" and "something (2)"

The "something (2)" is only the received folder renamed to that. In the case of the increasing $i bug, I expect that instead of seeing "something (2)" the user would see "something (2323492384)", with the value being the last value of $i at the time the loop exits.

At this point I am surprised that the loop even exits at all, given the conditions. If the user "Bach" doesn't have a million files and also does not have a million received shares, we need to find out why the while conditions believe that there is already a million existing files/shares in Bach's home.

pep29 commented 7 years ago

Yes, i understand about the meaning of (2), single each folder has to have a unique name. Somehow it behaves diffently here ...

pep29 commented 7 years ago

By the way, the $path value i got in the log, right before the return $path, are like this "\/Shared\/Admin" which is indeed correct, so this means the upgrade could maybe work, but it would take like a few hours because of the while-loop above?

pep29 commented 7 years ago

Or maybe not, is the while loop called for every user or just one time? Because right after that, i have only three lines in the log for the $path value

PVince81 commented 7 years ago

@pep29 what is the value of "$i" when it exits ?

If the path before return $path has no number appended, then it means that the function that concatenates the path doesn't work. Normally this $path = Filesystem::normalizePath($dir . '/' . $name . ' ('.$i.')' . $ext); should actually add the number behind. If it doesn't then there is something wrong with string concatenation in your PHP version, or something bad happening inside of normalizePath.

The upgrade could maybe work, yes. But if a simple path concatenation is causing trouble, you'll likely have other worse trouble in your OC instance.

while loop called for every user or just one time

It is only called when there are duplicates to resolve. If the resulting $path is still pointing at a duplicate, then next time the user has a FS-related PHP request, it will run the loop again. At this stage I already expect that if this user "Bach" logs in, it will take an hour or so just to login.

PVince81 commented 7 years ago

opensuse 13.1

btw, you should really update this. opensuse 13.1 is EOL since a long time already: https://en.opensuse.org/Lifetime

Since you have PHP 5.6 I think it should still work fine. But that string concatenation issue worries me now.

PVince81 commented 7 years ago

Another thing to try:

diff --git a/apps/files_sharing/lib/SharedMount.php b/apps/files_sharing/lib/SharedMount.php
index 0e2aa4baa3..a52a8a465f 100644
--- a/apps/files_sharing/lib/SharedMount.php
+++ b/apps/files_sharing/lib/SharedMount.php
@@ -128,7 +128,8 @@ class SharedMount extends MountPoint implements MoveableMount {
         * @param SharedMount[] $mountpoints
         * @return mixed
         */
-       private function generateUniqueTarget($path, $view, array $mountpoints) {
+       private function generateUniqueTarget($inputPath, $view, array $mountpoints) {
+               $path = $inputPath;
                $pathinfo = pathinfo($path);
                $ext = (isset($pathinfo['extension'])) ? '.'.$pathinfo['extension'] : '';
                $name = $pathinfo['filename'];

If that doesn't solve the problem, then I suggest to do the debug dance around and into the "normalizedPath" function to find out what is happening and why the path concatenation doesn't seem to work.

pep29 commented 7 years ago

I haven't tried the change yet, but i just found something, it actually works well, but at somepoint i get stuck in the while loop, meaning i see the mentioned three $path value right before the return $path in the log file, and after that the while loop is going crazing, maybe with the fourth folder(?!)