I think that is the stuff we fixed lately @schiesbn, right?
@gnohp can you re-test this with 9.0.1, we fixed some stuff in this version regarding encryption and versioning.
@schiesbn just finished testing on 9.0.1 and confirmed the issue still exists. In 9.0.1, after reverting to an older version and discovering the file has been corrupted, when I revert back to the latest document version and attempt to download the document, the previous file version is now corrupted as well. No exceptions are thrown in the log.
In 8.2.2, the version becomes corrupted once a revert has taken place too. This is not new to 9.0.1 and I'd assume it exists in 9.0.0 as well.
When a shared user updates a file in 8.2.2 (the user has change, edit, and share permissions on the file), the previous version is an empty file. When attempting to revert, an exception is thrown (Failed to revert [File Name] to revision [Date]) and the spinner remains on the file and the versions sidebar. This is a non-issue in 9.0.1.
just tried it again with 9.0.1 and can't reproduce it. Can you reproduce this behavior reliably? Does it happen with multiple files or only with some specific files? If it happens only with some files, could you prepare a test file and send it to me? (I don't know any reason why it should be file specific but you never know)
Sorry for the delayed response. Just got back from vacation.
Yup, the issue is reproducible on Office files with .*x extensions as long as they have been on the server for a few months. I've made new Office files (12 days old now) and those are currently reverting fine. I've also tested with a duplicate copy of a docx file and that works as well (12 days old too).
I was hoping to have a test file when I first reported the issue but I'm not sure if the issue appears after a certain time has passed or if the issue is due to the file being a part of a few upgrades...from late OC7 to OC8-9. I've verified the version was not corrupted since the version downloads and opens fine. It only happens after a user attempts to revert to an older version.
I just tested it again going from 7.0 to 9.0. Always created new versions in between and tried to download them. It always worked. I could never reproduce the described behavior...
@schiesbn did you try reverting to the old version or downloading? Maybe it's time related? I'll see if I can gather more information.
@gnohp I tried both downloading and reverting the version. Also Versions don't have their own encryption keys. All versions and the original file always use the same key.
And here is the diff:
1744,1764c1744,1745
< 0072040 \0 \0 332 \0 \0 \0 023 \0 \0 \0 \0 \0 \0 \0 \0 \0
< 0072060 \0 \0 \0 \0 q b \0 \0 c u s t o m X m
< 0072100 l / i t e m 1 . x m l P K 001 002 -
< 0072120 \0 024 \0 006 \0 \b \0 \0 \0 ! \0 4 - r 271 C
< 0072140 001 \0 \0 F 002 \0 \0 021 \0 \0 \0 \0 \0 \0 \0 \0
< 0072160 \0 \0 \0 \0 \0 V c \0 \0 d o c P r o p
< 0072200 s / c o r e . x m l P K 001 002 - \0
< 0072220 024 \0 006 \0 \b \0 \0 \0 ! \0 H 264 \f \n 201 003
< 0072240 \0 \0 . 021 \0 \0 022 \0 \0 \0 \0 \0 \0 \0 \0 \0
< 0072260 \0 \0 \0 \0 371 e \0 \0 w o r d / f o n
< 0072300 t T a b l e . x m l P K 001 002 - \0
< 0072320 024 \0 006 \0 \b \0 \0 \0 ! \0 375 Y 025 8 024 003
< 0072340 \0 \0 035 + \0 \0 024 \0 \0 \0 \0 \0 \0 \0 \0 \0
< 0072360 \0 \0 \0 \0 252 i \0 \0 w o r d / w e b
< 0072400 S e t t i n g s . x m l P K 001 002
< 0072420 - \0 024 \0 006 \0 \b \0 \0 \0 ! \0 \ 326 302 Y
< 0072440 271 001 \0 \0 223 003 \0 \0 020 \0 \0 \0 \0 \0 \0 \0
< 0072460 \0 \0 \0 \0 \0 \0 360 l \0 \0 d o c P r o
< 0072500 p s / a p p . x m l P K 005 006 \0 \0
< 0072520 \0 \0 025 \0 025 \0 K 005 \0 \0 377 o \0 \0 \0 \0
< 0072540
---
> 0072040 \0 \0 332 \0 \0 \0
> 0072046
Looks like the files are different.
@PVince81 It's the same file. test2.txt is missing the tail of the file...line 1744 (0072040) onward.
Hmmm, indeed. Can you tell us the exact byte-size of the original and broken file ?
Maybe it's missing the last block for some reason. There are some loops in the encryption code and maybe it "missed a beat".
Ok, the sizes can be inferred from the hex dump: test1.txt: 72540 bytes, test2.txt: 72046 bytes
Ok, so the second file is missing 494 bytes
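For anyone who wants to redo this comparison, here is a rough sketch, assuming the dumps above were produced with od -c (good.docx and broken.docx are placeholder names for the intact file and the corrupted download):

stat -c %s good.docx broken.docx   # exact byte size of each file
od -c good.docx > test1.txt        # character dump of the intact file
od -c broken.docx > test2.txt      # character dump of the corrupted download
diff test1.txt test2.txt           # shows where the two dumps diverge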
On v9.0.1 I tried generating a file with this exact size:
dd if=/dev/urandom of=randombytes.dat bs=72540 count=1
Then I followed the steps above. When you said "modify the file" I generated a new file of the same size.
After revert+download, the md5/contents of the file is still correct.
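In shell terms, the check boils down to something like this (the download path is just a placeholder for wherever the browser saved the reverted file):

dd if=/dev/urandom of=randombytes.dat bs=72540 count=1   # test file of the same size
md5sum randombytes.dat                                   # checksum before upload
# ... upload the file, overwrite it, revert to the first version and download it via the web UI ...
md5sum ~/Downloads/randombytes.dat                       # should print the same checksum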
Any reason why you added "Wait 2 - 5 minutes" ? Does the restoration take that long ? If it did, maybe it internally ran into a PHP timeout and the file got truncated. Was the file from the original report shared with lots of people ? (which might explain the slowness)
Just tried doing the same steps but with a share recipient, still works fine.
Also reported by @kenyang001 here https://github.com/owncloud/core/issues/25211
Strangely it's also about office files + encryption + versions + restore.
@gnohp @kenyang001 are you able to repeat the issue by uploading the original file, then the changed file (with the same changes) ?
It could be due to a PHP timeout, but considering that the files in question aren't big, it is less likely. Or the encryption code is having trouble with specific files with specific contents.
@owncloud/encryption
@PVince81 No clue about what is happening here. The size of the truncated file does not correspond to an integer multiple of the blocksize (encrypted or unencrypted), so I wouldn't suspect the encryption wrapper in the first place. @gnohp Have these files been produced through an interface (web browser/webdav client), or have they been read from primary storage (i.e. the data folder of the ownCloud server)? In case the primary storage is not corrupted, I would suspect something happening at the level of the file metadata. If the (unencrypted) size is not set correctly in the metadata when manipulating file versions, file corruption will happen upon decryption.
@jknockaert according to the original steps the file was overwritten using drag and drop in the web UI.
Good point about the size in the filecache. However I think that even if the size there is wrong, it is likely that either it's the encrypted or unencrypted size. Looking at the corrupted hex dumps above, a piece of the end of the file is missing. If the size was wrong in the database, it is unlikely that this size is smaller than the actual file. Usually the unencrypted size would be stored in the DB and in some buggy cases we had the encrypted size there instead, which itself is bigger (not smaller).
@PVince81 Yes, the file creation and manipulation was done through the UI. But for the files posted here (test1.txt and test2.txt) it has not been specified how they were produced. It may well be that they are consistent in the data store, but that upon reproduction through the UI they fail. If the unencrypted filesize in the filecache is wrong, corruption happens upon decryption. The size could be smaller e.g. when a more recent version was smaller, and upon reverting to the older version the filecache was not updated correctly. But I'm seeing this very much from an encryption-wrapper point-of-view. I'm far less familiar with the filecache mechanisms.
@gnohp do you still have the corrupted file somewhere in ownCloud, in the list of versions ? If yes, can you run select * from oc_filecache where name like '%nameofcorruptfile%' ? There should be entries from the "files" folder but also from "files_versions". The goal is to find out whether the size in the "size" column is correct.
@kenyang001 can you do the same for your file ?
Well, thinking of it, it may even be that corruption happens when the (wrong) unencrypted size in the filecache is too big. It probably would result in an error after a timeout. The user doesn't see this error; he/she just gets a corrupt file (with the error message going to the ownCloud log). It would also explain the indications for a timeout that we had earlier in this discussion.
@PVince81 Hi PVince81, the sizes are as below.
The size in the files folder: -rw-r--r-- 1 www-data www-data 27K Jun 20 14:56 bug-test2.xlsx
The size in the files_versions folder: -rw-r--r-- 1 www-data www-data 499K Jun 22 17:02 bug-test2.xlsx.v1466405851
If the file involves sharing + encryption + restore + versions, then the corruption happens. Actually, we also get corrupt files even when not doing a restore, but we are not sure which other conditions cause the same symptoms. Thanks.
@kenyang001 can you also check the sizes of these files in the "oc_filecache" table ? The sizes you see on the filesystem are the encrypted sizes. In the database you should see smaller sizes.
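For reference, a rough way to compare the two numbers side by side (the data directory path, user name, database name/credentials and file name below are placeholders for your setup):

stat -c %s /path/to/owncloud/data/alice/files/bug-test2.xlsx   # encrypted size on disk
mysql -u owncloud -p owncloud -e \
  "SELECT fileid, path, size FROM oc_filecache WHERE path LIKE '%bug-test2.xlsx%';"   # unencrypted size in the filecache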
I think I'm onto something. Did some tests on my 9.1.0beta2 server with ODS files and versions.
My steps:
At this stage I get download failures in Firefox or "truncated" messages in cadaver.
Surprisingly the size of the restored file in the file cache is zero!
Edit: I misread, the zero was from the keys folder
+--------+------------------------------------------------------------------------------+-------+----------------------------------+
| fileid | path | size | etag |
+--------+------------------------------------------------------------------------------+-------+----------------------------------+
| 92567 | files/test/test.ods | 40073 | e14742a1b21d631c2a364349f21093f7 |
| 92563 | files_encryption/keys/files/test/test.ods | 0 | 576a54e8445ce |
| 92564 | files_encryption/keys/files/test/test.ods/OC_DEFAULT_MODULE | 0 | 576a54e845298 |
| 92565 | files_encryption/keys/files/test/test.ods/OC_DEFAULT_MODULE/fileKey | 32 | ddc73e64b590fdb9f806cb701c531c3f |
| 92566 | files_encryption/keys/files/test/test.ods/OC_DEFAULT_MODULE/vincent.shareKey | 512 | bc5e2e71e4f3c8e52a7cae798acce0fd |
| 92569 | files_versions/test/test.ods.v1466586344 | 32687 | 69aaaa8cff01db19fbc9b7a9472bfa29 |
| 92570 | files_versions/test/test.ods.v1466586439 | 40073 | 91112bb539db19aed29a80464af376f1 |
+--------+------------------------------------------------------------------------------+-------+----------------------------------+
However on disk the encrypted size is 52352, so the file is indeed there but the filecache has the wrong size.
Not sure if this is at all related to this ticket or a different issue. I'll try and see if I can redo this with test files I could share publicly.
Okay I was wrong with the zero size. Still, my OC doesn't seem to want to deliver this file for download.
Aha, got it: after restoring the 32kb file, the oc_filecache entry should have been set to 32687 but it still is 40073.
I suspect that if the file's size had reduced between versions, i.e. the old version being restored is bigger than the current one, then the cached size would be smaller than the actual file and a truncated file would be delivered instead.
So the bug is that when restoring a file, the "size" value of the restored file isn't properly copied.
Will do more tests to confirm this.
Confirmed on master / 9.1.0beta2:
MariaDB [owncloud]> select fileid, path, size, etag from oc_filecache where path like '%test.txt%';
+--------+-----------------------------------------------------------------------+------+----------------------------------+
| fileid | path | size | etag |
+--------+-----------------------------------------------------------------------+------+----------------------------------+
| 47 | files_encryption/keys/files/test.txt | -1 | 576a58e038660 |
| 48 | files_encryption/keys/files/test.txt/OC_DEFAULT_MODULE | -1 | 576a58e039763 |
| 49 | files/test.txt | 8 | dc5e0fd4e261dbd51f6b86175aa0c0eb |
| 50 | files_encryption/keys/files/test.txt/OC_DEFAULT_MODULE/fileKey | 32 | d5626c25b28c5a50cee50b50da03114e |
| 51 | files_encryption/keys/files/test.txt/OC_DEFAULT_MODULE/admin.shareKey | 512 | 38bfdbd8b4528ce4d33fbaf29dbaad69 |
| 55 | files_versions/test.txt.v1466587363 | 3 | e73985d4b8546ecc58a4d8214304cbf8 |
| 63 | files_versions/test.txt.v1466587394 | 8 | 68bb09fb7e723aeef67a5afcbddfca83 |
+--------+-----------------------------------------------------------------------+------+----------------------------------+
Expected: file size of "test.txt" is also restored to 3
Actual result: file size of "test.txt" is still 8 and this will mess with file downloads
The size copy issue only happens when encryption is enabled.
Next up: check which other versions are affected, 9.0 and maybe 8.2.
I'll have a look to fix this.
For the encryption wrapper to function reliably it's absolutely crucial that the (unencrypted) size is set right. There are some mechanisms to catch improperly set sizes and correct them, but in my opinion that's a band-aid. Fixing the filecache handling will solve issues like this, as well as help any other algorithm that depends on it.
Just did one pass with the debugger and I saw no code that would update the size of the original file after restoration. The contents of the file are copied with "$storage->moveFromStorage", which doesn't update any cache data.
Now I'm wondering how this could work before, will check older versions. Else the quickfix is to add an additional cache update there.
CC @icewind1991 in case you have an idea
Just checked stable8.2 and it is affected too. I haven't checked stable9 but it's very likely affected too.
And stable8.1 is affected too... that's the version that brought the new encryption modules code.
Will now check if there was a previous version in the 8.1.x series that worked before (hoping for a bisect).
@PVince81 What are you looking for? The corruption seems to be in the filecache handling rather than anywhere else.
@jknockaert I'm trying to find whether this is a regression, whether some commit has broken it.
It turns out that it has never worked before, v8.1.0 is also affected.
Now to write a fix for this. I think we need to add code to update the filecache entry after the copy/move of the old version was done.
@PVince81 Agree, I don't expect this to be a regression. The band aid I was referring to is here: https://github.com/owncloud/core/blob/master/lib/private/Files/Storage/Wrapper/Encryption.php#L484 So it would suffice to set unencrypted size to -1 upon rollback; then the encryption wrapper will fix it upon subsequent access.
Here's a fix in the versions code: https://github.com/owncloud/core/pull/25225
However I'm not sure whether it's the correct fix. I observed that without encryption the size was already copied properly, so this means there is another code path that works. From what I remember the encryption code does magic with the size field so maybe that magic bypasses the regular code.
@jknockaert I just saw your comment after submitting the fix, maybe you're onto something to provide a better fix.
@jknockaert the band-aid might not be enough though, I think 8.1 doesn't have the size repairing fix
I could try and analyze which code path is taken for the non-encrypted restore vs encrypted restore, but I'm not sure it's worth investing the time if my PR here https://github.com/owncloud/core/pull/25225 already works.
@PVince81 Your approach is better if it works, as it avoids inefficiently recalculating the unencrypted size. The band-aid was backported to 8.1 and everything since, so it could be used if your solution doesn't work.
Ok, I found the reason why it works non-encrypted: at the end of the restore, we do a touch() on the restored file, which somehow updates its size, possibly with a scanFile call.
However in the case of encryption, the wrapper prevents relying on the on-disk size of the file so it doesn't update it. That looks like it.
So this confirms that https://github.com/owncloud/core/pull/25225 is the best fix here.
Ok, sorted out and an efficient fix implemented. Great!
@gnohp @kenyang001 to manually repair the "corrupt" files, set their size to -1 like this:
select * from oc_filecache where path like 'files/%test.txt%'
(replace test.txt with the file name)
update oc_filecache set size=-1 where fileid=$fileid
(replace $fileid with the file id from above)
This will tell ownCloud to ignore the stored file size on download and it will automatically redetect the correct size after one download attempt.
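Put together as a one-off shell session, the repair could look roughly like this (database name, credentials and the file name are placeholders):

# Look up the fileid of the affected file
mysql -u owncloud -p owncloud -e \
  "SELECT fileid, path, size FROM oc_filecache WHERE path LIKE 'files/%test.txt%';"
# Reset its size so ownCloud re-detects it on the next download (replace 12345 with the fileid from above)
mysql -u owncloud -p owncloud -e \
  "UPDATE oc_filecache SET size = -1 WHERE fileid = 12345;"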
The files themselves are not really corrupt, it is only the download that doesn't deliver the correct amount of bytes.
@felixboehm FYI in case file corruption has been observed after restoring a version in OC >= 8.1
@PVince81 Hi PVince81, thanks for your information. I'd like to know whether the next ownCloud version will fix this problem. Also, there are a few corrupted files for our users; do we need to manually fix all of them with your method, or can they be fixed automatically when we upgrade to the next ownCloud version? Please advise. Thanks a lot.
@kenyang001 9.0.3 was released and contains a fix that prevents this issue from happening again.
However the existing affected files will stay broken; you will need to fix them manually with https://github.com/owncloud/core/issues/23928#issuecomment-227741431
@PVince81 Should there be a procedure somewhere to reset file sizes? Either by the user for specific files, or by the admin for specific users? The latter could be added to the occ script.
@jknockaert the problem is how to identify the broken files ? One would need to decrypt all of them to find out whether the decrypted size mismatches the stored size. This cannot be done over the web UI as it would run into timeouts.
As for an occ command, we cannot decrypt anything without the user's password (or recovery key).
@PVince81 I meant setting the size to -1 in the database. After that the existing mechanisms will recalculate the size. But it may well be that when a folder size is calculated (triggering recalculation of all file sizes) a timeout is hit.
@jknockaert in this case, since we can't identify which exact file is broken, maybe we would need to set the size of ALL files of that specific user to -1 and let the sizes auto-adjust during the next download.
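A rough sketch of what that bulk reset could look like for one user (alice, the database name and credentials are placeholders; 'home::alice' is the usual storage id for a local home directory, but verify it in oc_storages first):

# Find the numeric storage id of the affected user's home storage
mysql -u owncloud -p owncloud -e \
  "SELECT numeric_id FROM oc_storages WHERE id = 'home::alice';"
# Reset the cached size of all of that user's files (replace 7 with the numeric_id from above)
mysql -u owncloud -p owncloud -e \
  "UPDATE oc_filecache SET size = -1 WHERE storage = 7 AND path LIKE 'files/%';"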
Steps to reproduce
I was able to reproduce the file corruption consistently when the file has existed in ownCloud for a few months, but only inconsistently when it is a few days old. This only seems to occur on .*x file types. Older formats and other file types (png, txt, etc.) have no issues.
Expected behaviour
File opens to the reverted version
Actual behaviour
Error dialog:
Excel: We found a problem with some content in '...xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
Word: We're sorry. We can't open ... because we found a problem with its contents
Workaround
Server configuration
Operating system: CentOS 6.6
Web server: Apache
Database: Galera
PHP version: 5.6
ownCloud version: 8.2.2 and 9.0
Updated from an older ownCloud or fresh install: updated; 8.2.2 was from 8.2.1, 9.0 was from 8.2.3
Where did you install ownCloud from: repo
Are you using external storage, if yes which one: no
Are you using encryption: yes
Are you using an external user-backend, if yes which one: yes
Client configuration
Browser: Chrome
Operating system: Windows 10
Logs
Web server error log
ownCloud log (data/owncloud.log)