Closed flamingm0e closed 1 year ago
@flamingm0e I could not reproduce the behaviour. Updates from 3.0.0-rc.3 to 3.0.0 work fine.
I can however recreate the same problem when downgrading from 3.0.0 to 3.0.0-rc3. Reason for this is a breaking change in the metadata backend. See release notes:
Note
The metadata store in the DecomposedFS has changed
When you upgrade from 2.0.0 to 3.0.0-rc.1 or later and if you didn't set OCIS_DECOMPOSEDFS_METADATA_BACKEND manually, ocis will change the storage of the file metadata from using extended attributes (xattrs) to messagepack (messagepack). This decision was made because extended attributes are limited and have some issues using shared filesystems. Messagepack is a straightforward binary format.
At least in my test, 3.0.0-rc3 still uses xattrs, while 3.0.0 is using messagepack. I can see errors like "xattr.get /.ocis/storage/users/spaces/dd/dfd866-7656-4601-86da-9146131faaae/nodes/dd/df/d8/66/-7656-4601-86da-9146131faaae user.ocis.name: no data available" in the ocis logs after the downgrade.
Maybe you have a similar issue? Do you have the envvar OCIS_DECOMPOSEDFS_METADATA_BACKEND set in your environment?
Those errors do exist in my logs.
I do not have any of the 3 envvars that are listed in the upgrade documentation. I guess that since rc3 was still using xattrs, I need to go through the same steps as if I were upgrading from 2.0 to 3.0?
Shouldn't be necessary if you had a running 3.0.0-rc.3. But I'm wondering why it failed on the upgrade for you.
Could you try running 3.0.0 with OCIS_DECOMPOSEDFS_METADATA_BACKEND="messagepack"?
I tested with the instructions from the upgrade documentation and with OCIS_DECOMPOSEDFS_METADATA_BACKEND="messagepack", and got the same problem. When trying to log in from the app on my Android device, it never authorizes, and says "it was not found". The server logs state:
{"level":"error","service":"idp","error":"ldap identifier backend get user error: user does not exist or too many entries returned","time":"2023-06-14T14:44:13.172535732Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/log/logrus_wrapper.go:50","message":"IdentifierIdentityManager: fetch failed to get user from userID"}
But I can login from web browser. Logging into web browser, I have no personal files, no shares, and no spaces available.
{"level":"error","service":"proxy","error":"gateway: grpc failed with code CODE_INTERNAL","time":"2023-06-14T14:44:15.115376744Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/create_home.go:74","message":"error when calling Createhome"}
WOW.
Reverted back to rc3 again....now EVERYTHING is gone, except my users. FML. I guess I get to restore user files from my RCLONE jobs now.
You need to change OCIS_DECOMPOSEDFS_METADATA_BACKEND to xattrs if you want to use rc3 again. My guess is that something went wrong in metadata migration.
You can also try running 3.0.0 with xattrs backend.
and now it won't start.
Which version didn't start? And in which configuration?
The upgrade documentation strongly recommends doing a full backup first, which is necessary to avoid data loss when a revert is needed. Partial reverts are neither supported nor possible. Could you do a full restore from backup, follow the upgrade guide and tell us where the issue occurs?
I'm rolling back my zfs snapshot to last known good working configuration. I will begin testing from there.
Rolling back to a ZFS snapshot has gotten me back to normal.
I will snapshot and proceed to try another upgrade, utilizing the ENVVARS as discussed.
:+1: Probably something went wrong during the xattrs->messagepack migration. It happens automatically when you start 3.0.0 for the first time. Keep an eye on the logs, maybe they tell us what went wrong the first time.
Same errors exist in the logs on the upgrade. Same problems with missing everything on user and Spaces.
At this point, it's probably better to just start over. This is frustrating.
Which error are you talking about? What happens on initial start of 3.0.0? You should see a log like "Migrating to messagepack metadata backend..." probably followed by an error.
The same errors I previously mentioned:
https://github.com/owncloud/ocis/issues/6518#issuecomment-1591371900
I never saw anything about messagepack, but that could be because as soon as it started, it started the other 2 errors and spammed everything.
Strange. It shouldn't try to get a user when nobody tries to log in.
{"level":"error","service":"idp","error":"ldap identifier backend get user error: user does not exist or too many entries returned","time":"2023-06-14T14:44:13.172535732Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/log/logrus_wrapper.go:50","message":"IdentifierIdentityManager: fetch failed to get user from userID"}
This is the important error, the other one is just a follow-up. Does it start to throw this error directly when you start ocis? I mean before you login?
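When the log is this noisy, one way to tell the root error from its follow-ups is to group the error-level entries by service and message. The following is only a rough sketch: it assumes the log is in the JSON-lines format of the entries quoted in this thread (fields `level`, `service`, `message`), and the function name `summarize_errors` is made up for illustration.

```python
import json
from collections import Counter


def summarize_errors(lines):
    """Count error-level log entries per (service, message) pair."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        key = (entry.get("service", "?"), entry.get("message", ""))
        counts[key] += 1
    return counts
```

It could be fed the lines of a saved log file (any iterable of strings works); `summarize_errors(...).most_common()` then lists the most frequent errors first, which makes it easier to spot whether the idp error or the proxy error dominates.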
It is likely throwing the error immediately because I have an RCLONE webdav connection to my server, and I have the app on my phone, so multiple devices trying to login immediately when it comes back online. I would have to pull my devices off, wife, kid, etc, so that nothing is trying to login to quiet the logs. It's a huge coordinated effort, sadly. I'll see if I can find some time to run through the scenarios this weekend.
All I wanted was a self hosted Google Drive replacement...and I love the simplicity of configuring OCIS (and the speed without the overhead of all the different components), but man, when it fails, it fails.
I see. This could be the problem. If someone logs in during xattrs -> messagepack migration it could potentially break the respective user.
I'll try to reproduce this tomorrow...
I may have time tomorrow as well to try and get everyone logged out and try again.
I could not reproduce so far by spamming the server with requests. But I still think something goes wrong during migration because I can reproduce exact same behaviour by just skipping migration.
If you don't want to log out all your users you can try running ocis with OCIS_RUN_SERVICES: "storage-users,nats". This will only start storage-users service (which does the migration). Your clients still can't connect as proxy is not running. Maybe we can see migration logs then, they should tell us what went wrong.
If there are no errors, remove the envvar and restart ocis. If the problem is still there it was not the messagepack migration.
OK. I configured that ENVVAR and started it up (after doing a snapshot with OCIS off, of course)
I finally see the migrating to messagepack message.
{"level":"info","root":"/var/lib/ocis/storage/users","time":"2023-06-16T12:21:49.441395153Z","caller":"github.com/cs3org/reva/v2@v2.14.0/pkg/storage/utils/decomposedfs/migrator/0003_switch_to_messagepack_metadata.go:45","message":"Migrating to messagepack metadata backend..."}
That seems like a good sign that I can actually see that, and not the errors spamming the logs. That's helpful, thank you.
How long would this process take? I only have a couple hundred GB of data in there, but lots of small files. Should I see a message when completed that it's done?
Yes, there should be a log like
{"level":"info","time":"2023-06-16T12:30:05.267986656Z","caller":"github.com/cs3org/reva/v2@v2.14.0/pkg/storage/utils/decomposedfs/migrator/0003_switch_to_messagepack_metadata.go:106","message":"done."}
Unfortunately I can't give you an ETA. But the size of the files doesn't matter. Only the amount is relevant as it needs to rewrite metadata for each.
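Since there is no ETA, it may help to check the log for the two migration messages quoted above instead of guessing. This is a minimal sketch, assuming JSON-lines logs and that both entries carry a `caller` field pointing at `0003_switch_to_messagepack_metadata.go`, as in the two examples in this thread; the function name `migration_status` and the three status strings are made up for illustration.

```python
import json

START_MESSAGE = "Migrating to messagepack metadata backend..."
DONE_MESSAGE = "done."
MIGRATION_SOURCE = "0003_switch_to_messagepack_metadata.go"


def migration_status(lines):
    """Return 'not started', 'running' or 'done' based on JSON log lines."""
    status = "not started"
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON output
        if not isinstance(entry, dict):
            continue
        # Only consider entries emitted by the migration source file,
        # since a bare "done." could come from elsewhere (assumption).
        if MIGRATION_SOURCE not in str(entry.get("caller", "")):
            continue
        message = entry.get("message")
        if message == START_MESSAGE:
            status = "running"
        elif message == DONE_MESSAGE:
            status = "done"
    return status
```

If the result is still "running", the migration has started but the "done." entry has not been logged yet, so ocis should be left alone rather than restarted or rolled back.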
perfect. Thank you.
I will wait and see what happens.
Thank you for all your assistance.
It appears the cause of my problems was that I didn't know when migration was working, or complete. After over 5 hours, it finally finished, I removed that ENVVAR and fired it back up normally. Everything is as it should be at this time. I thought I was going to have to start over.
@flamingm0e thanks for your input. I will file an update to our upgrade guide with the info provided as soon as possible.
One more question: would you mind telling us how many files were affected by the upgrade?
I have a ton of files. I don't know how many. Is there a quick way to figure that out?
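One rough way to get a number is to count the regular files below the storage root. This is only a sketch under assumptions: the path `/var/lib/ocis/storage/users` is the `root` value from the migration log entry in this thread, the on-disk layout of the DecomposedFS need not map one-to-one to the files users see, and `count_files` is a made-up helper name. Treat the result as an order-of-magnitude estimate at best.

```python
import os


def count_files(root):
    """Count regular files below root; symlinks are not followed or counted."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += 1
    return total
```

For example, `count_files("/var/lib/ocis/storage/users")` would walk the whole tree, which can take a while on a store with many small files.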
Describe the bug
Upgraded from 3.0.0-rc3 to 3.0.0 to see if a bug was fixed that caused multiple folders to show up after creating one.
After running the new docker container with same settings as before, my personal user has lost access to all personal files, and all Spaces have been deleted.
Steps to reproduce
Steps to reproduce the behavior:
Expected behavior
I expect all files to still be present
Actual behavior
All Spaces were deleted, and users have no access to files.
Setup
I am using basic OCIS docker config behind Caddy 2
Additional context
Rolling back to rc3 allows me to access my user files again, but all Spaces are still missing.