haiwen / seafile

High performance file syncing and sharing, with also Markdown WYSIWYG editing, Wiki, file label and other knowledge management features.
http://seafile.com/

seaf-daemon crashing because of missing password #2662

Closed nicolamori closed 1 year ago

nicolamori commented 1 year ago

Since this morning, seaf-daemon has been segfaulting on my system (Arch Linux). My setup has several libraries, some of them encrypted. The libraries were created on the server a couple of years ago, and the client was set up two months ago with seafile-client 8.0.10 (which I'm still using). Everything worked flawlessly until this morning; the Seafile installation has not been modified in the meantime.

Investigating the crash with gdb I got the following:

Core was generated by `/usr/bin/seaf-daemon -c /home/mori/.ccnet -d /home/mori/Seafile/.seafile-data -'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:76
76              VPCMPEQ (%rdi), %ymm0, %ymm1
[Current thread is 1 (Thread 0x7fe46d5276c0 (LWP 96048))]
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:76
#1  0x0000562b36935c4b in seafile_decrypt_repo_enc_key
    (enc_version=2, passwd=passwd@entry=0x0, random_key=0x7fe4581310c0 "<value>", repo_salt=0x0, key_out=key_out@entry=0x7fe46d525c20 "<value>", iv_out=iv_out@entry=0x7fe46d525bf0 "la.mori@fi.infn.it\", \"mtime\": 1376768269, \"name\"787ddc6a83ef11edacfcd83c1a4dd5d2a821c825") at ../common/seafile-crypt.c:239
#2  0x0000562b3694a571 in seaf_repo_fetch_and_checkout (http_task=http_task@entry=0x562b36fdc0b0, remote_head_id=remote_head_id@entry=0x562b36fdc11c "add1215edd063a1caeae54b028566e44b3b8145f") at repo-mgr.c:5853
#3  0x0000562b3692beeb in http_download_thread (vdata=0x562b36fdc0b0) at http-tx-mgr.c:4883
#4  0x0000562b3692467b in job_thread_wrapper (vdata=0x562b36fdc350, unused=<optimized out>) at job-mgr.c:66
#5  0x00007fe4707919a3 in g_thread_pool_thread_proxy (data=<optimized out>) at ../glib/glib/gthreadpool.c:350
#6  0x00007fe47078c315 in g_thread_proxy (data=0x7fe468000d50) at ../glib/glib/gthread.c:831
#7  0x00007fe47045ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#8  0x00007fe4704e0d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Inspecting line 239 of common/seafile-crypt.c, I see a call to strlen(passwd), which is probably the cause of the segfault: as shown above, passwd is NULL and the actual segfault is in __strlen_avx2 inside glibc. I checked ~/Seafile/.seafile-data/repo.db and found no entries in the RepoPasswd table, but I don't know whether this is related.
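A minimal sketch of the kind of guard that would avoid this crash, assuming the caller can handle an error code. The function name `check_password_arg` and the return codes are hypothetical and are not taken from the Seafile source; treating an empty string like a missing password is also an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes, not Seafile's. */
enum { KEY_OK = 0, KEY_ERR_NO_PASSWORD = -1 };

/*
 * Hypothetical stand-in for the entry of a key-derivation routine.
 * It only demonstrates the guard: reject a NULL (or, by assumption,
 * empty) password before strlen() ever dereferences it.
 */
int
check_password_arg (const char *passwd, size_t *len_out)
{
    if (passwd == NULL || passwd[0] == '\0')
        return KEY_ERR_NO_PASSWORD;

    *len_out = strlen (passwd);   /* safe: passwd is non-NULL here */
    return KEY_OK;
}
```

With a check like this, a call with passwd=0x0, as in frame #1 of the backtrace, would return an error to the caller instead of faulting inside strlen.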

I don't know what other information would be useful, but I can provide it if needed.

nicolamori commented 1 year ago

I was able to make it work again by removing ~/Seafile/ and re-configuring from scratch (I had to re-sync all the libraries, though, so this is more an emergency workaround than a fix). But after restoring the old ~/Seafile/, the problem happens again, so I'd say it's related to a corrupted configuration.

feiniks commented 1 year ago

Hello @nicolamori, can you show the seafile.log output from when the crash occurs?

nicolamori commented 1 year ago

Where can I find it? I can't find it in ~/Seafile/. If you are referring to a server log, I have no access to it; I'm just running a client.

bionade24 commented 1 year ago

@nicolamori Under ~/.ccnet/logs

nicolamori commented 1 year ago

This is an excerpt from the log file covering several crashes and restarts:

[03/30/23 08:01:11] seaf-daemon.c(525): starting seafile client 8.0.10
[03/30/23 08:01:11] seafile-session.c(388): client id = 59dc467f86cd583f2a6ec5feb394fdac39446d7f, client_name = stryke
[03/30/23 08:01:11] socket file exists, delete it anyway
[03/30/23 08:01:11] seaf-daemon.c(553): rpc server started.
[03/30/23 08:01:11] clone-mgr.c(678): Transition clone state for 532dcdd6 from [init] to [check server].
[03/30/23 08:01:11] clone-mgr.c(678): Transition clone state for 532dcdd6 from [check server] to [fetch].
[03/30/23 08:01:11] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'init') --> ('normal', 'check')
[03/30/23 08:01:11] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'check') --> ('normal', 'commit')
[03/30/23 08:01:11] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'commit') --> ('normal', 'fs')
[03/30/23 08:01:12] start to serve on pipe client
[03/30/23 08:01:12] start to serve on pipe client
[03/30/23 08:01:12] start to serve on pipe client
[03/30/23 08:01:12] start to serve on pipe client
[03/30/23 08:01:12] sync-mgr.c(1648): File syncing protocol version on server https://basket.fi.infn.it is 1. Client file syncing protocol version is 2. Use version 1.
[03/30/23 08:01:12] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'fs') --> ('normal', 'data')
[03/30/23 08:01:12] start to serve on pipe client
[03/30/23 08:01:12] start to serve on pipe client
[03/30/23 08:01:15] seaf-daemon.c(525): starting seafile client 8.0.10
[03/30/23 08:01:15] seafile-session.c(388): client id = 59dc467f86cd583f2a6ec5feb394fdac39446d7f, client_name = stryke
[03/30/23 08:01:15] socket file exists, delete it anyway
[03/30/23 08:01:15] seaf-daemon.c(553): rpc server started.
[03/30/23 08:01:15] clone-mgr.c(678): Transition clone state for 532dcdd6 from [init] to [check server].
[03/30/23 08:01:15] clone-mgr.c(678): Transition clone state for 532dcdd6 from [check server] to [fetch].
[03/30/23 08:01:15] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'init') --> ('normal', 'check')
[03/30/23 08:01:15] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'check') --> ('normal', 'commit')
[03/30/23 08:01:15] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'commit') --> ('normal', 'fs')
[03/30/23 08:01:16] start to serve on pipe client
[03/30/23 08:01:16] start to serve on pipe client
[03/30/23 08:01:16] start to serve on pipe client
[03/30/23 08:01:16] start to serve on pipe client
[03/30/23 08:01:16] start to serve on pipe client
[03/30/23 08:01:16] start to serve on pipe client
[03/30/23 08:01:16] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'fs') --> ('normal', 'data')
[03/30/23 08:01:19] seaf-daemon.c(525): starting seafile client 8.0.10
[03/30/23 08:01:19] seafile-session.c(388): client id = 59dc467f86cd583f2a6ec5feb394fdac39446d7f, client_name = stryke
[03/30/23 08:01:19] socket file exists, delete it anyway
[03/30/23 08:01:19] seaf-daemon.c(553): rpc server started.
[03/30/23 08:01:19] clone-mgr.c(678): Transition clone state for 532dcdd6 from [init] to [check server].
[03/30/23 08:01:19] clone-mgr.c(678): Transition clone state for 532dcdd6 from [check server] to [fetch].
[03/30/23 08:01:19] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'init') --> ('normal', 'check')
[03/30/23 08:01:19] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'check') --> ('normal', 'commit')
[03/30/23 08:01:19] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'commit') --> ('normal', 'fs')
[03/30/23 08:01:20] start to serve on pipe client
[03/30/23 08:01:20] start to serve on pipe client
[03/30/23 08:01:20] start to serve on pipe client
[03/30/23 08:01:20] start to serve on pipe client
[03/30/23 08:01:20] start to serve on pipe client
[03/30/23 08:01:20] start to serve on pipe client
[03/30/23 08:01:20] sync-mgr.c(1648): File syncing protocol version on server https://basket.fi.infn.it is 1. Client file syncing protocol version is 2. Use version 1.
[03/30/23 08:01:20] http-tx-mgr.c(1156): Transfer repo '532dcdd6': ('normal', 'fs') --> ('normal', 'data')

feiniks commented 1 year ago

Hi @nicolamori, the crash occurred while the library was being downloaded for the first time. Can you take a look at the records in the CloneTasks table in the clone.db database? You can find it in your old ~/Seafile/.seafile-data/clone.db. I suspect that no passwd data is recorded in this table.

nicolamori commented 1 year ago

Here it is:

$ sqlite3 clone.db 
SQLite version 3.41.2 2023-03-22 11:56:21
Enter ".help" for usage hints.
sqlite> select * from CloneTasks;
532dcdd6-6fe9-4a24-8a34-345030375288|.config|9449e0e613d7c606b5641104be17c4d430e41353||/home/mori/.config||||nicola.mori@fi.infn.it

feiniks commented 1 year ago

Hello @nicolamori, there is no password in this table, which causes the crash. We will add a password check in the next release.

nicolamori commented 1 year ago

@feiniks OK, thank you. Do you have any idea why this problem suddenly appeared? As I wrote, everything worked up to the previous day, and nothing changed on my system (i.e. no upgrades) before the issue came up.

nicolamori commented 1 year ago

@feiniks It just happened again. The password seems to be missing from clone.db again:

[09:54 mori@stryke ~]$ sqlite3 Seafile/.seafile-data/clone.db 
SQLite version 3.41.2 2023-03-22 11:56:21
Enter ".help" for usage hints.
sqlite> select * from CloneTasks;
532dcdd6-6fe9-4a24-8a34-345030375288|.config|1fa4030dbcce5a33a5be2409ba4be01da154e736||/home/mori/.config||||nicola.mori@fi.infn.it

Is it possible to repair the entry? If yes, can you provide some info about how to do that? Thank you.

Edit: I removed the entry and restarted the client. This fixes the daemon segfault but leaves the library unsynced, so I had to re-sync from scratch, deal with conflicts, etc. From inspecting clone.db during the sync, I see that the password must be inserted in clear text in the field after the local path (/home/mori/.config in the above case). This makes me think that the problem stems from the library somehow becoming unsynced and ending up in the middle of a sync. I really don't understand what's happening, but I hope this information is useful for the devs.
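To make the field position concrete, here is a rough sketch that splits one line of the sqlite3 output above and checks whether the field after the local path is empty. The column indices are read off the output in this thread, not taken from the clone.db schema, and the helper names `split_row` and `row_lacks_password` are made up for illustration. An empty field is presumably also normal for an unencrypted library, and a password containing '|' would confuse this parser.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split `row` in place on '|' and store pointers to the fields in `fields`.
 * Unlike strtok(), empty fields are kept as "" so column positions stay
 * stable. Returns the number of fields found; at most `max_fields`
 * pointers are stored.
 */
size_t
split_row (char *row, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *p = row;

    for (;;) {
        char *sep = strchr (p, '|');
        if (sep)
            *sep = '\0';          /* terminate the current field */
        if (n < max_fields)
            fields[n] = p;
        n++;
        if (!sep)
            break;                /* last field reached */
        p = sep + 1;
    }
    return n;
}

/* Column positions as observed in the thread's output (an assumption). */
enum { COL_WORKTREE = 4, COL_PASSWD = 5 };

/* Returns 1 if the row has no password field or the field is empty. */
int
row_lacks_password (char *row)
{
    char *fields[16];
    size_t n = split_row (row, fields, 16);

    if (n <= COL_PASSWD)
        return 1;
    return fields[COL_PASSWD][0] == '\0';
}
```

Applied to the CloneTasks row shown above, the field at index 4 is /home/mori/.config and the field at index 5 is empty, so `row_lacks_password` would report the row as lacking a password.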

feiniks commented 1 year ago

Hello @nicolamori, deleting the row in the database and then resyncing the library should fix the issue. As for why the password is empty here, I checked the parameters passed in during GUI synchronization but could not find a possible cause. Did you enter the encrypted library's password through the GUI?

nicolamori commented 1 year ago

@feiniks Yes, I do everything through the GUI. Why did I have to re-sync from scratch after deleting the row? Yesterday the library was synced and working beautifully, so whatever happens leaves me with a non-synced library.

In case it is useful: I have disabled automatic sync for the library and sync it manually once per day. I didn't notice whether the problem arises because of a manual sync, but manual sync has almost always simply worked.

nicolamori commented 1 year ago

@feiniks The problem happened again, but this time I paid attention to what was happening. After booting my PC, I manually launched a sync for the library. I got a dialog asking whether I wanted to delete a SFConflict file plus many others (~15k files). I refused, and the library entry in the GUI showed something like "Waiting for file deletion confirmation". I checked the local folder and no SFConflict file was present, so I tried to sync manually again. At this point the GUI displayed something like "Downloading file list", the same message displayed when syncing a library from scratch. At the end of the download the daemon started to crash repeatedly, and the entry in CloneTasks with the missing password appeared again.

This time I repaired the entry in CloneTasks, and on client restart it began downloading files from the remote. I then deleted the resulting SFConflict files, turned off automatic sync, and synced manually. Everything was fine. I restarted the client to simulate a fresh instance like the one I get after a reboot, but this time I had no trouble with manual sync. Even adding files to the local folder between client restarts did not trigger the error, so I cannot reproduce it consistently.

Hope this helps; I can run other tests if needed. By the way, I have another library that I sync manually, and it has never shown this problem. However, I always sync it after the troublesome one, so it could just be that the first manual sync triggers the problem; otherwise, this could point to a problem with the library itself.

feiniks commented 1 year ago

@nicolamori Thank you for your feedback. I think I know where the problem is. It is probably caused by no password being passed in when the encrypted library is re-synced during the deletion confirmation process. I will fix this in the next release.

nicolamori commented 1 year ago

@feiniks It happened again with 9.0.1. I didn't expect this, since you wrote that you would fix the problem in the "next release", and at that time I was on 8.0.10. Maybe the fix wasn't included in 9.0.1? If so, in which version do you plan to include it?

feiniks commented 1 year ago

The next release is 9.0.2.