stratis-storage / stratisd

Easy to use local storage management for Linux.
https://stratis-storage.github.io
Mozilla Public License 2.0
794 stars 55 forks source link

`stratis-min pool start` fails with EBADF #3586

Closed mvollmer closed 5 months ago

mvollmer commented 5 months ago

Version: stratisd-3.6.6-1.fc40.x86_64

# stratis-min pool start --unlock-method keyring --prompt 595d5961-76cd-47ef-8515-eb256620a0e8
Enter passphrase followed by return: <RET>
Error: "Nix error: EBADF: Bad file number"

Strace says:

socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/run/stratisd/stratisd-min-jsonrpc"}, 37) = 0
pipe2([4, 5], 0)                        = 0
write(5, "", 0)                         = 0
close(5)                                = 0
close(4)                                = 0
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="{\"PoolStart\":[{\"Uuid\":\"595d5961-"..., iov_len=73}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], msg_controllen=24, msg_flags=0}, 0) = -1 EBADF (Bad file descriptor)
close(3)                                = 0
write(2, "Error: ", 7Error: )                  = 7
write(2, "\"", 1")                       = 1
write(2, "Nix error: EBADF: Bad file numbe"..., 33Nix error: EBADF: Bad file number) = 33

It looks like stratis-min tries to send fd 4 over the socket to stratisd-min, but fd 4 was closed just prior to sendmsg.

mvollmer commented 5 months ago

I don't think this needs anything special to reproduce. Any encrypted pool with a passphrase should exhibit this behavior. I found it when stratis-min failed like this during boot when called from stratis-fstab-setup.

mvollmer commented 5 months ago

I could also reproduce this after boot, by stopping stratisd.service and then running stratisd-min and stratis-min from the command line.

mulkieran commented 5 months ago

Smoking gun has got to be here:

+++ b/src/jsonrpc/client/key.rs
@@ -25,7 +25,7 @@ pub fn key_set(key_desc: KeyDescription, keyfile_path: Option<
&str>) -> StratisR

             let (read_end, write_end) = pipe()?;
             write(write_end, password.as_bytes())?;
-            do_request!(KeySet, key_desc; read_end)
+            do_request!(KeySet, key_desc; read_end.as_raw_fd())
         }
     };
     if rc != 0 {

and here:

--- a/src/jsonrpc/client/pool.rs
+++ b/src/jsonrpc/client/pool.rs
@@ -2,7 +2,7 @@
 // License, v. 2.0. If a copy of the MPL was not distributed with this
 // file, You can obtain one at http://mozilla.org/MPL/2.0/.

-use std::path::PathBuf;
+use std::{os::fd::AsRawFd, path::PathBuf};

 use nix::unistd::{pipe, write};

@@ -37,7 +37,7 @@ pub fn pool_start(
         do_request_standard!(PoolStart, id, unlock_method; {
             let (read_end, write_end) = pipe()?;
             write(write_end, password.as_bytes())?;
-            read_end
+            read_end.as_raw_fd()
         })
     } else {
         do_request_standard!(PoolStart, id, unlock_method)

Actually, I believe the first change is benign because the pipe is created before the macro is expanded.

mvollmer commented 5 months ago
-            do_request!(KeySet, key_desc; read_end)
+            do_request!(KeySet, key_desc; read_end.as_raw_fd())

Just thinking out loud, I have no idea what's actually going on: Does this now leak the file descriptor? I.e., where is it closed now? But maybe this doesn't matter in a one-shot process like stratis-min. It will exit very soon anyway.

mulkieran commented 5 months ago
-            do_request!(KeySet, key_desc; read_end)
+            do_request!(KeySet, key_desc; read_end.as_raw_fd())

Just thinking out loud, I have no idea what's actually going on: Does this now leak the file descriptor? I.e., where is it closed now? But maybe this doesn't matter in a one-shot process like stratis-min. It will exit very soon anyway.

The do_request_standard! macros final "argument" is the entire expression below.

let (read_end, write_end) = pipe()?;
             write(write_end, password.as_bytes())?;
            read_end

When the macro is expanded, that whole bit of syntax is evaluated and used in the call. That worked, when read_end was the value of the expression, because Rust lifetime rules prevented drop from being called on the file descriptor while it was in use. But when nix went to requiring a raw file descriptor instead, i.e., just a number, the lifetime rules allowed the file to be dropped while that number was still in use.