Errors related to personality retrieval with Yama

HorlogeSkynet commented 6 months ago

Hello there :wave:

I've noticed literally thousands of errors related to personality retrieval, very likely due to Yama.

It looks like (lxcfs logs) :

lxcfs[XXXX]: ../src/proc_fuse.c: 94: get_procfile_size_with_personality: Failed to get caller process (pid: YYYY) personality

As Yama also reports unauthorized (prevented) accesses, it actually floods kernel logs as well :

kernel: ptrace attach of "REDACTED PROCESS NAME"[YYYY] was attempted by "/usr/bin/lxcfs /var/lib/lxcfs"[XXXX]

When we dig this a bit, we can read in proc(5) :

/proc/[pid]/personality (since Linux 2.6.28)
  This  read-only  file  exposes  the process's execution domain, as set by personality(2).
  The value is displayed in hexadecimal notation.

  Permission to access this file is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2)

I wanted to propose a patch but I'm not sure about the best approach here. Should we :

revert #609 ?
handle EPERM specifically so as to ignoring it ? (this wouldn't address Yama complaining about "ptrace attachment tentative")
check whether Yama is enforced through /proc/sys/kernel/yama/ptrace_scope before even trying to get task personality (LBYL) ?
something else ?

Thanks for your time, bye :pray:

Setup : lxcfs 6.0.0 / Linux 6.8.4

stgraber commented 6 months ago

How about setting up YAMA so lxcfs is allowed the action?

All of the other options will cause incorrect cpuinfo data for some users so don't seem adequate.

This seems like an issue with the LSM configuration on the system rather than anything wrong with lxcfs.

stgraber commented 6 months ago

Are you getting the issue above with the stock configuration of a major Linux distribution?

HorlogeSkynet commented 6 months ago

The thing is Yama configuration is system-wide and not "per-process", so it has to be a global relax for a specific need (unless maybe we deal with setcap and capabilities, leveraging fs extended attributes).

Indeed I've got a hardened ptrace.yama_scope sysctl, which explains why I am one of the firsts to encounter this issue.

Do you think we should mention this somewhere in documentation ? Or as a regression in v6 changelog ?

Although I'd like to emphasize I think something should be handled in code, as :

it is not needed for most of lxcfs usages (and was not present until then), optional requirement ?
Linux proc documentation explicitly reads access to personality may be blocked according to ptrace policy.

Thanks for your time, bye 👋

stgraber commented 6 months ago

This change isn't a new feature, it's a bugfix. It's needed on any platform that's capable of running both 64bit and 32bit code so we don't incorrectly report the personality data to the reader.

stgraber commented 6 months ago

We could refuse reads to /proc/cpuinfo with a permission error on systems where we get an error reading the personality data, but that would effectively break all containers on such systems.

HorlogeSkynet commented 6 months ago

Hello Stéphane,

Little sum up of recent investigations (I hope my tests where exhaustive) :

Only (current) maximum Yama ptrace scope (3, "no-attach") breaks task personalities retrieval ;
A relax to 2 stop logs from being flooded (echo "kernel.yama.ptrace_scope = 2" >> /etc/sysctl.d/99-zlxcfs.conf and reboot) ;
I've come up anyway with a patch (excuse my rusty C, I'm not sure about the static variable for "cache"), I'd be happy to submit as a PR if it suits you (which prevents reading personalities if ptrace policy disallows it, not tested) :

diff --git a/src/proc_fuse.c b/src/proc_fuse.c
index 1049e72..2f6fc01 100644
--- a/src/proc_fuse.c
+++ b/src/proc_fuse.c
@@ -145,7 +145,7 @@ __lxcfs_fuse_ops int proc_getattr(const char *path, struct stat *sb)
        strcmp(path, "/proc/swaps")     == 0 ||
        strcmp(path, "/proc/loadavg")   == 0 ||
        strcmp(path, "/proc/slabinfo")  == 0) {
-       if (liblxcfs_functional())
+       if (liblxcfs_functional() && is_ptrace_allowed())
            sb->st_size = get_procfile_size_with_personality(path);
        else
            sb->st_size = get_procfile_size(path);
@@ -206,7 +206,7 @@ __lxcfs_fuse_ops int proc_open(const char *path, struct fuse_file_info *fi)

    info->type = type;

-   if (liblxcfs_functional())
+   if (liblxcfs_functional() && is_ptrace_allowed())
        info->buflen = get_procfile_size_with_personality(path) + BUF_RESERVE_SIZE;
    else
        info->buflen = get_procfile_size(path) + BUF_RESERVE_SIZE;
@@ -1646,7 +1646,7 @@ __lxcfs_fuse_ops int proc_read(const char *path, char *buf, size_t size,
        return read_file_fuse_with_offset(LXC_TYPE_PROC_MEMINFO_PATH,
                          buf, size, offset, f);
    case LXC_TYPE_PROC_CPUINFO:
-       if (liblxcfs_functional())
+       if (liblxcfs_functional() && is_ptrace_allowed())
            return proc_read_with_personality(&proc_cpuinfo_read, buf, size, offset, fi);

        return read_file_fuse_with_offset(LXC_TYPE_PROC_CPUINFO_PATH,
diff --git a/src/utils.c b/src/utils.c
index ab665f7..1fc9a68 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -691,3 +691,44 @@ int get_task_personality(pid_t pid, __u32 *personality)

    return ret;
 }
+
+/*
+   This function checks Yama ptrace scope to make sure system security policy allows ptrace usage.
+   This is required as accessing task personalities (see `get_task_personality` above) may be restricted by a ptrace
+   access mode check (see PROC(5)).
+*/
+bool is_ptrace_allowed(void)
+{
+   static int yama_ptrace_scope = -1;
+
+   __u32 buf_scope;
+   __do_close int fd = -EBADF;
+   int ret = -1;
+   char buf[8 + 1];
+
+   /* Yama ptrace scope is not yet known (cache is empty) */
+   if (yama_ptrace_scope == -1) {
+       /* assume default policy if we can't retrieve info below */
+       yama_ptrace_scope = YAMA_PTRACE_SCOPE_RESTRICTED;
+
+       fd = open(SYS_FS_KERNEL_YAMA_PTRACE_SCOPE, O_RDONLY | O_CLOEXEC);
+       if (fd >= 0) {
+           ret = read_nointr(fd, buf, sizeof(buf) - 1);
+           if (ret >= 0) {
+               buf[ret] = '\0';
+               if (safe_uint32(buf, &buf_scope, 16) < 0)
+                   return log_error(true, "Failed to read Yama ptrace scope %s", buf);
+
+               if (buf_scope >= YAMA_PTRACE_SCOPE_NOATTACH)
+                   lxcfs_error("Due to Yama ptrace policy, reading /proc files from containers may be inconsistent");
+               yama_ptrace_scope = buf_scope;
+           }
+       } else if (errno == -ENOENT) {
+           /* in this very case, Yama may not be enabled at all */
+           yama_ptrace_scope = YAMA_PTRACE_SCOPE_CLASSIC;
+       }
+   }
+
+   /* consider ptrace is allowed when Yama scope is lower than "no-attach" mode */
+   return (yama_ptrace_scope < YAMA_PTRACE_SCOPE_NOATTACH);
+}
diff --git a/src/utils.h b/src/utils.h
index 7ed021a..0be0480 100644
--- a/src/utils.h
+++ b/src/utils.h
@@ -25,6 +25,11 @@
 #define SEND_CREDS_NOTSK 1
 #define SEND_CREDS_FAIL 2

+#define SYS_FS_KERNEL_YAMA_PTRACE_SCOPE "/sys/kernel/yama/ptrace_scope"
+#define YAMA_PTRACE_SCOPE_CLASSIC 0
+#define YAMA_PTRACE_SCOPE_RESTRICTED 1
+#define YAMA_PTRACE_SCOPE_NOATTACH 3
+
 struct file_info;

 __attribute__((__format__(__printf__, 4, 5))) extern char *must_strcat(char **src, size_t *sz, size_t *asz, const char *format, ...);
@@ -77,6 +82,7 @@ static inline bool file_exists(const char *f)
 extern char *read_file_at(int dfd, const char *fnam, unsigned int o_flags);

 extern int get_task_personality(pid_t pid, __u32 *personality);
+extern bool is_ptrace_allowed(void);
 extern int get_host_personality(__u32 *personality);

 #endif /* __LXCFS_UTILS_H */

Thanks for your time, bye :pray:

stgraber commented 6 months ago

Unless I missed something, your patch will cause cpuinfo to not be virtialized on such systems, which will then cause the guest to still get the cpuinfo for the incorrect architecture as lxcfs will perform the read of the host file under its own architecture.

So far, the only safe solution I can think of is to either make LXCFS refuse to start on such systems or have any attempt at reading cpuinfo on such systems to result in an error.

HorlogeSkynet commented 6 months ago

Indeed, I have opted out for a "warning-like" message approach with :

+               if (buf_scope >= YAMA_PTRACE_SCOPE_NOATTACH)
+                   lxcfs_error("Due to Yama ptrace policy, reading /proc files from containers may be inconsistent");

Let me adapt patch according to your second proposal (which I prefer) and discuss it over a PR :ok_hand:

stgraber commented 6 months ago

It won't be inconsistent, it will be wrong :)

@brauner @mihalicyn what do you think? Should we print a warning/error that nobody will ever see and return bad data that can break/corrupt builds run inside the container or should we just return an error (EIO or whatever) whenever someone attempts to read cpuinfo or any other file requiring the process personality and add a mention to the README that getting such an error means the system is likely running YAMA or another LSM in a way that's incompatible with LXCFS.

hallyn commented 6 months ago

Indeed it's simply incompatible. Returning an error seems best. EIO or maybe EACCESS, as this is an access control issue.

mihalicyn commented 6 months ago

Hi @HorlogeSkynet

Thanks for your report and PR!

Just curious, why you are using kernel.yama.ptrace_scope = 3 mode? It looks too restrictive, even root user can't ptrace other tasks when this mode is enabled. While kernel.yama.ptrace_scope = 2 only allows tracing for privileged user in the task's user namespace.

I agree with Serge, we just need to return -EIO / -EACCES.

HorlogeSkynet commented 6 months ago

@hallyn thanks for your feedback, I've replaced ENOTSUP by EACCES.

@mihalicyn I guess "hardening, defense in depth, ..." could be an answer here.

@ three of you : I noticed in "personality change" code that nothing is done if "personality restoration" actually fails. It means that lxcfs would continue to run with host personality if somehow first call to personality succeeds, but second fails. I don't have an overall idea of the (security ?) consequences here, but I don't think regularly pursuing program workflow (including emitting an error log) is fine. WDYT ?

Thanks for your time ! Bye 👋

stgraber commented 6 months ago

Your branch seems overly complicated when all we really want to do is catch the failure to read the personality file and then return EACCES back to the reader.

We don't really need to know that we're dealing with YAMA or some other LSM similarly messing with us, we just want an I/O error to hit the reader and a message in the LXCFS README.md file to mention that YAMA when set to 3 will break the file.

HorlogeSkynet commented 6 months ago

I agree, but as I originally stated, if we don't perform the check beforehand, lxcfs keeps triggering Yama detection of ptrace usages (cf. kernel log line).

stgraber commented 6 months ago

I'm fine with that, at the end of the day, we need the system's administrator to notice that something isn't working properly and then decide to either rollback the YAMA setting or uninstall LXCFS.

HorlogeSkynet commented 6 months ago

Would you accept another (simple) LBYL approach if we check (once) for /proc/self/personality access ? I noticed it fails when ptrace is fully-disabled, and the check wouldn't be "Yama-specific".

EDIT : plus "rollback the Yama setting [...]" isn't anodyne as it requires a system reboot :roll_eyes:

EDIT 2 : It appears /proc/self/personality is always accessible, whereas /proc/$$/personality is not with a ptrace scope of 3 (for instance in a Bash command with a subprocess).
I opted for init personality access check instead in #638 (to prevent forking off a child for this).

stgraber commented 6 months ago

Yeah, that should be fine.

lxc / lxcfs

Errors related to personality retrieval with Yama #636