Closed HorlogeSkynet closed 6 months ago
How about setting up YAMA so lxcfs is allowed the action?
All of the other options will cause incorrect cpuinfo data for some users so don't seem adequate.
This seems like an issue with the LSM configuration on the system rather than anything wrong with lxcfs.
Are you getting the issue above with the stock configuration of a major Linux distribution?
The thing is Yama configuration is system-wide and not "per-process", so it has to be a global relax for a specific need (unless maybe we deal with setcap
and capabilities, leveraging fs extended attributes).
Indeed I've got a hardened ptrace.yama_scope
sysctl, which explains why I am one of the firsts to encounter this issue.
Do you think we should mention this somewhere in documentation ? Or as a regression in v6 changelog ?
Although I'd like to emphasize I think something should be handled in code, as :
proc
documentation explicitly reads access to personality may be blocked according to ptrace policy.Thanks for your time, bye 👋
This change isn't a new feature, it's a bugfix. It's needed on any platform that's capable of running both 64bit and 32bit code so we don't incorrectly report the personality data to the reader.
We could refuse reads to /proc/cpuinfo with a permission error on systems where we get an error reading the personality data, but that would effectively break all containers on such systems.
Hello Stéphane,
Little sum up of recent investigations (I hope my tests where exhaustive) :
3
, "no-attach") breaks task personalities retrieval ;2
stop logs from being flooded (echo "kernel.yama.ptrace_scope = 2" >> /etc/sysctl.d/99-zlxcfs.conf
and reboot) ;static
variable for "cache"), I'd be happy to submit as a PR if it suits you (which prevents reading personalities if ptrace policy disallows it, not tested) :diff --git a/src/proc_fuse.c b/src/proc_fuse.c
index 1049e72..2f6fc01 100644
--- a/src/proc_fuse.c
+++ b/src/proc_fuse.c
@@ -145,7 +145,7 @@ __lxcfs_fuse_ops int proc_getattr(const char *path, struct stat *sb)
strcmp(path, "/proc/swaps") == 0 ||
strcmp(path, "/proc/loadavg") == 0 ||
strcmp(path, "/proc/slabinfo") == 0) {
- if (liblxcfs_functional())
+ if (liblxcfs_functional() && is_ptrace_allowed())
sb->st_size = get_procfile_size_with_personality(path);
else
sb->st_size = get_procfile_size(path);
@@ -206,7 +206,7 @@ __lxcfs_fuse_ops int proc_open(const char *path, struct fuse_file_info *fi)
info->type = type;
- if (liblxcfs_functional())
+ if (liblxcfs_functional() && is_ptrace_allowed())
info->buflen = get_procfile_size_with_personality(path) + BUF_RESERVE_SIZE;
else
info->buflen = get_procfile_size(path) + BUF_RESERVE_SIZE;
@@ -1646,7 +1646,7 @@ __lxcfs_fuse_ops int proc_read(const char *path, char *buf, size_t size,
return read_file_fuse_with_offset(LXC_TYPE_PROC_MEMINFO_PATH,
buf, size, offset, f);
case LXC_TYPE_PROC_CPUINFO:
- if (liblxcfs_functional())
+ if (liblxcfs_functional() && is_ptrace_allowed())
return proc_read_with_personality(&proc_cpuinfo_read, buf, size, offset, fi);
return read_file_fuse_with_offset(LXC_TYPE_PROC_CPUINFO_PATH,
diff --git a/src/utils.c b/src/utils.c
index ab665f7..1fc9a68 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -691,3 +691,44 @@ int get_task_personality(pid_t pid, __u32 *personality)
return ret;
}
+
+/*
+ This function checks Yama ptrace scope to make sure system security policy allows ptrace usage.
+ This is required as accessing task personalities (see `get_task_personality` above) may be restricted by a ptrace
+ access mode check (see PROC(5)).
+*/
+bool is_ptrace_allowed(void)
+{
+ static int yama_ptrace_scope = -1;
+
+ __u32 buf_scope;
+ __do_close int fd = -EBADF;
+ int ret = -1;
+ char buf[8 + 1];
+
+ /* Yama ptrace scope is not yet known (cache is empty) */
+ if (yama_ptrace_scope == -1) {
+ /* assume default policy if we can't retrieve info below */
+ yama_ptrace_scope = YAMA_PTRACE_SCOPE_RESTRICTED;
+
+ fd = open(SYS_FS_KERNEL_YAMA_PTRACE_SCOPE, O_RDONLY | O_CLOEXEC);
+ if (fd >= 0) {
+ ret = read_nointr(fd, buf, sizeof(buf) - 1);
+ if (ret >= 0) {
+ buf[ret] = '\0';
+ if (safe_uint32(buf, &buf_scope, 16) < 0)
+ return log_error(true, "Failed to read Yama ptrace scope %s", buf);
+
+ if (buf_scope >= YAMA_PTRACE_SCOPE_NOATTACH)
+ lxcfs_error("Due to Yama ptrace policy, reading /proc files from containers may be inconsistent");
+ yama_ptrace_scope = buf_scope;
+ }
+ } else if (errno == -ENOENT) {
+ /* in this very case, Yama may not be enabled at all */
+ yama_ptrace_scope = YAMA_PTRACE_SCOPE_CLASSIC;
+ }
+ }
+
+ /* consider ptrace is allowed when Yama scope is lower than "no-attach" mode */
+ return (yama_ptrace_scope < YAMA_PTRACE_SCOPE_NOATTACH);
+}
diff --git a/src/utils.h b/src/utils.h
index 7ed021a..0be0480 100644
--- a/src/utils.h
+++ b/src/utils.h
@@ -25,6 +25,11 @@
#define SEND_CREDS_NOTSK 1
#define SEND_CREDS_FAIL 2
+#define SYS_FS_KERNEL_YAMA_PTRACE_SCOPE "/sys/kernel/yama/ptrace_scope"
+#define YAMA_PTRACE_SCOPE_CLASSIC 0
+#define YAMA_PTRACE_SCOPE_RESTRICTED 1
+#define YAMA_PTRACE_SCOPE_NOATTACH 3
+
struct file_info;
__attribute__((__format__(__printf__, 4, 5))) extern char *must_strcat(char **src, size_t *sz, size_t *asz, const char *format, ...);
@@ -77,6 +82,7 @@ static inline bool file_exists(const char *f)
extern char *read_file_at(int dfd, const char *fnam, unsigned int o_flags);
extern int get_task_personality(pid_t pid, __u32 *personality);
+extern bool is_ptrace_allowed(void);
extern int get_host_personality(__u32 *personality);
#endif /* __LXCFS_UTILS_H */
Thanks for your time, bye :pray:
Unless I missed something, your patch will cause cpuinfo to not be virtialized on such systems, which will then cause the guest to still get the cpuinfo for the incorrect architecture as lxcfs will perform the read of the host file under its own architecture.
So far, the only safe solution I can think of is to either make LXCFS refuse to start on such systems or have any attempt at reading cpuinfo on such systems to result in an error.
Indeed, I have opted out for a "warning-like" message approach with :
+ if (buf_scope >= YAMA_PTRACE_SCOPE_NOATTACH)
+ lxcfs_error("Due to Yama ptrace policy, reading /proc files from containers may be inconsistent");
Let me adapt patch according to your second proposal (which I prefer) and discuss it over a PR :ok_hand:
It won't be inconsistent, it will be wrong :)
@brauner @mihalicyn what do you think? Should we print a warning/error that nobody will ever see and return bad data that can break/corrupt builds run inside the container or should we just return an error (EIO or whatever) whenever someone attempts to read cpuinfo
or any other file requiring the process personality and add a mention to the README that getting such an error means the system is likely running YAMA or another LSM in a way that's incompatible with LXCFS.
Indeed it's simply incompatible. Returning an error seems best. EIO or maybe EACCESS, as this is an access control issue.
Hi @HorlogeSkynet
Thanks for your report and PR!
Just curious, why you are using kernel.yama.ptrace_scope = 3
mode? It looks too restrictive, even root user can't ptrace other tasks when this mode is enabled. While kernel.yama.ptrace_scope = 2
only allows tracing for privileged user in the task's user namespace.
I agree with Serge, we just need to return -EIO
/ -EACCES
.
@hallyn thanks for your feedback, I've replaced ENOTSUP
by EACCES
.
@mihalicyn I guess "hardening, defense in depth, ..." could be an answer here.
@ three of you : I noticed in "personality change" code that nothing is done if "personality restoration" actually fails. It means that lxcfs would continue to run with host personality if somehow first call to personality
succeeds, but second fails.
I don't have an overall idea of the (security ?) consequences here, but I don't think regularly pursuing program workflow (including emitting an error log) is fine. WDYT ?
Thanks for your time ! Bye 👋
Your branch seems overly complicated when all we really want to do is catch the failure to read the personality file and then return EACCES back to the reader.
We don't really need to know that we're dealing with YAMA or some other LSM similarly messing with us, we just want an I/O error to hit the reader and a message in the LXCFS README.md file to mention that YAMA when set to 3
will break the file.
I agree, but as I originally stated, if we don't perform the check beforehand, lxcfs keeps triggering Yama detection of ptrace usages (cf. kernel log line).
I'm fine with that, at the end of the day, we need the system's administrator to notice that something isn't working properly and then decide to either rollback the YAMA setting or uninstall LXCFS.
Would you accept another (simple) LBYL approach if we check (once) for /proc/self/personality
access ? I noticed it fails when ptrace is fully-disabled, and the check wouldn't be "Yama-specific".
EDIT : plus "rollback the Yama setting [...]" isn't anodyne as it requires a system reboot :roll_eyes:
EDIT 2 : It appears /proc/self/personality
is always accessible, whereas /proc/$$/personality
is not with a ptrace scope of 3
(for instance in a Bash command with a subprocess).
I opted for init
personality access check instead in #638 (to prevent forking off a child for this).
Yeah, that should be fine.
Hello there :wave:
I've noticed literally thousands of errors related to personality retrieval, very likely due to Yama.
It looks like (lxcfs logs) :
As Yama also reports unauthorized (prevented) accesses, it actually floods kernel logs as well :
When we dig this a bit, we can read in
proc(5)
:I wanted to propose a patch but I'm not sure about the best approach here. Should we :
EPERM
specifically so as to ignoring it ? (this wouldn't address Yama complaining about "ptrace attachment tentative")/proc/sys/kernel/yama/ptrace_scope
before even trying to get task personality (LBYL) ?Thanks for your time, bye :pray:
Setup : lxcfs 6.0.0 / Linux 6.8.4