lxc / lxcfs

FUSE filesystem for LXC
https://linuxcontainers.org/lxcfs

lxcfs: tighten policy about write() syscall #630

Closed mihalicyn closed 6 months ago

mihalicyn commented 6 months ago

@stgraber which scenarios do you think enabling --allow-write-on-cgroup may be useful for, btw?

Before cgroup namespaces appeared, LXCFS was used to emulate the cgroup tree inside the container. So it's usually safe to allow cgroup writes to a cgroup subtree from inside the container (this is something we already allow by default these days when a cgroup namespace is used, the "cgroup delegation" mechanism).

We are introducing this flag just as an extra security measure. These days, this cgroup emulation code in LXCFS is almost dead.
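For illustration, here is a minimal sketch of how a deny-by-default write policy could look in a FUSE write handler. The option flag, path check, and helper names are hypothetical and are not taken from LXCFS's actual code:

```c
/* Hypothetical sketch of a deny-by-default write policy in a FUSE
 * write handler; the flag name and helpers are illustrative and do
 * not mirror LXCFS's actual implementation. */
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Assumed global set from a command-line option along the lines of
 * --allow-write-on-cgroup (off by default). */
static bool opt_allow_cgroup_writes = false;

static bool path_is_cgroup(const char *path)
{
	return strncmp(path, "/cgroup/", strlen("/cgroup/")) == 0;
}

/* Called from the write() callback before doing any real work. */
static ssize_t cgroup_write_check(const char *path)
{
	if (path_is_cgroup(path) && !opt_allow_cgroup_writes)
		return -EACCES; /* reject writes unless explicitly enabled */
	return 0;
}
```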

stgraber commented 6 months ago

I actually expected a --enable-cgroup as I don't think we should show a cgroup tree at all by default, not even read only.

I just don't want to throw away the code because of embedded platforms running very old kernels and because we may reuse the code to offer a cgroup1 tree on a cgroup2 host system down the line.

tomponline commented 6 months ago

> @stgraber which scenarios do you think enabling --allow-write-on-cgroup may be useful for, btw?
>
> Before cgroup namespaces appeared, LXCFS was used to emulate the cgroup tree inside the container. So it's usually safe to allow cgroup writes to a cgroup subtree from inside the container (this is something we already allow by default these days when a cgroup namespace is used, the "cgroup delegation" mechanism).
>
> We are introducing this flag just as an extra security measure. These days, this cgroup emulation code in LXCFS is almost dead.

Thanks. So if LXCFS on newer systems isn't emulating the cgroup tree, and writes are usually safe and allowed from inside the container, can you help me understand the issue that requires disabling writes by default (and fixing the disabling of cpu)?

mihalicyn commented 6 months ago

> I actually expected a --enable-cgroup as I don't think we should show a cgroup tree at all by default, not even read only.

It's just empty if you don't have cgroup-v1 on the host.
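For context, whether a host still has a cgroup-v1 hierarchy can be probed with statfs(2): on a pure cgroup2 (unified) host, /sys/fs/cgroup reports CGROUP2_SUPER_MAGIC. A small standalone check, assuming the conventional mount point:

```c
/* Standalone check for a pure cgroup2 (unified) host: on such systems
 * /sys/fs/cgroup is a cgroup2 mount, so a v1 emulation tree would be
 * empty. Assumes the conventional mount point. */
#include <stdio.h>
#include <sys/vfs.h>
#include <linux/magic.h>	/* CGROUP2_SUPER_MAGIC */

int main(void)
{
	struct statfs fs;

	if (statfs("/sys/fs/cgroup", &fs) < 0) {
		perror("statfs");
		return 1;
	}

	if (fs.f_type == CGROUP2_SUPER_MAGIC)
		printf("pure cgroup2 host: no cgroup-v1 tree to show\n");
	else
		printf("cgroup-v1 (or hybrid) hierarchy present\n");
	return 0;
}
```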

mihalicyn commented 6 months ago

Let me rework it a bit, then :)

tomponline commented 6 months ago

> I just don't want to throw away the code because of embedded platforms running very old kernels and because we may reuse the code to offer a cgroup1 tree on a cgroup2 host system down the line.

Got you, thanks!

stgraber commented 6 months ago

Yeah, but even on cgroup1 hosts, I'd prefer we default to no virtual cgroup tree unless --enable-cgroup is passed.
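As a sketch of the opt-in default described here, a boolean flag parsed with getopt_long could gate the virtual tree; the parsing code below is illustrative and does not mirror LXCFS's actual option handling:

```c
/* Sketch of an opt-in --enable-cgroup flag as described above: the
 * virtual cgroup tree stays hidden unless explicitly requested.
 * Option handling is illustrative, not LXCFS's actual parser. */
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>

static bool opt_enable_cgroup = false;	/* default: no cgroup tree */

int main(int argc, char *argv[])
{
	static const struct option long_opts[] = {
		{ "enable-cgroup", no_argument, NULL, 'c' },
		{ NULL, 0, NULL, 0 },
	};
	int c;

	while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (c == 'c')
			opt_enable_cgroup = true;
	}

	printf("cgroup tree %s\n", opt_enable_cgroup ? "enabled" : "hidden");
	return 0;
}
```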

mihalicyn commented 6 months ago

Hm, it looks like something is wrong with the ubuntu-22.04 GitHub runners. Let's wait a bit, and once it becomes green we can merge. I can't see or reproduce any issues with this PR on my machine.

mihalicyn commented 6 months ago

https://github.com/actions/runner/issues/3207
https://github.com/google/sanitizers/issues/1614

mihalicyn commented 6 months ago

... and this:

[    4.442222] ================================================================================
[    4.444656] UBSAN: array-index-out-of-bounds in /build/linux-azure-6.5-dqcUxs/linux-azure-6.5-6.5.0/drivers/net/hyperv/netvsc.c:1448:41
[    4.448028] index 1 is out of range for type 'vmtransfer_page_range [1]'
[    4.449852] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.5.0-1016-azure #16~22.04.1-Ubuntu
[    4.449856] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[    4.449858] Call Trace:
[    4.449860]  <IRQ>
[    4.449864]  dump_stack_lvl+0x37/0x50
[    4.449874]  dump_stack+0x10/0x20
[    4.449876]  __ubsan_handle_out_of_bounds+0xa2/0xe0
[    4.449881]  ? down_write_failed+0x191/0x1b0
[    4.449886]  ? rndis_filter_receive+0x85/0x190 [hv_netvsc]
[    4.449895]  ? __update_load_avg_se+0x248/0x2c0
[    4.449900]  netvsc_receive+0x418/0x440 [hv_netvsc]
[    4.449906]  ? srso_alias_return_thunk+0x5/0x7f
[    4.449911]  netvsc_poll+0x15c/0x470 [hv_netvsc]
[    4.449918]  __napi_poll+0x33/0x1d0
[    4.449923]  net_rx_action+0x17b/0x2e0
[    4.449926]  ? srso_alias_return_thunk+0x5/0x7f
[    4.449930]  __do_softirq+0xd8/0x2de
[    4.449933]  __irq_exit_rcu+0xb0/0xd0
[    4.449938]  irq_exit_rcu+0xe/0x20
[    4.449942]  sysvec_hyperv_callback+0x80/0x90
[    4.449945]  </IRQ>
[    4.449947]  <TASK>
[    4.449949]  asm_sysvec_hyperv_callback+0x1b/0x20
[    4.449952] RIP: 0010:default_idle+0xb/0x20
[    4.449956] Code: ff eb c9 cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d 77 9d 23 00 fb f4 <fa> e9 ff 47 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
[    4.449959] RSP: 0018:ffffbc59400a3e88 EFLAGS: 00000216
[    4.449962] RAX: ffff92ae6fcadba8 RBX: ffff92aac038cc80 RCX: ffff92aad4449000
[    4.449964] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000033ddc
[    4.449966] RBP: ffffbc59400a3e90 R08: 0000000108b747a2 R09: 0000000000000001
[    4.449968] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    4.449970] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    4.449974]  ? srso_alias_return_thunk+0x5/0x7f
[    4.449976]  ? arch_cpu_idle+0x9/0x10
[    4.449979]  default_idle_call+0x2c/0xa0
[    4.449983]  cpuidle_idle_call+0x132/0x170
[    4.449986]  do_idle+0x82/0xf0
[    4.449989]  cpu_startup_entry+0x2d/0x30
[    4.449991]  start_secondary+0x120/0x150
[    4.449996]  secondary_startup_64_no_verify+0x17e/0x18b
[    4.450002]  </TASK>
[    4.450004] ================================================================================
[    4.452418] ================================================================================
[    4.454661] UBSAN: array-index-out-of-bounds in /build/linux-azure-6.5-dqcUxs/linux-azure-6.5-6.5.0/drivers/net/hyperv/netvsc.c:1449:41
[    4.457833] index 1 is out of range for type 'vmtransfer_page_range [1]'
[    4.459604] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.5.0-1016-azure #16~22.04.1-Ubuntu
[    4.459607] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[    4.459609] Call Trace:
[    4.459610]  <IRQ>
[    4.459612]  dump_stack_lvl+0x37/0x50
[    4.459617]  dump_stack+0x10/0x20
[    4.459620]  __ubsan_handle_out_of_bounds+0xa2/0xe0
[    4.459624]  ? down_write_failed+0x191/0x1b0
[    4.459628]  ? rndis_filter_receive+0x85/0x190 [hv_netvsc]
[    4.459635]  ? __update_load_avg_se+0x248/0x2c0
[    4.459639]  netvsc_receive+0x401/0x440 [hv_netvsc]
[    4.459645]  ? srso_alias_return_thunk+0x5/0x7f
[    4.459649]  netvsc_poll+0x15c/0x470 [hv_netvsc]
[    4.459656]  __napi_poll+0x33/0x1d0
[    4.459660]  net_rx_action+0x17b/0x2e0
[    4.459663]  ? srso_alias_return_thunk+0x5/0x7f
[    4.459667]  __do_softirq+0xd8/0x2de
[    4.459671]  __irq_exit_rcu+0xb0/0xd0
[    4.459675]  irq_exit_rcu+0xe/0x20
[    4.459678]  sysvec_hyperv_callback+0x80/0x90
[    4.459682]  </IRQ>
[    4.459683]  <TASK>
[    4.459685]  asm_sysvec_hyperv_callback+0x1b/0x20
[    4.459688] RIP: 0010:default_idle+0xb/0x20
[    4.459691] Code: ff eb c9 cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d 77 9d 23 00 fb f4 <fa> e9 ff 47 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
[    4.459693] RSP: 0018:ffffbc59400a3e88 EFLAGS: 00000216
[    4.459696] RAX: ffff92ae6fcadba8 RBX: ffff92aac038cc80 RCX: ffff92aad4449000
[    4.459698] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000033ddc
[    4.459700] RBP: ffffbc59400a3e90 R08: 0000000108b747a2 R09: 0000000000000001
[    4.459702] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    4.459703] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    4.459707]  ? srso_alias_return_thunk+0x5/0x7f
[    4.459710]  ? arch_cpu_idle+0x9/0x10
[    4.459713]  default_idle_call+0x2c/0xa0
[    4.459716]  cpuidle_idle_call+0x132/0x170
[    4.459719]  do_idle+0x82/0xf0
[    4.459722]  cpu_startup_entry+0x2d/0x30
[    4.459724]  start_secondary+0x120/0x150
[    4.459728]  secondary_startup_64_no_verify+0x17e/0x18b
[    4.459734]  </TASK>
[    4.459735] ================================================================================
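For what it's worth, the UBSAN reports above point at the hv_netvsc kernel driver, not LXCFS: `index 1 is out of range for type 'vmtransfer_page_range [1]'` is the classic pre-C99 one-element-array idiom for a variable-length trailer, which the bounds sanitizer flags even when the allocation has room. A minimal reproduction of the pattern (the structs are illustrative, not the driver's actual definitions):

```c
/* Minimal reproduction of the pattern UBSAN flags above: a legacy
 * one-element array used as a variable-length trailer. Indexing
 * range[1] on old_style is out of bounds for the declared type even
 * if the allocation has room; a C99 flexible array member avoids the
 * report. Illustrative structs, not the hv_netvsc definitions. */
#include <stdio.h>
#include <stdlib.h>

struct old_style {
	unsigned int count;
	unsigned int range[1];	/* pre-C99 idiom: UBSAN flags range[i] for i >= 1 */
};

struct new_style {
	unsigned int count;
	unsigned int range[];	/* flexible array member: in bounds for any allocated i */
};

int main(void)
{
	struct new_style *p = malloc(sizeof(*p) + 2 * sizeof(p->range[0]));
	if (!p)
		return 1;
	p->count = 2;
	p->range[0] = 10;
	p->range[1] = 20;	/* fine with a flexible array member */
	printf("%u %u\n", p->range[0], p->range[1]);
	free(p);
	return 0;
}
```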