ioi / isolate

Sandbox for securely executing untrusted programs

isolate is significantly slower with CG enabled #126

Open sadfun opened 1 year ago

sadfun commented 1 year ago

On a fresh Ubuntu 22.04 installation, isolate in cgroups mode is far slower than without it. This holds for both the master and cg2 branches.

# time isolate --init
/var/local/lib/isolate/0

real    0m0.001s
user    0m0.000s
sys     0m0.000s
# time isolate --init --cg
/var/local/lib/isolate/0

real    0m0.014s
user    0m0.001s
sys     0m0.000s

Same for run:

time isolate --run /usr/bin/echo

OK (0.000 sec real, 0.000 sec wall)

real    0m0.001s
user    0m0.001s
sys     0m0.000s
time isolate --run --cg /usr/bin/echo

OK (0.000 sec real, 0.023 sec wall)

real    0m0.025s
user    0m0.002s
sys     0m0.000s

Is there any workaround to fix it, maybe changes to isolate's code or OS tweaking?

fushar commented 1 year ago

I am using Isolate with CG on Ubuntu 22.04 and do not observe such slowdown.

I disabled cgroups v2, though, using GRUB's systemd.unified_cgroup_hierarchy=false config. Can you try disabling it and see if you still have the issue?

Note -- I am just an Isolate user, not maintainer. I honestly do not know how cgroups v1 vs v2 affects the correctness of execution sandboxing, at least for my project (online judge for competitive programming).

sadfun commented 1 year ago

Yep, I tried isolate in two modes:

Both of them add 15-25 ms to initializing a sandbox or running a command in --cg mode, and both do it quickly (~1 ms) without CG.

gollux commented 1 year ago

Can you use strace -T to find out which system calls the time is spent in?

BTW do you have an application where this makes a difference?

sadfun commented 1 year ago

Here it is: isolate-init.txt, isolate-run.txt. This is the version from master branch.

As far as I can see, during --init the longest syscall is mkdir("/sys/fs/cgroup/cpuset/box-0/", 0777) = 0 <0.013080> (13 ms). During --run it is wait4(53003, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={tv_sec=0, tv_usec=815}, ru_stime={tv_sec=0, tv_usec=0}, ...}) = 53003 <0.019579> (19 ms).

An application where this really makes a difference is a high-load online judge: when you have many small tests, on each of which solutions usually run for 1-2 ms, a 10-20x increase in judging time becomes a bottleneck :(
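For anyone repeating this measurement, the per-syscall times that strace -T appends in angle brackets can be ranked mechanically instead of by eye. The following is a minimal sketch in C, not part of isolate; the helper names strace_elapsed and slowest_line are made up for illustration, and it assumes each log line ends with the usual <seconds> suffix.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the elapsed time (in seconds) that
 * `strace -T` appends to a line as "<0.013080>", or -1.0 if the line
 * carries no such suffix (e.g. "<unfinished ...>"). */
static double strace_elapsed(const char *line)
{
    const char *lt = strrchr(line, '<');   /* last '<' starts the timing */
    if (lt == NULL)
        return -1.0;

    char *end;
    double secs = strtod(lt + 1, &end);
    if (end == lt + 1 || *end != '>')       /* no number, or no closing '>' */
        return -1.0;
    return secs;
}

/* Hypothetical helper: index of the line with the largest elapsed time,
 * or -1 if no line has a timing suffix. */
static long slowest_line(const char *const lines[], size_t n)
{
    long best = -1;
    double best_secs = -1.0;
    for (size_t i = 0; i < n; i++) {
        double secs = strace_elapsed(lines[i]);
        if (secs > best_secs) {
            best_secs = secs;
            best = (long)i;
        }
    }
    return best;
}
```

Feeding it the lines of isolate-init.txt or isolate-run.txt (read with fgets, for instance) would point at the mkdir and wait4 calls quoted above.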

sadfun commented 1 year ago

It seems that this slowdown is caused not by isolate itself, but by the implementation of cgroups v1/v2.

One idea to improve the situation is to add a soft cleanup that doesn't remove the entire sandbox, but resets it to its original state without re-creating the control group. This will at least eliminate the --init overhead for all cases of online judges.

@fushar, you said that you do not experience such slowdowns, but it seems that such delays are present on every Linux. Maybe the 30-40ms difference is just not noticeable in your case? Could you please measure it?

gollux commented 1 year ago

I will try profiling it using a system-wide profiler, but at the moment, stabilizing and merging support for cgroup2 has higher priority.

sadfun commented 1 year ago

Surely! For the future: profiling by @purplesyringa with strace -ff -T shows that the heaviest operation in --run is actually moving the process into the cgroup:

[pid 119538] openat(AT_FDCWD, "/sys/fs/cgroup/memory/box-0/tasks", O_WRONLY|O_TRUNC) = 3 <0.000010>
[pid 119538] write(3, "2\n", 2)         = 2 <0.016667>
...
[pid 119538] openat(AT_FDCWD, "/sys/fs/cgroup/cpuset/box-0/tasks", O_WRONLY|O_TRUNC) = 3 <0.000007>
[pid 119538] write(3, "2\n", 2)         = 2 <0.015644>
AlexVasiluta commented 6 months ago

Hello! I wanted to chime in and suggest that the clone3 syscall with CLONE_INTO_CGROUP (cg2 branch only, but that isn't a problem) might yield a small performance improvement when adding the process to the cgroup, since the process would start out in its target cgroup instead of being moved there afterwards. I have not tested this idea, but logically it would make sense.

Despite not having an official glibc wrapper function, the clone3 syscall is available from kernel 5.3 onwards (5.7 for CLONE_INTO_CGROUP). Since isolate requires 5.19 (as stated in the manual) for properly reporting memory usage, I think we could make use of this feature.

AlexVasiluta commented 5 months ago

Update: according to the manpages, CLONE_INTO_CGROUP and using the clone3 syscall would fix this issue:

Furthermore, spawning the child process directly into a target cgroup is significantly cheaper than moving the child process into the target cgroup after it has been created.

https://www.man7.org/linux/man-pages/man2/clone.2.html
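To make the suggestion concrete, here is a minimal, untested-in-isolate sketch of what spawning a child directly into a cgroup v2 directory could look like. It is not isolate's code: the helper name spawn_into_cgroup and any path passed to it are hypothetical, it calls clone3 through syscall() because glibc has no wrapper, and it assumes kernel headers from 5.7 or later.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_INTO_CGROUP */
#include <signal.h>        /* SIGCHLD */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>   /* SYS_clone3 */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that starts life inside the cgroup v2
 * directory `cgroup_dir`. Returns the child's PID in the parent, 0 in the
 * child, and -1 with errno set on failure. */
static pid_t spawn_into_cgroup(const char *cgroup_dir)
{
    /* clone3 takes a file descriptor of the target cgroup directory. */
    int cg_fd = open(cgroup_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (cg_fd < 0)
        return -1;

    struct clone_args args;
    memset(&args, 0, sizeof(args));
    args.flags = CLONE_INTO_CGROUP;      /* place the child in args.cgroup */
    args.exit_signal = SIGCHLD;          /* behave like fork() for wait() */
    args.cgroup = (uint64_t)cg_fd;

    long pid = syscall(SYS_clone3, &args, sizeof(args));

    /* Parent and child both drop the directory descriptor; keep the
     * errno from clone3 intact for the caller. */
    int saved_errno = errno;
    close(cg_fd);
    errno = saved_errno;

    return (pid_t)pid;
}
```

In the parent, the return value would be passed to waitpid(); in the child (return value 0), the sandboxed program would be exec'ed. Whether this actually removes the 15-20 ms spent on the write to the cgroup's process list would still need to be measured.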

gollux commented 4 months ago

Thanks for the idea, but I'm going to postpone it for a while, because I was sitting on the cgroup v2 version for too long and I would like to release it soon. Also, CLONE_INTO_CGROUP is not supported by glibc yet and calling syscalls directly could be non-portable ... need to check.