google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer
Apache License 2.0
5.35k stars 1.22k forks source link

executor: create network namespace per test #3369

Open dvyukov opened 2 years ago

dvyukov commented 2 years ago

Currently network namespaces are shared across multiple tests. This was done for performance reasons. Creation/destruction of net ns is slow + it's globally serialized thus parallel test processes suffer + destruction is asynchronous and we can create them faster than they are destroyed and deplete system memory.

However, shared namespaces lead to some issues:

  1. Tests are not isolated (non-reproducible coverage/crashes).
  2. We carry some additional code to restore some isolation (checkpoint/reset_iptables/arptables/ebtables). This is 500 LOC + it does not seem to work always.
  3. It leads to some false positive reproducers (a repeated tests creates an infinite number of devices/qdisks with timers, which eventually leads to system stalls).

We should re-consider use of net namespace per test. There were some performance improvements in this area (batched destruction) + we can try to control number of pending to-be-destroyed namespaces with /proc/sys/user/max_net_namespaces. It's decremented at the very end of destruction thus account for pending namespaces.

However, it still seems to be significantly slow. Here is a quick benchmarking test: https://github.com/dvyukov/syzkaller/commit/5c6ebedf93986ba0f99b8491cfa6665fbb1b6fe9 https://gist.github.com/dvyukov/02cacfe2b5aa98e1f42ed67f31dff5b7 (currently it also shows unbounded memory growth due to pending namespaces) On our syzbot kernel in qemu with 2 CPUs it shows not very promising numbers (also shows that max_net_namespaces works):

# echo -n 20 > /proc/sys/user/max_net_namespaces
# ./syz-executor netns-bench 4
ops=0
ops=0
ops=1
ops=3
ops=0
ops=2
ops=2
ops=0
ops=1
ops=3
ops=0
ops=1
ops=3
ops=0
ops=0
ops=0
ops=2
ops=2
SYZFAIL: CLONE_NEWNET failed
 (errno 28: No space left on device)
SYZFAIL: subprocess failed
status=67 (errno 0: Success)
dvyukov commented 2 years ago

From Eric Dumazet:

Part of the cost comes from inet_twsk_purge(), scanning half million slot TCP ehash table.
I attempted to get rid of it this back in January, but had to revert this in May, (4bbdb2b00bb2f23a2)

Good news is that soon, each netns will be able to use its own (smaller) hash table, of arbitrary size

[https://patchwork.kernel.org/project/netdevbpf/list/?series=675049](https://www.google.com/url?q=https://patchwork.kernel.org/project/netdevbpf/list/?series%3D675049&sa=D&source=buganizer&usg=AOvVaw0Xps7VAn87WlZaJ_JaAkKV)
Relevant (and last of the series) patch
[https://patchwork.kernel.org/project/netdevbpf/patch/20220908011022.45342-7-kuniyu@amazon.com/](https://www.google.com/url?q=https://patchwork.kernel.org/project/netdevbpf/patch/20220908011022.45342-7-kuniyu@amazon.com/&sa=D&source=buganizer&usg=AOvVaw1SDUzKQduS_aTO6rAF3qSe)

So syzbot will be able to use a much smaller per net-ns hash, making netns dismantle faster ?
Although I am not so sure, because currently inet_twsk_purge() is called for a batch of dismantling netns.

# sysctl -w net.ipv4.tcp_child_ehash_entries=128
  net.ipv4.tcp_child_ehash_entries = 128

unshare ...