namjaejeon / ksmbd

ksmbd kernel server(SMB/CIFS server)
https://github.com/cifsd-team/ksmbd
270 stars 62 forks source link

Regression when under load #424

Closed mmakassikis closed 1 year ago

mmakassikis commented 1 year ago

Hello,

I am seeing a regression on some setups: when transferring large files from 2 clients simultaneously, sometimes the client will error out. With kernel cifs client, the user sees a EIO/EPIPE errors. Sometimes, the process doing the I/O will recover after being stuck for a while. In these cases, there is usually some error in dmesg:

CIFS: VFS: \192.168.1.254 Error -32 sending data on socket to server

Steps I used to reproduce the error:

Create a dummy 10GB file

dd if=/dev/urandom of=file.bin bs=1M count=10000

Continuously write it to the server :

while sleep 1; do cp -v file.bin ~/share/ ; done

(repeat on a second client)

The behaviour may not always occur. To trigger it, simulate some load on the server using stress:

stress -m 2

Reverting the https://github.com/namjaejeon/ksmbd/commit/90bd11ecc1b0356997856e406becf71dd696928d commit fixes the issue.

I reapplied the diff and removed the __GFP_NOWARN and __GFP_NORETRY flags, and I can't reproduce the errors.

Why are these flags needed ?

namjaejeon commented 1 year ago

ksmbd server is the kernel server. So ksmbd can trigger OOM. but kernel thread can not be killed like user process by OOM killer. So other process unrelated to the ksmbd server will be killed. As you check patch description, __GFP_NORETRY will cause the ksmbd server to keep trying to allocate memory and it won't recover from the situation causing the OOM issue to continue.

mmakassikis commented 1 year ago

OK, so the approach is to disconnect the request (and in doing so free some memory) rather than let the allocator retry as this may kill some userland process. Oddly enough, I'm not seeing any OOM on the platform I'm testing on. I guess I'll have to maintain a local patch for this.

Looking at the code in ksmbd_conn_handler_loop(), I'm not sure the max_allowed_pdu_size computation is entirely correct. Shouldn't it be max(max_read_size, max_write_size, max_trans_size) in case the user has set different values ?

namjaejeon commented 1 year ago

I'm not seeing any OOM on the platform I'm testing on. I guess I'll have to maintain a local patch for this.

Can you check this is related with CONFIG_SWAP or other swap config? If it is related, we can use this flags if this config is enable.

Looking at the code in ksmbd_conn_handler_loop(), I'm not sure the max_allowed_pdu_size computation is entirely correct. Shouldn't it be max(max_read_size, max_write_size, max_trans_size) in case the user has set different values ?

Attackers can change too big pdu size and cause memory allocation. but we can make max_write_size as max_allowed_pdu_size . pdu size can not be bigger than it. You can test each parameters and let me know if there is any problem.

mmakassikis commented 1 year ago

Can you check this is related with CONFIG_SWAP or other swap config? If it is related, we can use this flags if this config is enable.

I don't have any swap configured, and I have set vm.panic_on_oom=1 sysctl.

You can test each parameters and let me know if there is any problem.

The scenario I had in mind is if the user, for whatever reason, has set different values for "max read" and "max write". For instance:

[global]
    ...
    smb2 max read = 128K
    smb2 max write = 64K
    smb2 max trans = 64K

In this case, the handler loop will reject valid SMB2 Read packets.

namjaejeon commented 1 year ago

In this case, the handler loop will reject valid SMB2 Read packets.

? Have you ever reproduced it directly ?

namjaejeon commented 1 year ago

I don't have any swap configured, and I have set vm.panic_on_oom=1 sysctl.

You don't use ZRAM or ZSWAP on your target ?

mmakassikis commented 1 year ago

You don't use ZRAM or ZSWAP on your target ?

No

? Have you ever reproduced it directly ?

I haven't tried it, I wondered whether it would happen when reading the code :-)

With GFP_KERNEL flag, kmalloc() is allowed to sleep. This does not happen with the patch applied because __GFP_NORETRY is set. This would explain the interrupted transfers: in low memory conditions we immediately break out of the loop and close the connection, whereas previously the kmalloc would sleep.

namjaejeon commented 1 year ago

I haven't tried it, I wondered whether it would happen when reading the code :-)

Don't worry. No issue.

in low memory conditions we immediately break out of the loop and close the connection,

I guess that client should reconnect and try it again. Can you check it with windows client ?

mmakassikis commented 1 year ago

The windows client fails the transfer as well with error 0x8007003B.

Even if it would silently reconnect and resume, it still poses a problem: ksmbd dropped the connection in the kernel, but the userland counterpart is completely unaware as it didn't see a Tree Disconnect message. Over time, no new connections can be made because the session counter reached KSMBD_CONF_DEFAULT_SESS_CAP.

namjaejeon commented 1 year ago

If you remove __GFP_NOWARN and __GFP_NORETRY flags, problem is fixed with windows client ?

mmakassikis commented 1 year ago

Yes.

namjaejeon commented 1 year ago

Okay, Can you send the patch for this to the list ?

namjaejeon commented 1 year ago

However, the latter calls kvmalloc_node() which does not support __GFP_NORETRY.

In patch, I can not understand this. You told me there is the difference before/after using __GFP_NORETRY. If kvmalloc_node ignore this flags, Why is there a difference in behavior?

namjaejeon commented 1 year ago

With GFP_KERNEL flag, kmalloc() is allowed to sleep. This does not happen with the patch applied because __GFP_NORETRY is set.

What does kernel code make this behavior ? Can you show me the related kernel code ?

namjaejeon commented 1 year ago

@mmakassikis Ping? Could you please update patch description in more detail?

mmakassikis commented 1 year ago

In patch, I can not understand this. You told me there is the difference before/after using __GFP_NORETRY

This is what I observed in testing.

What does kernel code make this behavior ? Can you show me the related kernel code ?

I'm not sure exactly which codepath changes kvmalloc() behaviour. I saw the mm-api docs (quoted below) and ran the test scenario described before and noticed that reverting the changes to kvmalloc() in connection.c restored the previous behaviour (no error on the client side).

Quoting https://www.kernel.org/doc/html/v6.2/core-api/mm-api.html#c.kmalloc

GFP_KERNEL Allocate normal kernel ram. May sleep.

From the kvmalloc_node doc in the same page:

Description

Uses kmalloc to get the memory but if the allocation fails then falls back to the vmalloc allocator. Use kvfree for freeing the memory.

GFP_NOWAIT and GFP_ATOMIC are not supported, neither is the __GFP_NORETRY modifier. __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is preferable to the vmalloc fallback, due to visible performance drawbacks.

namjaejeon commented 1 year ago

When running ksmbd server that receiving the patckets, there are background applications on your target ? I mean background applications can't allocation memory allocation and OOM happen.

mmakassikis commented 1 year ago

Yes, there are other services running besides ksmbd.

The target is configured to panic on OOM which is not what happens when transfers fail.