sahlberg / libsmb2

SMB2/3 userspace client
Other
310 stars 126 forks source link

Many issues running libsmb2 in the context of a fuse filesystem #325

Closed memecode closed 3 months ago

memecode commented 3 months ago

This is my attempt to document all the issues I'm seeing when trying to use libsmb2 as a fuse file system. There are quite a few things like double free's of memory blocks. Hangs in the code. Accessing PDU's that have been freed, or other memory that has been freed already. And then if none of that happens, the client loses sync with the data from the server and the parsing gets out of wack forcing either the client or the server to drop the connection.

The execution context is 1) running on a ubuntu 22.04 linux machine, using fuse3 and some wrapper code. Debugging via valgrind or gdb. Connecting to the localhost samba share. Or 2) a windows machine using winfsp's fuse3 compatibility layer, more or less than same glue code and compiled with visual studio 2019. Connecting to a remote linux machine also running 22.04's default samba impl.

Firstly in the windows case I see a failure when I browse to a subfolder in the mount point at this location:

    smb2.dll!smb2_read_from_socket(smb2_context * smb2) Line 674    C
    smb2.dll!smb2_service_fd(smb2_context * smb2, unsigned __int64 fd, int revents) Line 834    C
    smb2.dll!smb2_service(smb2_context * smb2, int revents) Line 862    C
    smb2.dll!wait_for_reply(smb2_context * smb2, sync_cb_data * cb_data) Line 103   C
    smb2.dll!smb2_stat(smb2_context * smb2, const char * path, smb2_stat_64 * st) Line 657  C
    fuse_libsmb2.exe!wrapper_getattr(const char * path, fuse_stat * stbuf, fuse_file_info * fi) Line 191    C++

Where 'niov' is zero and obviously the read(...) callback fails with 10022 (WSAEINVAL) which is fine. The error is set with smb2_set_error(smb2, "Read from socket failed, errno:%d. Closing socket.", err);

The intent seems to be to close the socket, however no attempt to actually close the socket happens in any of the calling functions, so further attempts to read from the socket continue... ad infinitum. Causing the whole process to hang eating 100% of a CPU core.

Also in smb2_read_data I see that it frees the local variable 'pdu' by calling 'smb2_free_pdu' at the bottom. However if 'is_chained==true' will goto the read_more_data label and start from the top again. If it falls into the SMB2_RECV_FIXED case, it will deref the freed pointer (or NULL if you set it to NULL). Which causes a crash. I've seen that a few times, but don't know how to to reproduce it.

Another issue I see is that in smb2_free_iovector it tries to free some smb2_iovec and in some cases the memory has already been freed. Also no attempt to set the pointer to NULL after the free is made. So it's pointing to nothing.

I've seen smb2_decode_header fail a few times, where the code falls into the 'memcmp(iov->buf, smb2sign, 4)' error handler. This seems to happen when the parsing of the data stream gets messed up, the client and server not agreeing on the protocol.

Overall it seems like the library does a few basic things ok, maybe a ls or dir command, but whenever you open a system file explorer (be it windows explorer or gnomes file manager) and browse to the mount everything falls apart.

memecode commented 3 months ago

The wrapper code is available here: https://github.com/memecode/fuse_libsmb2

I've added this to lib/CMakeLists.txt: target_include_directories(smb2 PUBLIC ../include)

So that clients of the library can find the headers. It's the done thing with cmake targets.

memecode commented 3 months ago

I've captured the data for the hang using wireshark.

memecode commented 3 months ago

Ok I'm investigating the hang issue and found that in wait_for_reply(...) smb2_service is returning a t_socket. Which on windows is unsigned... so "-1" is actually 18446744073709551615. Ha. And the error handler doesn't work obviously. Normally on windows you'd use 'INVALID_SOCKET' instead of -1 and just compare against that.

In fixing that the double free issue starts cropping up a lot more. And the socket does now get closed. What leads up to niov == 0 in smb2_read_data before the call to 'func' is unknown at this stage. Well aside from the 'while (num_done >= tmpiov->iov_len)' loop decrementing it to zero. Why are all the vectors read already? IDK

Actually in reading the code in smb2_service_fd, I think it shouldn't return a t_socket at all, just a standard int. And obviously the same for smb2_service,

memecode commented 3 months ago

For the double free I captured some stacks:

    stack of alloc:
        00007FFE822E6C56: smb2.dll, smb\libmem\libmem.cpp:61
        00007FFE822E1EDC: smb2.dll, smb\sahlberg-libsmb2\lib\socket.c:455 smb2_read_data -> smb2_add_iovector(...malloc...)
        00007FFE822E2809: smb2.dll, smb\sahlberg-libsmb2\lib\socket.c:679 smb2_read_from_socket
        00007FFE822E0736: smb2.dll, smb\sahlberg-libsmb2\lib\socket.c:839 smb2_service_fd
        00007FFE822E023A: smb2.dll, smb\sahlberg-libsmb2\lib\socket.c:867 smb2_service
        00007FFE822E50E5: smb2.dll, smb\sahlberg-libsmb2\lib\sync.c:104
        00007FFE822E3F2D: smb2.dll, smb\sahlberg-libsmb2\lib\sync.c:232
        00007FF7B8ED39F8: fuse_libsmb2.exe, smb\fuse_libsmb2\main.cpp:234

    stack of first free:
        00007FFE822E6F88: smb2.dll, C:\Users\Matthew\work\smb\libmem\libmem.cpp:122
        00007FFE822C2D96: smb2.dll, C:\Users\Matthew\work\smb\sahlberg-libsmb2\lib\init.c:377 smb2_free_iovector
        00007FFE822E27BF: smb2.dll, C:\Users\Matthew\work\smb\sahlberg-libsmb2\lib\socket.c:674 smb2_read_from_socket
        00007FFE822E0736: smb2.dll, C:\Users\Matthew\work\smb\sahlberg-libsmb2\lib\socket.c:839 smb2_service_fd
        00007FFE822E023A: smb2.dll, C:\Users\Matthew\work\smb\sahlberg-libsmb2\lib\socket.c:867
        00007FFE822E50E5: smb2.dll, C:\Users\Matthew\work\smb\sahlberg-libsmb2\lib\sync.c:104
        00007FFE822E3F2D: smb2.dll, C:\Users\Matthew\work\smb\sahlberg-libsmb2\lib\sync.c:232
        00007FF7B8ED39F8: fuse_libsmb2.exe, C:\Users\Matthew\work\smb\fuse_libsmb2\main.cpp:234

    stack of 2nd free:
        smb2.dll!smb2_free_iovector(smb2_context * smb2, smb2_io_vectors * v) Line 377  C
        smb2.dll!smb2_read_from_socket(smb2_context * smb2) Line 674    C
        smb2.dll!smb2_service_fd(smb2_context * smb2, unsigned __int64 fd, int revents) Line 839    C
        smb2.dll!smb2_service(smb2_context * smb2, int revents) Line 867    C
        smb2.dll!wait_for_reply(smb2_context * smb2, sync_cb_data * cb_data) Line 104   C
        smb2.dll!smb2_stat(smb2_context * smb2, const char * path, smb2_stat_64 * st) Line 659  C
        fuse_libsmb2.exe!wrapper_getattr(const char * path, fuse_stat * stbuf, fuse_file_info * fi) Line 191    C++
memecode commented 3 months ago

Potentially the fix for the socket validity checking issue on windows would be something like this.

memecode commented 3 months ago

Today I'm seeing smb2_decode_header (case SMB2_RECV_HEADER) fail because the last buffer in smb2->in.iov is 4 bytes long. That buffer was added by smb2_read_from_socket (SMB2_SPL_SIZE = 4).

Ie the error iov->len < SMB2_HEADER_SIZE is triggered.

And even if I just add some more buffer using a new call to smb2_add_iovector the memcmp(iov->buf, smb2sign, 4) check fails as the data is 0x1, 0x0, 0x0, 0x0. So clearly the protocol parsing needs work. In fact the whole library is just not production ready. I've exhausted my time box for looking at these bugs.

The wireshark capture for this error is here.

sahlberg commented 3 months ago

That is an odd error. Please make sure that the application is single-threaded.

The library is meant for event-driven applications and is not threads-safe.

On Mon, 25 Mar 2024 at 16:06, memecode @.***> wrote:

Today I'm seeing smb2_decode_header (case SMB2_RECV_HEADER) fail because the last buffer in smb2->in.iov is 4 bytes long. That buffer was added by smb2_read_from_socket (SMB2_SPL_SIZE = 4).

Ie the error iov->len < SMB2_HEADER_SIZE is triggered.

— Reply to this email directly, view it on GitHub https://github.com/sahlberg/libsmb2/issues/325#issuecomment-2017291271, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADY3ECK6IOK7FR77GKAHB3YZ65EZAVCNFSM6AAAAABFEA3SEKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJXGI4TCMRXGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

memecode commented 3 months ago

Please make sure that the application is single-threaded.

Ah... that makes so much more sense now; thanks. I'm new to fuse and I assumed that all the fuse call back functions would be executed in the same thread. Clearly they are not.

In any case, I wrapped all the callbacks in std::scoped_lock<std::mutex> and things have settled down a lot. I get a lot further into the share via Explorer. I'll clean up my debugging stuff and see how performant it is.

memecode commented 3 months ago

So here's a curly question for you @sahlberg - if I lock the global mutex over the call to smb2_stat_async and smb2_service I should be able to have multiple outstanding stat's going at the same time right? I haven't had the time yet to try and my code is currently using all sync versions of the functions. And performance is "ok" but could it be better? Maybe...

Ok so I think I'll close this bug, it's my fault for getting the threading model wrong. But there are still some worthwhile changes for building under windows here. Is it worth create a merge request for those? Do you think they're useful to other users?