nfs-ganesha / nfs-ganesha

NFS-Ganesha is an NFSv3,v4,v4.1 fileserver that runs in user mode on most UNIX/Linux systems
1.46k stars 508 forks source link

V5.6: heap-use-after-free: freed during RELEASE_LOCKOWNER, used in subsequent PUTFH (on MacOS), in NFSv4 #1119

Open drieber opened 4 months ago

drieber commented 4 months ago

I am getting an easily reproducible heap-use-after-free with my custom FSAL on MacOS.

Client and server are both on MacOS, the client is the MacOS kernel I am running with ganesha V5.6 My server only supports NFSv4 My fsal is configured with lock_support=false and lock_support_async_block=false My fsal does not implement lock_op2 Please notice MacOS kernel sends compound request: PUTFH, RELEASE_LOCKOWNER (I'm not sure if it is really necessary to send PUTFH, but that's what darwin does)

It strongly feels like refcounts gone wrong

==82347==ERROR: AddressSanitizer: heap-use-after-free on address 0x000120e392b0 at pc 0x000102ebd094 bp 0x0003020446c0 sp 0x0003020446b8
READ of size 8 at 0x000120e392b0 thread T51
    #0 0x102ebd090 in urcu_ref_get_unless_zero+0x4c (srcfsd_darwin:arm64+0x102c35090)
    #1 0x102eb5018 in gsh_refstr_get+0x14 (srcfsd_darwin:arm64+0x102c2d018)
    #2 0x102eb7a5c in tmp_get_exp_paths+0x21c (srcfsd_darwin:arm64+0x102c2fa5c)
    #3 0x102eb3540 in _get_gsh_export_ref+0x238 (srcfsd_darwin:arm64+0x102c2b540)
    #4 0x102eb41c8 in get_gsh_export+0xa1c (srcfsd_darwin:arm64+0x102c2c1c8)
    #5 0x102cefb58 in nfs4_mds_putfh+0x994 (srcfsd_darwin:arm64+0x102a67b58)
    #6 0x102cedb48 in nfs4_op_putfh+0x460 (srcfsd_darwin:arm64+0x102a65b48)
    #7 0x102c522ec in process_one_op+0x2460 (srcfsd_darwin:arm64+0x1029ca2ec)
    #8 0x102c57368 in nfs4_Compound+0x2928 (srcfsd_darwin:arm64+0x1029cf368)
    #9 0x102befbe4 in nfs_rpc_process_request+0x7918 (srcfsd_darwin:arm64+0x102967be4)
    #10 0x102bf0214 in nfs_rpc_valid_NFS+0x3cc (srcfsd_darwin:arm64+0x102968214)
    #11 0x103143f10 in svc_vc_decode+0x408 (srcfsd_darwin:arm64+0x102ebbf10)
    #12 0x10313819c in svc_request+0x18c (srcfsd_darwin:arm64+0x102eb019c)
    #13 0x103143998 in svc_vc_recv+0x4a7c (srcfsd_darwin:arm64+0x102ebb998)
    #14 0x103137fe4 in svc_rqst_xprt_task_recv+0x1f4 (srcfsd_darwin:arm64+0x102eaffe4)
    #15 0x10312f784 in svc_rqst_epoll_loop+0xbd4 (srcfsd_darwin:arm64+0x102ea7784)
    #16 0x10314e200 in work_pool_thread+0x8b8 (srcfsd_darwin:arm64+0x102ec6200)
    #17 0x185bc6f90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90)
    #18 0x185bc1d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30)

0x000120e392b0 is located 0 bytes inside of 10-byte region [0x000120e392b0,0x000120e392ba)
freed by thread T53 here:
    #0 0x11ea3ece0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x52ce0)
    #1 0x102f20eb8 in gsh_free+0x14 (srcfsd_darwin:arm64+0x102c98eb8)
    #2 0x102f20e94 in gsh_refstr_release+0x13c (srcfsd_darwin:arm64+0x102c98e94)
    #3 0x102f79be8 in urcu_ref_put+0x3e4 (srcfsd_darwin:arm64+0x102cf1be8)
    #4 0x102f74de0 in gsh_refstr_put+0x1c (srcfsd_darwin:arm64+0x102cecde0)
    #5 0x102f7676c in clear_op_context_export_impl+0x378 (srcfsd_darwin:arm64+0x102cee76c)
    #6 0x102f762b4 in clear_op_context_export+0x20 (srcfsd_darwin:arm64+0x102cee2b4)
    #7 0x102c57514 in nfs4_Compound+0x2ad4 (srcfsd_darwin:arm64+0x1029cf514)
    #8 0x102befbe4 in nfs_rpc_process_request+0x7918 (srcfsd_darwin:arm64+0x102967be4)
    #9 0x102bf0214 in nfs_rpc_valid_NFS+0x3cc (srcfsd_darwin:arm64+0x102968214)
    #10 0x103143f10 in svc_vc_decode+0x408 (srcfsd_darwin:arm64+0x102ebbf10)
    #11 0x10313819c in svc_request+0x18c (srcfsd_darwin:arm64+0x102eb019c)
    #12 0x103143998 in svc_vc_recv+0x4a7c (srcfsd_darwin:arm64+0x102ebb998)
    #13 0x103137fe4 in svc_rqst_xprt_task_recv+0x1f4 (srcfsd_darwin:arm64+0x102eaffe4)
    #14 0x10312f784 in svc_rqst_epoll_loop+0xbd4 (srcfsd_darwin:arm64+0x102ea7784)
    #15 0x10314e200 in work_pool_thread+0x8b8 (srcfsd_darwin:arm64+0x102ec6200)
    #16 0x185bc6f90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90)
    #17 0x185bc1d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30)

previously allocated by thread T0 here:
    #0 0x11ea3eba4 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x52ba4)
    #1 0x102f20ce4 in gsh_refstr_alloc+0x18 (srcfsd_darwin:arm64+0x102c98ce4)
    #2 0x102ed99a4 in gsh_refstr_dup+0x24 (srcfsd_darwin:arm64+0x102c519a4)
    #3 0x102ede61c in export_commit_common+0x4844 (srcfsd_darwin:arm64+0x102c5661c)
    #4 0x102ee2630 in export_commit+0x30 (srcfsd_darwin:arm64+0x102c5a630)
    #5 0x10307c534 in proc_block+0x10b4 (srcfsd_darwin:arm64+0x102df4534)
    #6 0x10307d24c in load_config_from_parse+0x79c (srcfsd_darwin:arm64+0x102df524c)
    #7 0x102ec7a2c in ReadExports+0xaec (srcfsd_darwin:arm64+0x102c3fa2c)
    #8 0x102bbc934 in nfs_libmain2+0x164c (srcfsd_darwin:arm64+0x102934934)
    #9 0x10262c00c in devtools_srcfs::SrcfsnMain(devtools_vfs::NfsMainOptions&)+0x388 (srcfsd_darwin:arm64+0x1023a400c)
    #10 0x100a3302c in devtools_srcfs::(anonymous namespace)::Main()+0x1050 (srcfsd_darwin:arm64+0x1007ab02c)
    #11 0x100a2fcc4 in main+0xd64 (srcfsd_darwin:arm64+0x1007a7cc4)
    #12 0x18583e0dc  (<unknown module>)

Thread T51 created by T40 here:
    #0 0x11ea37b10 in wrap_pthread_create+0x54 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb10)
    #1 0x10314c91c in work_pool_spawn+0xfc (srcfsd_darwin:arm64+0x102ec491c)
    #2 0x10314e00c in work_pool_thread+0x6c4 (srcfsd_darwin:arm64+0x102ec600c)
    #3 0x185bc6f90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90)
    #4 0x185bc1d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30)

Thread T40 created by T39 here:
    #0 0x11ea37b10 in wrap_pthread_create+0x54 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb10)
    #1 0x10314c91c in work_pool_spawn+0xfc (srcfsd_darwin:arm64+0x102ec491c)
    #2 0x10314e00c in work_pool_thread+0x6c4 (srcfsd_darwin:arm64+0x102ec600c)
    #3 0x185bc6f90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90)
    #4 0x185bc1d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30)

Thread T39 created by T38 here:
    #0 0x11ea37b10 in wrap_pthread_create+0x54 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb10)
    #1 0x10314c91c in work_pool_spawn+0xfc (srcfsd_darwin:arm64+0x102ec491c)
    #2 0x10314e00c in work_pool_thread+0x6c4 (srcfsd_darwin:arm64+0x102ec600c)
    #3 0x185bc6f90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90)
    #4 0x185bc1d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30)
Thread T38 created by T0 here:
    #0 0x11ea37b10 in wrap_pthread_create+0x54 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb10)
    #1 0x10314c91c in work_pool_spawn+0xfc (srcfsd_darwin:arm64+0x102ec491c)
    #2 0x10314c608 in work_pool_init+0xc2c (srcfsd_darwin:arm64+0x102ec4608)
    #3 0x103117484 in svc_init+0xc6c (srcfsd_darwin:arm64+0x102e8f484)
    #4 0x102bdf91c in nfs_Init_svc+0xc70 (srcfsd_darwin:arm64+0x10295791c)
    #5 0x102bb58f8 in nfs_Init+0xe58 (srcfsd_darwin:arm64+0x10292d8f8)
    #6 0x102bb4594 in nfs_start+0x260 (srcfsd_darwin:arm64+0x10292c594)
    #7 0x102bbcb80 in nfs_libmain2+0x1898 (srcfsd_darwin:arm64+0x102934b80)
    #8 0x10262c00c in devtools_srcfs::SrcfsnMain(devtools_vfs::NfsMainOptions&)+0x388 (srcfsd_darwin:arm64+0x1023a400c)
    #9 0x100a3302c in devtools_srcfs::(anonymous namespace)::Main()+0x1050 (srcfsd_darwin:arm64+0x1007ab02c)
    #10 0x100a2fcc4 in main+0xd64 (srcfsd_darwin:arm64+0x1007a7cc4)
    #11 0x18583e0dc  (<unknown module>)

Thread T53 created by T39 here:
    #0 0x11ea37b10 in wrap_pthread_create+0x54 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb10)
    #1 0x10314c91c in work_pool_spawn+0xfc (srcfsd_darwin:arm64+0x102ec491c)
    #2 0x10314e00c in work_pool_thread+0x6c4 (srcfsd_darwin:arm64+0x102ec600c)
    #3 0x185bc6f90 in _pthread_start+0x84 (libsystem_pthread.dylib:arm64e+0x6f90)
    #4 0x185bc1d30 in thread_start+0x4 (libsystem_pthread.dylib:arm64e+0x1d30)

SUMMARY: AddressSanitizer: heap-use-after-free (srcfsd_darwin:arm64+0x102c35090) in urcu_ref_get_unless_zero+0x4c
Shadow bytes around the buggy address:
  0x000120e39000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000120e39080: fa fa fa fa fa fa 07 fa fa fa 07 fa fa fa 07 fa
  0x000120e39100: fa fa 00 00 fa fa fa fa fa fa fa fa fa fa 00 00
  0x000120e39180: fa fa fa fa fa fa fa fa fa fa 03 fa fa fa fa fa
  0x000120e39200: fa fa 00 02 fa fa fa fa fa fa fa fa fa fa fa fa
=>0x000120e39280: fa fa fd fd fa fa[fd]fd fa fa fa fa fa fa fa fa
  0x000120e39300: fa fa 00 03 fa fa fa fa fa fa fa fa fa fa fa fa
  0x000120e39380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa 02 fa
  0x000120e39400: fa fa 02 fa fa fa 00 02 fa fa 00 02 fa fa 02 fa
  0x000120e39480: fa fa 02 fa fa fa 00 04 fa fa 00 00 fa fa 00 00
  0x000120e39500: fa fa 00 00 fa fa 00 03 fa fa 00 04 fa fa 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==82347==ABORTING

I think what's used-after-free is either op_ctx->ctx_fullpath or op_ctx->ctx_pseudopath (or maybe fullpath / pseudopath?).

I suspect the issue is somewhere in release_lock_owner: https://github.com/nfs-ganesha/nfs-ganesha/blob/V5.6/src/SAL/nfs4_state.c#L708-L745

drieber commented 4 months ago

Here is a snippet from the server logs. log-snippet.log

ffilz commented 4 months ago

I think this will fix the issue: https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1193187?usp=search

ffilz commented 4 months ago

Patch updated. All handling of the gsh_refstr has been clarified. Documentation (comments and function documentation) has now been provided. I also cleaned up so code is used more in common and more clearly.

drieber commented 4 months ago

I tested patchset 2 locally and it does fix my issue.

I need to deliver a fix to my users quickly. Even if you submit this fix to V6-dev today, I am still using V5.6. If you submitted the fix to V5., we would still have to carefully import it into our build. With that in mind, I am thinking maybe I submit a local fix on my side while I give our developers more time to review your more extensive fix. So I have this question for you: does this fix look acceptable to you:

--- a/src/FSAL/commonlib.c
+++ b/src/FSAL/commonlib.c
@@ -3043,8 +3043,8 @@ static void set_op_context_export_fsal_n
                          bool discard_refstr)
 {
    if (discard_refstr) {
-       gsh_refstr_put(op_ctx->ctx_fullpath);
-       gsh_refstr_put(op_ctx->ctx_pseudopath);
+       if (op_ctx->ctx_fullpath != NULL) gsh_refstr_put(op_ctx->ctx_fullpath);
+       if (op_ctx->ctx_pseudopath != NULL) gsh_refstr_put(op_ctx->ctx_pseudopath);
    }

    op_ctx->ctx_export = exp;
@@ -3165,6 +3165,8 @@ void save_op_context_export_and_clear(st
    op_ctx->ctx_export = NULL;
    op_ctx->fsal_export = NULL;
    op_ctx->ctx_pnfs_ds = NULL;
+   op_ctx->ctx_fullpath = NULL;
+   op_ctx->ctx_pseudopath = NULL;
 }

 void restore_op_context_export(struct saved_export_context *saved)
ffilz commented 4 months ago

As a downstream only fix, that should fix the immediate issue. I do encourage changing to the more complete fix ASAP and perhaps only release this temporary fix in a fork, but all of that is up to you how to handle.

ffilz commented 4 months ago

Changing to verified based on comment above.

ffilz commented 4 months ago

Also, please feel free to add a Verified +1 to the Gerrithub review, and code review +1 if you are comfortable doing so.