Closed drieber closed 3 months ago
Here is a snippet from the server logs. log-snippet.log
I think this will fix the issue: https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1193187?usp=search
Patch updated. All handling of the gsh_refstr has been clarified, and documentation (comments and function documentation) has now been provided. I also cleaned up the code so that more of it is shared and it reads more clearly.
I tested patchset 2 locally and it does fix my issue.
I need to deliver a fix to my users quickly. Even if you submit this fix to V6-dev today, I am still using V5.6. If you submitted the fix to V5., we would still have to carefully import it into our build. With that in mind, I am thinking of applying a local fix on my side while I give our developers more time to review your more extensive fix. So my question is: does this fix look acceptable to you?
--- a/src/FSAL/commonlib.c
+++ b/src/FSAL/commonlib.c
@@ -3043,8 +3043,8 @@ static void set_op_context_export_fsal_n
 						  bool discard_refstr)
 {
 	if (discard_refstr) {
-		gsh_refstr_put(op_ctx->ctx_fullpath);
-		gsh_refstr_put(op_ctx->ctx_pseudopath);
+		if (op_ctx->ctx_fullpath != NULL) gsh_refstr_put(op_ctx->ctx_fullpath);
+		if (op_ctx->ctx_pseudopath != NULL) gsh_refstr_put(op_ctx->ctx_pseudopath);
 	}
 
 	op_ctx->ctx_export = exp;
@@ -3165,6 +3165,8 @@ void save_op_context_export_and_clear(st
 	op_ctx->ctx_export = NULL;
 	op_ctx->fsal_export = NULL;
 	op_ctx->ctx_pnfs_ds = NULL;
+	op_ctx->ctx_fullpath = NULL;
+	op_ctx->ctx_pseudopath = NULL;
 }
 
 void restore_op_context_export(struct saved_export_context *saved)
As a downstream-only fix, that should fix the immediate issue. I do encourage changing to the more complete fix ASAP, and perhaps releasing this temporary fix only in a fork, but how you handle that is up to you.
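For readers following along, the two halves of the local patch can be modeled in isolation. The sketch below is not Ganesha code: struct refstr, refstr_put, struct ctx, save_and_clear, set_path and restore are hypothetical stand-ins for gsh_refstr, gsh_refstr_put, op_ctx and the save/set/restore helpers, and it assumes that the saved context takes over the existing reference instead of taking a new one. Under that assumption, clearing the pointer on save together with the NULL guard on discard means each reference is released exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted string. */
struct refstr {
	int refcnt;
	char str[64];
};

static int live_objects; /* allocations that have not been freed yet */

static struct refstr *refstr_new(const char *s)
{
	struct refstr *r = calloc(1, sizeof(*r));

	if (r == NULL)
		abort();
	r->refcnt = 1;
	snprintf(r->str, sizeof(r->str), "%s", s);
	live_objects++;
	return r;
}

static void refstr_put(struct refstr *r)
{
	assert(r->refcnt > 0);
	if (--r->refcnt == 0) {
		live_objects--;
		free(r);
	}
}

/* Hypothetical, reduced versions of the live and saved contexts. */
struct ctx { struct refstr *fullpath; };
struct saved_ctx { struct refstr *fullpath; };

/* Move the reference into 'saved' and clear the live pointer
 * (models the second hunk of the patch). */
static void save_and_clear(struct ctx *c, struct saved_ctx *saved)
{
	saved->fullpath = c->fullpath;
	c->fullpath = NULL;
}

/* Install a new path, optionally discarding the old one; the NULL
 * guard models the first hunk of the patch. */
static void set_path(struct ctx *c, struct refstr *path, bool discard)
{
	if (discard && c->fullpath != NULL)
		refstr_put(c->fullpath);
	c->fullpath = path;
}

/* Drop whatever is installed and hand the saved reference back. */
static void restore(struct ctx *c, struct saved_ctx *saved)
{
	if (c->fullpath != NULL)
		refstr_put(c->fullpath);
	c->fullpath = saved->fullpath;
	saved->fullpath = NULL;
}

/* Returns the number of strings still allocated at the end. */
int demo_save_restore(void)
{
	struct ctx c = { .fullpath = refstr_new("/export/a") };
	struct saved_ctx saved;

	save_and_clear(&c, &saved);                  /* c.fullpath is now NULL */
	set_path(&c, refstr_new("/export/b"), true); /* nothing to discard */
	restore(&c, &saved);                         /* frees b, reinstalls a */
	refstr_put(c.fullpath);                      /* final release of a */
	c.fullpath = NULL;
	return live_objects;
}
```

Calling demo_save_restore() from a main() returns 0, meaning every string was released exactly once. In this model, removing the `c->fullpath = NULL` line from save_and_clear makes set_path release a string that saved still points to, which resembles the use-after-free reported in this issue.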
Changing to verified based on comment above.
Also, please feel free to add a Verified +1 to the GerritHub review, and a Code-Review +1 if you are comfortable doing so.
Closing as done with 6.0 release.
I am getting an easily reproducible heap-use-after-free with my custom FSAL on MacOS.
Client and server are both on MacOS; the client is the MacOS kernel. I am running Ganesha V5.6. My server only supports NFSv4. My FSAL is configured with lock_support=false and lock_support_async_block=false. My FSAL does not implement lock_op2.
Please note that the MacOS kernel sends a compound request: PUTFH, RELEASE_LOCKOWNER (I'm not sure whether sending PUTFH is really necessary, but that's what Darwin does). It strongly feels like refcounts gone wrong.
I think what's used-after-free is either op_ctx->ctx_fullpath or op_ctx->ctx_pseudopath (or maybe fullpath / pseudopath?).
I suspect the issue is somewhere in release_lock_owner: https://github.com/nfs-ganesha/nfs-ganesha/blob/V5.6/src/SAL/nfs4_state.c#L708-L745
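To make the refcount suspicion concrete, here is a minimal sketch of how one extra put on a shared reference shows up. Everything in it (struct counted, counted_get, counted_put and the two demo functions) is hypothetical and not taken from Ganesha, and whether release_lock_owner actually hits this pattern is not established here. The model never frees memory; it counts over-releases instead, so the effect can be observed without undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counted object. Memory is never freed, so the demo can
 * inspect the counters after an over-release. */
struct counted {
	int refcnt;
	int over_release; /* puts seen after refcnt already reached zero */
};

static void counted_get(struct counted *c)
{
	c->refcnt++;
}

static void counted_put(struct counted *c)
{
	if (c->refcnt == 0) {
		/* A real implementation would already have freed the
		 * object, so this put would touch freed memory. */
		c->over_release++;
		return;
	}
	c->refcnt--;
}

/* Two holders alias ONE reference and both release it. */
int demo_aliased_put(void)
{
	struct counted path = { .refcnt = 1, .over_release = 0 };
	struct counted *live = &path;
	struct counted *saved = live; /* alias, no counted_get() */

	counted_put(live);  /* refcnt 1 -> 0: a real put would free here */
	counted_put(saved); /* second put on the same reference */
	return path.over_release;
}

/* Same flow, but the second holder takes its own reference first. */
int demo_balanced_put(void)
{
	struct counted path = { .refcnt = 1, .over_release = 0 };
	struct counted *live = &path;
	struct counted *saved = live;

	counted_get(saved); /* refcnt 1 -> 2 */
	counted_put(live);  /* refcnt 2 -> 1 */
	counted_put(saved); /* refcnt 1 -> 0 */
	return path.over_release;
}
```

In this model demo_aliased_put() reports one over-release and demo_balanced_put() reports none. With a real free() in place of the counter, the second put in the aliased case is where a tool such as AddressSanitizer would flag a heap-use-after-free.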