sahlberg / libnfs

NFS client library

Add req_size and resp_size to PDU. #491

Closed linuxsmiths closed 3 weeks ago

linuxsmiths commented 4 weeks ago

These will be updated when we create the RPC request and when we receive the RPC response bytes, respectively. The req and resp sizes contain the RPC header, the NFS header, and optional data bytes. This is to support mountstats-style "avg bytes sent" and "avg bytes received".

The user can query the req and resp byte counts using the two newly added APIs:

rpc_pdu_get_req_size()
rpc_pdu_get_resp_size()
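
A minimal usage sketch (assuming an rpc_get_pdu() accessor for the PDU being completed, as discussed later in this thread; the aggregation variables and callback shape are illustrative):

```c
#include <stdint.h>
#include <nfsc/libnfs-raw.h>

/* Illustrative aggregates for mountstats-style averages. */
static uint64_t total_sent, total_rcvd, num_ops;

static void my_rpc_cb(struct rpc_context *rpc, int status,
                      void *data, void *private_data)
{
        /* rpc->pdu is only stable inside the callback, so query here. */
        struct rpc_pdu *pdu = rpc_get_pdu(rpc);

        total_sent += rpc_pdu_get_req_size(pdu);   /* RPC + NFS hdrs + data */
        total_rcvd += rpc_pdu_get_resp_size(pdu);
        num_ops++;
}
```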

sahlberg commented 3 weeks ago

Not super excited about this. There are a couple of issues. It only returns the size, so if we want to add additional data we would need to add an additional function for each type of data, which does not scale well. It also only really works from within the context of the rpc callback function (which runs within the context of the worker thread for the rpc context) but will be racy if used anywhere else, as rpc->pdu can change.

Please have a look at the stats-callback branch. I have proposed a mechanism to register a callback function used to report statistics data for each sent/received PDU. This uses a structure that can be expanded with additional fields without the need to add new functions to the api.

This allows applications to register to get a callback for every pdu with useful data about the pdu, such as size, direction, xid, which procedure it is, whether it was successful or not, and the response time in milliseconds.
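
For illustration, a sketch of what the registration could look like (the structure fields and names here are assumptions based on the description above, not the exact branch API):

```c
#include <stdint.h>

struct rpc_context;

/* Assumed shape of the per-PDU stats structure; the real definition
 * lives in the stats-callback branch and may differ. */
struct rpc_pdu_stats {
        int      direction;       /* CALL or REPLY */
        uint32_t xid;
        uint32_t prog, vers, proc;
        int      status;          /* successful or not */
        uint64_t size;            /* bytes on the wire */
        uint64_t response_time;   /* milliseconds */
};

typedef void (*rpc_stats_cb)(struct rpc_context *rpc,
                             struct rpc_pdu_stats *data,
                             void *private_data);

/* Registered once per rpc context; invoked for every sent and
 * received PDU. */
void rpc_set_stats_cb(struct rpc_context *rpc, rpc_stats_cb cb,
                      void *private_data);
```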

Is this workable for your use case? I feel this is a more extensible approach.

sahlberg commented 3 weeks ago

I have implemented stats collection via a simple callback and a dedicated data structure that we can extend when adding additional information a consumer might be interested in.

See examples/nfs-stats-cb.c for an example of how to use it.

linuxsmiths commented 3 weeks ago

Took a look. I agree with your concern about needing to add more APIs, but this can be easily addressed by defining a stats structure which can be returned by a single API, rpc_pdu_get_stats(struct rpc_pdu *). The stats structure can be extended w/o needing new APIs for new stats that we may want. The pdu argument in rpc_pdu_get_stats() means that these are per-request stats, as opposed to the per-rpc_context stats returned by rpc_get_stats() w/o the pdu.
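
As a sketch, the proposed shape could be the following (only rpc_pdu_get_stats() and rpc_get_stats() are named above; the return types are assumptions):

```c
/* Forward declarations; both structures can grow new fields without
 * adding new API entry points. */
struct rpc_context;
struct rpc_pdu;
struct rpc_pdu_stats;   /* per-request stats, extensible */
struct rpc_stats;       /* per-context aggregates, extensible */

/* Per-request stats; valid only from within the rpc callback. */
const struct rpc_pdu_stats *rpc_pdu_get_stats(struct rpc_pdu *pdu);

/* Per-rpc_context stats; no pdu argument, usable anywhere. */
const struct rpc_stats *rpc_get_stats(struct rpc_context *rpc);
```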

The other concern, that the stats are valid only in the callback context, is IMO not so significant. With proper documentation this is a discipline the user can easily maintain. Note that they would need the response stats only inside the callback, so that's the most natural place to call rpc_pdu_get_stats() to query them. Also, if you feel strongly about it, we could even add the stats structure pointer to rpc_cb, which would convey the per-request stats in the callback itself. That would cause more changes to the code, hence I used the rpc_get_pdu() approach, which expects the discipline of using it only inside the callback.

The callback model puts an extra burden on the application to map the stats to the request they correspond to. The user has to maintain an additional data structure to map the callback to their request, especially since the callback is set per rpc_context and not per request. Yes, we could call rpc_set_stats_cb() before every rpc_nfs3_*_task() call, but that doesn't look neat either. So I think it affects usability.

An additional benefit of the API approach I propose is that a lot of the context is already known to the caller, so most of those fields (direction/prog/vers/proc) are not really needed.
There are some other problems with your patch, but those can be addressed. E.g., more important than the time between issue and completion is the time taken by the server, so we need to stamp the time when the request is sent, not when it's queued. Also, the unit of time should be at least usec (msec won't be good enough for fast servers). And rather than a "time taken" duration, a timestamp would be more useful, so the caller can use it to calculate things like time spent in the libnfs outqueue waiting to be dispatched, time taken by the server, etc.
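
A sketch of the timestamp idea (field names are illustrative; the point is that raw timestamps let the caller derive whatever durations it needs):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative timestamps a PDU could carry, all in microseconds. */
struct pdu_timestamps {
        uint64_t enqueue_us;   /* rpc_*_task() queued the PDU  */
        uint64_t send_us;      /* PDU written to the socket    */
        uint64_t reply_us;     /* reply received from server   */
};

static void report(const struct pdu_timestamps *ts)
{
        /* time spent in the libnfs outqueue waiting to be dispatched */
        uint64_t queue_wait_us = ts->send_us - ts->enqueue_us;
        /* wire round trip: server time plus network time */
        uint64_t server_rtt_us = ts->reply_us - ts->send_us;

        printf("queued %" PRIu64 " us, rtt %" PRIu64 " us\n",
               queue_wait_us, server_rtt_us);
}
```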

In general, I find this will not be usable for my use case in its current shape. I'll continue with the change in my fork for our current development.

Thanks for your inputs and help!

sahlberg commented 3 weeks ago


Set up a my_stats structure and register a stats callback that copies the reported stats into it. Then every time you want the stats for a *_task call you just use this pattern:

rpc_*_task(... "your callback" ...)

Inside "your callback" you then just reference the content of my_stats. Use the fields you care about and ignore the rest.

That is functionally equivalent to calling an rpc_get_pdu_stats() function from the rpc callback function.

You could even use something similar for the nfs_*() functions, though since the nfs-to-rpc mapping is one-to-many, an nfs request results in a series of multiple rpc calls. For that you would probably want a ring buffer of rpc stats structures, so that in the nfs_* callback you could extract the list of rpc calls it mapped to, as in the sketch below.
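
A ring-buffer sketch of that idea (size and names are illustrative):

```c
/* Keep the last N per-RPC stats so the nfs_* callback can walk back
 * through the RPCs its request expanded into. */
#define STATS_RING_SIZE 64

static struct rpc_pdu_stats stats_ring[STATS_RING_SIZE];
static unsigned int stats_head;   /* total entries written so far */

static void stats_cb(struct rpc_context *rpc,
                     struct rpc_pdu_stats *data, void *private_data)
{
        stats_ring[stats_head++ % STATS_RING_SIZE] = *data;
}
```

In the nfs_* callback, the entries written since the request was issued (record stats_head at issue time) are the RPCs that request mapped to.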

linuxsmiths commented 3 weeks ago

my_stats cannot be a global or a per-rpc_context structure. It has to be per-request, since even the issue-path stats callback populates it, and that callback can be invoked by multiple application threads. Overall, I feel the callback approach for stats is not natural and intuitive.

sahlberg commented 3 weeks ago

You could use one my_stats for each rpc context. Every rpc_context has a dedicated service thread that does all socket I/O and invokes rpc callbacks back into the application.

It doesn't matter how many application threads call rpc_*_task(); they only marshal the data and then enqueue it. Sending the data, receiving the reply, populating rpc->pdu, and invoking the callback into the application are all done from the context of the service thread, in particular rpc_process_reply(). You can see that it always calls rpc->stats_cb with the content of the current PDU and then immediately invokes pdu->cb to call back into the application. This is all single-threaded code, so it will work exactly the same as an rpc_get_stats(rpc->pdu).

Regardless, I have added a function rpc_get_pdu_stats() that can extract the exact same data from the callback function. Please see current master.

linuxsmiths commented 3 weeks ago

There are two places where rpc->stats_cb() is called: one in the call direction and one in the reply direction. What you explain above applies to the reply direction, and I understand that all of the response processing happens in the context of the single service thread. I'm talking about the rpc->stats_cb() call in the request direction. That will not be called in the context of the service thread but in the context of the application thread that calls rpc_nfs3_*_task(), isn't it?

sahlberg commented 3 weeks ago

Yes, you are correct on that. The CALLs will be invoked when the packet is enqueued, in the application thread context.

The stats for CALLs are probably a lot less useful than those for REPLIES, so most applications, probably including yours, would not be interested in CALLs at all. In the nfs-stats-cb.c example I am interested in both CALL and REPLY because it is more of a logging example: log when things are enqueued and log when replies are received. That use case exists but is probably not the most common one.

If you are not interested in any of the CALLs you can just have a

if (data->direction == CALL) { return; }

in the callback, or don't use the callback at all and just use rpc_get_pdu_stats() from your rpc callback. If you are just interested in getting the data for individual RPC replies then rpc_get_pdu_stats() is probably the most convenient.

See it as there being two ways to get this data: 1) use the stats_cb callback if you want to aggregate all the CALLs/REPLIES in some data structure; 2) use rpc_get_pdu_stats() from your rpc callback if you just want the stats for one particular PDU/REPLY.
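
For example, option 2 could be as simple as this sketch (the signature of rpc_get_pdu_stats() is assumed here; see current master for the real prototype):

```c
#include <inttypes.h>
#include <stdio.h>

static void my_rpc_cb(struct rpc_context *rpc, int status,
                      void *data, void *private_data)
{
        struct rpc_pdu_stats stats;

        /* Fill stats for the PDU whose callback is currently running. */
        rpc_get_pdu_stats(rpc, rpc_get_pdu(rpc), &stats);
        printf("proc %" PRIu32 " took %" PRIu64 " ms\n",
               stats.proc, stats.response_time);
}
```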

sahlberg commented 3 weeks ago

We could change when the stats_cb callback is triggered for CALLs so that it is instead invoked when the PDU has been written to the socket, and thus invoked from the service thread rather than the application thread. That might make sense, since we would then compute pdu_stats->response_time from the time between writing to the socket and receiving the reply, instead of the current calculation, which also includes the queuing time in the response time.

linuxsmiths commented 3 weeks ago

Sorry, I have been caught up in some project deadlines. We are now getting close to code complete, so one of the changes is that we will now work off the fork and only selectively merge fixes/changes from sahlberg/libnfs. Also, due to the tight deadlines, and as the two repos are slowly diverging, I won't have time to create PRs against sahlberg/libnfs. I'll instead create PRs and merge against my repo, linuxsmiths/libnfs. I request that you pick the changes that are of general interest to sahlberg/libnfs. Note that many changes might be specific to our use case; I'll let you know of the ones that may be of general interest. Thanks.

sahlberg commented 3 weeks ago

Understood.

But please, if you can, when you do check-ins that you think might be useful for general purposes, shoot me an email and I will evaluate and merge them into my master branch. Good luck, and I hope it works well for you.

I would LOVE to hear some actual performance numbers, i.e. how hard you can drive it and sustain transfers. At some stage I will take the RDMA code from libiscsi and transplant it into libnfs. In libiscsi RDMA we did eventually manage to get ~170k IOPS per core. Of course we can't get the same throughput in nfs, as it is a lot more expensive than iSCSI and the mapping protocol has pretty high overhead, but it would be interesting to see how hard we can go.

linuxsmiths commented 3 weeks ago

Definitely, I'll let you know about any changes that may be of general interest. Should I create a git issue or email you? I don't think I have your email; GitHub doesn't show it (at least I don't remember seeing it). I've been debugging a couple of elusive bugs related to timed-out pdus. One thing I personally prefer is to have a lot of useful asserts to avoid such hard-to-find logic bugs. My changes will have a lot of asserts; you can pick what you like. I also plan to add a lot of instrumentation, again to track such hard-to-find bugs.

sahlberg commented 3 weeks ago

ronniesahlberg@gmail.com

No need for anything fancy, just a "I just added something to my repo that might be useful, have a look". That is sufficient and would be appreciated.

Also let me know if there is something you want me to add that would be useful to you. I can add it to my master and then you can backport it to your stable/frozen branch.

linuxsmiths commented 2 weeks ago

Btw I've sent you multiple emails with PR links. Not sure if you saw them. Checking here since you didn't respond there.

sahlberg commented 2 weeks ago

I saw them. Thanks. I will look through them later today.
