Closed liyinshubyte closed 3 months ago
Yes, that's another thing we need to work on. Ideally we would just point the write data at the original buffer.
And at some point, we need to support a vector of buffers (the FSAL I/O code handles vectors).
@ffilz Thanks, I will work on it.
We're going to prioritize working on this for V6. Have you done anything with it?
@ffilz Sorry, I still have no time available for this. You can continue to work on it.
There are also extraneous data copies in the READ path.
For the READ path, we could re-structure the READ response to allow incorporating buffers passed from the back end filesystem. Doing so would require care that the buffers not be modified (which means a data copy WOULD become necessary if we are doing krb5p since we can't then encrypt in place). But the solution should make it clear if the buffer passed to RPC can be modified or not, so krb5p only copies if the buffer is read-only. Ultimately, the XDR encoded response would become an iov that includes the back end filesystem buffer for the READ data.
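To make the "copy only if the buffer is read-only" rule concrete, here is a minimal sketch. `struct resp_buf` and `resp_buf_make_writable` are hypothetical names invented for illustration; they are not Ganesha or ntirpc APIs, and the real response structures may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor for one buffer in an encoded response. */
struct resp_buf {
    void *data;
    size_t len;
    bool read_only; /* true: borrowed from the back end, must not be modified */
    bool owned;     /* true: data was allocated here and must be freed */
};

/*
 * Produce a buffer that may be encrypted in place (the krb5p case).
 * A writable buffer is passed through untouched; only a read-only
 * buffer is copied. Returns 0 on success, -1 on allocation failure.
 */
static int resp_buf_make_writable(const struct resp_buf *src,
                                  struct resp_buf *out)
{
    assert(src != NULL && out != NULL);

    if (!src->read_only) {
        *out = *src;        /* no copy needed */
        out->owned = false; /* caller of src still owns the memory */
        return 0;
    }

    void *copy = malloc(src->len ? src->len : 1);
    if (copy == NULL)
        return -1;
    if (src->len > 0)
        memcpy(copy, src->data, src->len);

    out->data = copy;
    out->len = src->len;
    out->read_only = false;
    out->owned = true;
    return 0;
}
```

With a flag like this, the non-krb5p paths (and krb5i, which only checksums) would never call the helper, so the back end buffer goes into the response iov as-is.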
On the WRITE side, we will get the request into an iov. Decoding should fill in the request structure, but instead of copying the WRITE data it should create an iov that maps the WRITE data chunks within the original iov that RPC received the request into. These buffers should then be passed all the way to the back end filesystem as an iov, preferably with no copy done on the way there.
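A rough sketch of what mapping the WRITE data out of the receive iov could look like. `iov_map_range` is a hypothetical helper written for this comment, not an existing Ganesha or ntirpc function; it assumes the decoder already knows the byte offset and length of the WRITE payload within the received request.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: fill dst with iovec entries that alias the byte
 * range [offset, offset + len) of the src iovec array. No payload bytes
 * are copied; dst entries point into the src buffers.
 *
 * Returns the number of dst entries used, or -1 if the range does not
 * fit inside src or dst has too few entries.
 */
static int iov_map_range(const struct iovec *src, int src_cnt,
                         size_t offset, size_t len,
                         struct iovec *dst, int dst_max)
{
    int n = 0;

    assert(src != NULL && dst != NULL);

    for (int i = 0; i < src_cnt && len > 0; i++) {
        if (offset >= src[i].iov_len) {
            /* Range starts after this chunk: skip it entirely. */
            offset -= src[i].iov_len;
            continue;
        }

        size_t avail = src[i].iov_len - offset;
        size_t take = avail < len ? avail : len;

        if (n == dst_max)
            return -1;

        dst[n].iov_base = (char *)src[i].iov_base + offset;
        dst[n].iov_len = take;
        n++;

        len -= take;
        offset = 0; /* later chunks are consumed from their start */
    }

    return len == 0 ? n : -1;
}
```

The resulting array could then be handed to the FSAL as-is, provided the receive buffers stay alive until the back end write completes.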
The one challenge is that if the physical I/O on either end requires using hardware buffers, we might have to do data copies, and it may be tricky and undesirable to allow Ganesha to "own" the hardware buffers during processing. But apart from copies to/from hardware buffers and the copy needed for encryption, we should be able to eliminate any other data copies (fortunately we are already structured such that integrity with krb5i does not require a copy: we checksum in place and put the checksum in a separate buffer in the iov).
Closing as done with 6.0 release.
When I use Ganesha to write data and the backend writes slowly, Ganesha exhausts its threads and memory usage becomes high, mainly from the write data. For example, if the max thread number is 10000 and each write request carries 1MB of data, the write data should take about 10GB when all threads are busy. But I find the memory used by write data is more than 20GB.
The reason is that there are two in-memory copies of the write data. The first comes from: svc_rqst_xprt_task_recv->svc_vc_recv->xdr_ioq_uv_create->gsh_malloc, the second comes from: svc_rqst_xprt_task_recv->svc_request->nfs_rpc_process_request->xdr_COMPOUND4args->xdr_array_decode->xdr_nfs_argop4->xdr_WRITE_SAME4args->xdr_bytes_decode->gsh_malloc. Both copies are freed when the compound finishes, but I think the first copy could be freed as soon as xdr_bytes_decode finishes, which would save about 10GB of memory when write requests are slow.
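The numbers above can be checked with a small calculation. This is only a worst-case estimate, assuming every thread holds exactly one in-flight WRITE and taking 1MB as 2^20 bytes; `write_payload_bytes` is a name made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case bytes held for WRITE payloads when every thread is busy:
 * one in-flight request per thread, `copies` in-memory copies of each
 * payload (1 = receive buffer only, 2 = receive buffer + decoded copy).
 */
static uint64_t write_payload_bytes(uint64_t threads, uint64_t write_size,
                                    unsigned int copies)
{
    assert(copies > 0);
    return threads * write_size * copies;
}
```

With 10000 threads and 1MB writes, one copy is 10,485,760,000 bytes (about 10GB) and two copies are 20,971,520,000 bytes (about 20GB), which matches the observed usage.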