ydb-platform / nbs

Network Block Store

[Filestore] non-blocking open/close #1541

Open qkrorlqr opened 1 month ago

qkrorlqr commented 1 month ago

open (CreateHandle) and close (DestroyHandle) do the following expensive things (taking DestroyHandle as an example):

Both operations modify the index stored persistently in blobstorage. These modifications can be made in the background, so that the corresponding syscall is acked to the guest quickly. The open call is more complex, and I won't describe a solution for it right now (though I have some ideas). The close call seems simple to implement: we can ack the call straight away and issue the cleanup (handle deletion and possibly node deletion) in the background. If the cleanup fails, it should be retried from the client. If the client dies, its session dies with it after an inactivity timeout, and the cleanup happens automatically.

qkrorlqr commented 1 month ago

Currently we use the client's virtio queue as our redo log. If we ack a request before its actual completion, we need another redo log. We can create a log file on top of a tmpfs local to filestore-vhost. Processing of open/close requests may then look like this:

  1. Send CreateHandleStage1/DestroyHandleStage1 to the tablet, which performs basic checks against its in-memory state and returns a preliminary result (including the HandleId for CreateHandle).
  2. If the preliminary result is OK, write it to the redo log.
  3. Respond to the client.
  4. Send CreateHandleStage2/DestroyHandleStage2 requests, which wait for the actual completion of the operation, and retry them upon errors.
  5. Upon Stage2 success, delete the corresponding op from the redo log.

If filestore-vhost restarts, it should reread its redo log and continue retrying the pending operations. Handles are local to the client, so no other client can interfere with them and cause non-retriable errors between Stage1 and Stage2.
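
To make the flow above concrete, here is a minimal Python sketch of the DestroyHandle path. It is only an illustration under stated assumptions, not the actual filestore-vhost code: `RedoLog`, `FakeTablet`, `close_handle` and `drain` are hypothetical names, the log format (JSON lines) is invented for the example, and the tablet is replaced by an in-memory stand-in whose Stage2 can fail transiently.

```python
import json
import os
import tempfile


class RedoLog:
    """Hypothetical append-only log of ops acked to the client but not yet completed."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def add(self, op_id, op):
        self._append({"type": "add", "id": op_id, "op": op})

    def remove(self, op_id):
        self._append({"type": "remove", "id": op_id})

    def pending(self):
        """Replay the log and return the ops that still need Stage2."""
        ops = {}
        if not os.path.exists(self.path):
            return ops
        with open(self.path) as f:
            for line in f:
                rec = json.loads(line)
                if rec["type"] == "add":
                    ops[rec["id"]] = rec["op"]
                else:
                    ops.pop(rec["id"], None)
        return ops


class FakeTablet:
    """Stand-in for the tablet (assumption); Stage2 fails `stage2_failures` times."""

    def __init__(self, stage2_failures=0):
        self.handles = {}
        self.stage2_failures = stage2_failures

    def destroy_handle_stage1(self, handle_id):
        # Basic check against in-memory state only.
        return handle_id in self.handles

    def destroy_handle_stage2(self, handle_id):
        if self.stage2_failures > 0:
            self.stage2_failures -= 1
            raise IOError("transient error")
        # Idempotent, so a retry after a restart is harmless.
        self.handles.pop(handle_id, None)


def close_handle(tablet, log, handle_id):
    """Steps 1-3: check, record in the redo log, then ack the client."""
    if not tablet.destroy_handle_stage1(handle_id):
        return False
    log.add(handle_id, {"kind": "DestroyHandle", "handle": handle_id})
    return True  # the caller acks the guest at this point


def drain(tablet, log, max_attempts=10):
    """Steps 4-5: run Stage2 with retries; also what a restart would call."""
    for op_id, op in log.pending().items():
        for _ in range(max_attempts):
            try:
                tablet.destroy_handle_stage2(op["handle"])
            except IOError:
                continue
            log.remove(op_id)
            break
```

Because `drain` only reads the log file, a restarted process can call it with a fresh `RedoLog` pointing at the same path and pick up where the previous one stopped.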

Notes:

qkrorlqr commented 1 week ago

If the client is opening a new file with the O_CREAT flag, we can create it asynchronously as well (after checking that the file can actually be opened) and queue any writes that arrive before the file has been created. If a race lets the initial check pass but then prevents us from creating the file, we can respond to those writes with an I/O error. This is not a universally correct solution, of course, so we can keep this logic under a feature flag.
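
A rough sketch of the write queueing described here, in Python for brevity. The class `AsyncCreateFile` and its methods are hypothetical, the file contents are kept in memory purely for illustration, and the feature flag and the initial possibility check are assumed to be handled by the caller.

```python
import errno


class AsyncCreateFile:
    """Hypothetical per-file state for an O_CREAT open acked before creation finished."""

    def __init__(self):
        self.state = "creating"  # "creating" -> "created" or "failed"
        self.queued = []         # writes that arrived before creation finished
        self.data = bytearray()  # in-memory stand-in for the file contents

    def write(self, offset, data, reply):
        """Handle a write; `reply` is called with None on success or an errno."""
        if self.state == "creating":
            self.queued.append((offset, data, reply))
        elif self.state == "created":
            end = offset + len(data)
            if len(self.data) < end:
                self.data.extend(b"\0" * (end - len(self.data)))
            self.data[offset:end] = data
            reply(None)
        else:
            # Creation lost a race after the initial check passed.
            reply(errno.EIO)

    def on_create_finished(self, ok):
        """Called when the background creation completes or fails."""
        self.state = "created" if ok else "failed"
        queued, self.queued = self.queued, []
        for offset, data, reply in queued:
            self.write(offset, data, reply)
```

Queued writes are replayed in arrival order once the outcome is known, so the guest sees either the same results as with a synchronous create or an I/O error on every write that was queued.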