NBD server gets stuck after writing to the exported device using NBD_CMD_WRITE

gaborigloi commented 7 years ago

Unlike the handler of the Read command in loop in lib/server.ml, the code that handles the Write command does not call loop recursively to continue processing requests. As a result, the server gets stuck after the write command, and the client will hang if it sends any further commands. Both version v2.1.3 that we currently use in xs-opam, and the most recent v3.0.0 version are affected.

This issue can be fixed by replacing this line:

            else ok t handle None in

with this:

            else ok t handle None >>= fun () -> loop () in

(Maybe we should also call the loop recursively in the invalid request case, it depends on what the NBD protocol dictates.)

mseri commented 7 years ago

Can you make a PR for 2.1.3 and 3?

gaborigloi commented 7 years ago

@mseri Sure. I think we'll need a branch for the 2.X versions of nbd, something like bugfix-v2.x, because currently it doesn't have a branch.

I'm unsure whether we should terminate the session or continue in case of an invalid command request type. The protocol says in the "Terminating the transmission phase" section:

Either side MAY initiate a hard disconnect if it detects a violation by the other party of a mandatory condition within this document.

The protocol also says in the "Error values" section, that

The server SHOULD return EINVAL if it receives an unknown command.

The server SHOULD return EINVAL if it receives an unknown command flag. It also SHOULD return EINVAL if it receives a request with a flag not explicitly documented as applicable to the given request.

However, where it documents the request types, it does not say that "the client MUST NOT send any other requet types", but maybe it's assumed implicitly.

So I'm just going to keep the current behaviour and terminate the session in case of an invalid request type, that's probably safer. But then we should initiate a hard disconnect, instead of leaving the socket open. The other option is to just continue processing requests after an invalid request.

Another thing that is incorrect about the current implementation (v2.1.3 at least) is that it will send an EINVAL error in response to the NBD_CMD_DISC command (which is always valid command), because it does not handle this request in the match. But I think as @thomassa noted, the NBD server does not even close any connections at the moment.

thomassa commented 7 years ago

That's right: the existing code never calls channel.close. In my work-in-progress branch https://github.com/thomassa/nbd/commits/tlschannel I have a commit that fixes this for failure cases, calling channel.close which implements a hard disconnect: https://github.com/thomassa/nbd/commit/7f128d467a97f07b4630b8e3d75d3156cdb99fdf

thomassa commented 7 years ago

I think if the server gets a request it doesn't recognise, then it's reasonable to keep the connection open after sending EINVAL, since an unrecognised request is not a "violation of a mandatory condition". If the protocol is extended in future, then doing a hard disconnect in response to an unrecognised request would cause compatibility difficulties for future clients that implement the extension.

gaborigloi commented 7 years ago

@thomassa But I'm afraid that we would then incorrectly parse the unrecognized request, for example miss the data if it has any, and then get out of sync with the client and try to parse the data of that unrecognized request as the next NBD command, and fail, which is really bad. This is why I think that it's safer to just reset the connection. (And we could consider only sending the supported request types as a mandatory condition.)

xapi-project / nbd

NBD server gets stuck after writing to the exported device using NBD_CMD_WRITE #41