hugelgupf / p9

Idiomatic Go 9P2000.L client and server, extracted from gVisor for general use
Apache License 2.0
91 stars 19 forks source link

Walk Server/Client interface inconsistency with 9P spec #56

Closed djdv closed 1 year ago

djdv commented 2 years ago

There's a problem with how twalk is being handled that forces File implementations to have non-compliant Walk methods.

walk(5) says:

It is legal for nwname to be zero, in which case newfid will represent the same file as fid and the walk will usually succeed; this is equivalent to walking to dot. The value of nwqid can-not be zero unless nwname is zero. Also, nwqid will always be less than or equal to nwname.

The Server handler for twalk internally requires that len(qids) returned from Walk(nil) is 1, despite the name count being 0. It then discards those qids, returning nil itself.

https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1129 calls: https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1147 calls: https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1041 checks length and only asserts count of 1 as valid (not 0): https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1014-L1017 despite requiring len(qids) == 1 qids are dropped: https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1072-L1075

This answers the "TODO: why?", but this requirement causes an inconsistency in returned values from the same File implementation. When called directly, a File will return [1]qid{...}, but when wrapped by Server the same arguments will return nil. Implying that your implementation can either be compliant with Server's expectations, or compliant when called directly, but not both. I.e. qids from Client.Walk(nil) and File.Walk(nil) will not be consistent despite being the same underlying implementation.

djdv commented 2 years ago

I tried this out: https://github.com/hugelgupf/p9/compare/main...djdv:p9:fix/handler-compliance https://github.com/hugelgupf/p9/compare/main...8434a96cc9389b334bd998957123aa2314126ab8 and it doesn't seem to cause any issues with the library Server and Client, as well as when mounted via v9fs. But my testing was a bit hasty.

I can submit a PR early if this is all seems correct, but for now I'm going to continue building on it to see if something bad happens during more testing. Then submit one anyway if I get no problems.

Feedback welcomed in the meantime. This was just quick hacks. I'm particularly concerned if https://github.com/hugelgupf/p9/commit/8434a96cc9389b334bd998957123aa2314126ab8 is correct. Since previously this was forbidden via checkSafeName but that may have been accidental/incidental?

Edit: 8c6721f89987606df712c910f3945ffc8589b795 seems fine, however 8434a96cc9389b334bd998957123aa2314126ab8 seems incorrect. walk(5) explicitly says

The name . (dot), meaning the current directory, is not used in the protocol.

I got tripped up by the preceding

is equivalent to walking to dot

assuming a walk message containing dot would be valid, but it seems like such a message should never be generated.

hugelgupf commented 1 year ago

So I tracked down why the change was introduced. (It's unfortunately pre-gVisor open sourcing, so I can't share the full commit.)

The Linux client expects that when len(name) == 0, the returned len(qid) == 0 as well: https://elixir.bootlin.com/linux/v6.4.10/source/net/9p/client.c#L1184

djdv commented 1 year ago

Thanks for looking into it.

The Linux client and the p9.Client do conform to the spec, but the inconsistency is that p9.File implementations must violate that part of the spec by returning 1 QID from Walk(nil)/wnames == 0, otherwise the handler will reject it.

Which means calling the same method on the same p9.File will have different return values depending on if you call it directly, or call it when wrapped in the p9.Client.

I hacked something together real quick to demonstrate this: (only run and Walk are the focus here) https://gist.github.com/djdv/682b9d9fd8a58c43d9320dc220a90705

The output without the patch:

spec compliance `false`:
direct call return:  [QID{Type: 128, Version: 0, Path: 1}]
client response:  []

spec compliance `true`:
direct call return:  []
client error: invalid argument

The output with the patch:

spec compliance `false`:
direct call return:  [QID{Type: 128, Version: 0, Path: 1}]
client response:  []

spec compliance `true`:
direct call return:  []
client response:  []

As-is, you can't implement a p9.File that is spec compliant when its method is called directly, only when it's wrapped as a Client. Or the opposite (whichever compliance you choose is mutually exclusive with the other).

The patch amends this in a compatible way by accepting 0 QIDs when wnames is 0, in addition to the current behaviour of accepting 1 QID (then dropping it so the reply actually has 0).

I noticed this because I have helper functions that accept a p9.File, and sometimes server-side I have a direct instance of a p9.File which means the helper is calling its methods directly. Other times I might call the same helper with a remote Client file. Since the API/behaviour is slightly different (despite the interface being the same), it can cause issues (and it confused me).

hugelgupf commented 1 year ago

Alright, I get it. Let's take the patch. Makes sense.

This'd actually make implementing regular files (or other non-directories) easier, since they can have a default-Walk implementation, too.