Closed djdv closed 1 year ago
I tried this out: https://github.com/hugelgupf/p9/compare/main...djdv:p9:fix/handler-compliance https://github.com/hugelgupf/p9/compare/main...8434a96cc9389b334bd998957123aa2314126ab8 and it doesn't seem to cause any issues with the library Server
and Client
, as well as when mounted via v9fs.
But my testing was a bit hasty.
I can submit a PR early if this is all seems correct, but for now I'm going to continue building on it to see if something bad happens during more testing. Then submit one anyway if I get no problems.
Feedback welcomed in the meantime. This was just quick hacks.
I'm particularly concerned if https://github.com/hugelgupf/p9/commit/8434a96cc9389b334bd998957123aa2314126ab8 is correct. Since previously this was forbidden via checkSafeName
but that may have been accidental/incidental?
Edit: 8c6721f89987606df712c910f3945ffc8589b795 seems fine, however 8434a96cc9389b334bd998957123aa2314126ab8 seems incorrect. walk(5) explicitly says
The name
.
(dot), meaning the current directory, is not used in the protocol.
I got tripped up by the preceding
is equivalent to walking to dot
assuming a walk message containing dot would be valid, but it seems like such a message should never be generated.
So I tracked down why the change was introduced. (It's unfortunately pre-gVisor open sourcing, so I can't share the full commit.)
The Linux client expects that when len(name) == 0, the returned len(qid) == 0 as well: https://elixir.bootlin.com/linux/v6.4.10/source/net/9p/client.c#L1184
Thanks for looking into it.
The Linux client and the p9.Client
do conform to the spec, but the inconsistency is that p9.File
implementations must violate that part of the spec by returning 1 QID from Walk(nil)
/wnames == 0
, otherwise the handler will reject it.
Which means calling the same method on the same p9.File
will have different return values depending on if you call it directly, or call it when wrapped in the p9.Client
.
I hacked something together real quick to demonstrate this:
(only run
and Walk
are the focus here)
https://gist.github.com/djdv/682b9d9fd8a58c43d9320dc220a90705
The output without the patch:
spec compliance `false`:
direct call return: [QID{Type: 128, Version: 0, Path: 1}]
client response: []
spec compliance `true`:
direct call return: []
client error: invalid argument
The output with the patch:
spec compliance `false`:
direct call return: [QID{Type: 128, Version: 0, Path: 1}]
client response: []
spec compliance `true`:
direct call return: []
client response: []
As-is, you can't implement a p9.File
that is spec compliant when its method is called directly, only when it's wrapped as a Client
. Or the opposite (whichever compliance you choose is mutually exclusive with the other).
The patch amends this in a compatible way by accepting 0
QIDs when wnames is 0
, in addition to the current behaviour of accepting 1
QID (then dropping it so the reply actually has 0
).
I noticed this because I have helper functions that accept a p9.File
, and sometimes server-side I have a direct instance of a p9.File
which means the helper is calling its methods directly.
Other times I might call the same helper with a remote Client
file.
Since the API/behaviour is slightly different (despite the interface being the same), it can cause issues (and it confused me).
Alright, I get it. Let's take the patch. Makes sense.
This'd actually make implementing regular files (or other non-directories) easier, since they can have a default-Walk implementation, too.
There's a problem with how twalk is being handled that forces
File
implementations to have non-compliantWalk
methods.walk(5) says:
The
Server
handler fortwalk
internally requires thatlen(qids)
returned fromWalk(nil)
is 1, despite the name count being 0. It then discards thoseqids
, returningnil
itself.https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1129 calls: https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1147 calls: https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1041 checks length and only asserts count of 1 as valid (not 0): https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1014-L1017 despite requiring
len(qids) == 1
qids are dropped: https://github.com/hugelgupf/p9/blob/ede3efe359430a2dbd9367d2ae768e990fad72b1/p9/handlers.go#L1072-L1075This answers the "TODO: why?", but this requirement causes an inconsistency in returned values from the same
File
implementation. When called directly, aFile
will return[1]qid{...}
, but when wrapped byServer
the same arguments will returnnil
. Implying that your implementation can either be compliant withServer
's expectations, or compliant when called directly, but not both. I.e.qids
fromClient.Walk(nil)
andFile.Walk(nil)
will not be consistent despite being the same underlying implementation.