Closed tsusanka closed 4 years ago
I don't think completely removing the transfer mutex is correct.
To summarize: the old code would allow only one pending call to lowlevel.Interrupt_Transfer
, regardless of direction. AFAICT from the libusb doc, transfer direction is determined by the endpoint being used.
Comments on LibUSBDevice suggest that only one interrupt_transfer should ever be running at the same time, but the existence of debugTransferMutex
suggests that different endpoints are OK for this.
So calling on a read endpoint and a write endpoint simultaneously should be OK.
What worries me is that the new code enforces the "no two interrupt transfers" at a higher level. Hypothetically you could sidestep the locks by calling Core.writeDev
or readDev
directly, or any of the lower level methods.
A better design would be to have a per-endpoint mutex in the libusb code - perhaps modify transferMutexLock
so that it takes the endpoint id as a parameter, instead of just debug: bool? And have four mutexes instead of two, or an array indexed by the endpoint.
This has simplified that code significantly, so I have force-pushed.
As @matejcik suggested I have added two more locks for libusb. I am passing a pointer to the appropriate mutex from the read/write functions, which seemed as the cleanest solution.
I don't think removing the high-level locks is correct either.
IMHO the code as of 0b6d02d is correct in the practical sense: as long as nobody does any further development on trezord, it will work and lock properly. The low-level locks are a "code safety" measure, as I see it. (utACK on correctness of 9ca0991 for that)
A possible problem now is that sending a large message to Trezor is going to take multiple calls to Write
(and thus readWrite
), because each call writes exactly 64 bytes. With /post
not being high-level locked, two writers on the same session might end up interleaving each other's packets.
To summarize:
Interrupt_Transfer
s on the same endpoint should not run concurrently - otherwise presumably libusb crashes or something./post
and /read
does, respectively) should not run concurrently - otherwise we lose packet ordering./call
s should not run concurrently - otherwise the "message writes" and "message reads" could interleave and the second caller would read the first response.In all cases it is OK to leave the caller waiting on a mutex until the previous caller is done.
In addition, there is no usecase for allowing simultaneous calls to /read
(and thus /call
) in the first place. The current behavior is sufficient to implement "retry until success" strategy on the client.
(...except maybe something like "shut down the other caller", which ... is a usecase, but probably not one we are interested in)
(this is different for /post
, which must not wait for a pending "message read", otherwise canceling messages won't work)
Very good points. For me the most important part is:
The low-level locks are a "code safety" measure, as I see it.
So in a summary:
- two Interrupt_Transfers on the same endpoint should not run concurrently - otherwise presumably libusb crashes or something.
Is achieved (or rather improved) by https://github.com/trezor/trezord-go/pull/183/commits/9ca09919502aef29b2a1e509c7a432b9cf8acf5b.
- two "message writes" and "message reads" (what /post and /read does, respectively) should not run concurrently - otherwise we lose packet ordering.
Will be done by re-introducing https://github.com/trezor/trezord-go/commit/8e97c7d4c71c82b2f3767a912e154c0adbb490ad.
Two /calls should not run concurrently - otherwise the "message writes" and "message reads" could interleave and the second caller would read the first response.
This is solved by the fact that no two /calls are allowed at the same time - see: https://github.com/trezor/trezord-go/blob/05991cea5900d18bcc6ece5ae5e319d138fc5551/core/core.go#L557
This should be enough. What do you think?
agreed on all counts
As discussed in a side-channel: We could also add a separate mutex for /call
to strenghten the protection. We probably won't though.
For one, this would change behavior. Instead of rejecting the /call
, bridge would wait until the previous caller is done. This is potentially a long time, because it is waiting for a response from Trezor.
(that is also a point for rejecting simultaneous calls to /call
and /read
: makes it harder to write incorrect clients that don't keep track of their requests)
For two, this isn't a feature we would be interested in. Sequential /call
s would allow for message batching, but (1) we don't need that for performance, because call latency isn't the bottleneck; and furthermore the messages would get stuck in the USB buffers anyway; (2) it doesn't match Trezor communication model too well, most operations are interactive; (3) if a client really wants batching, nothing is stopping them from implementing it with /post
and /read
.
Done as discussed. Please check the comment in ba94c79 especially I am semi-happy about it.
I will merge this after Szymon tests this (I've tested it myself, but just to make sure).
Merged.
In this PR I would like to document the introduced changes. In #130 I would like to then provide additional motivation and documentation for our future selves. I will do that after we agree on this PR.
This PR has three commits:
Force()
method in theUSBDevice
interface in order to omit the TransferMutexLock in libusb, see https://github.com/trezor/trezord-go/commit/7a1d9858e439a841b10c66550e2e4c7571fd5a52#diff-da4fb7299e95c07f8f33934faef66777R535.I will of course squash before merge.
Also this does not change 95ff303445fa3ddb26e9e06d45cefbe6e17423f9, which is already in
master
. That commit omits the check for "other call on same session" for /post. The same thing might be introduced for /read, but since we do not need it at the moment I did not implement it.In future, we should consider doing a bit of refactoring and make /call basically an alias for /post and then /read.
I am glad for any comments from anyone. @matejcik could maybe have a quick look?