trezor / trezord-go

:smiling_imp: Trezor Communication Daemon (written in Go)
GNU Lesser General Public License v3.0
244 stars 146 forks source link

Allow /post when /read is in progress #183

Closed tsusanka closed 4 years ago

tsusanka commented 4 years ago

In this PR I would like to document the introduced changes. In #130 I would like to then provide additional motivation and documentation for our future selves. I will do that after we agree on this PR.


This PR has three commits:

I will of course squash before merge.


Also this does not change 95ff303445fa3ddb26e9e06d45cefbe6e17423f9, which is already in master. That commit omits the check for "other call on same session" for /post. The same thing might be introduced for /read, but since we do not need it at the moment I did not implement it.

In future, we should consider doing a bit of refactoring and make /call basically an alias for /post and then /read.


I am glad for any comments from anyone. @matejcik could maybe have a quick look?

matejcik commented 4 years ago

I don't think completely removing the transfer mutex is correct.

To summarize: the old code would allow only one pending call to lowlevel.Interrupt_Transfer, regardless of direction. AFAICT from the libusb doc, transfer direction is determined by the endpoint being used.

Comments on LibUSBDevice suggest that only one interrupt_transfer should ever be running at the same time, but the existence of debugTransferMutex suggests that different endpoints are OK for this. So calling on a read endpoint and a write endpoint simultaneously should be OK.

What worries me is that the new code enforces the "no two interrupt transfers" at a higher level. Hypothetically you could sidestep the locks by calling Core.writeDev or readDev directly, or any of the lower level methods.

A better design would be to have a per-endpoint mutex in the libusb code - perhaps modify transferMutexLock so that it takes the endpoint id as a parameter, instead of just debug: bool? And have four mutexes instead of two, or an array indexed by the endpoint.

tsusanka commented 4 years ago

This has simplified that code significantly, so I have force-pushed.

As @matejcik suggested I have added two more locks for libusb. I am passing a pointer to the appropriate mutex from the read/write functions, which seemed as the cleanest solution.

matejcik commented 4 years ago

I don't think removing the high-level locks is correct either.

IMHO the code as of 0b6d02d is correct in the practical sense: as long as nobody does any further development on trezord, it will work and lock properly. The low-level locks are a "code safety" measure, as I see it. (utACK on correctness of 9ca0991 for that)

A possible problem now is that sending a large message to Trezor is going to take multiple calls to Write (and thus readWrite), because each call writes exactly 64 bytes. With /post not being high-level locked, two writers on the same session might end up interleaving each other's packets.

To summarize:

In all cases it is OK to leave the caller waiting on a mutex until the previous caller is done. In addition, there is no usecase for allowing simultaneous calls to /read (and thus /call) in the first place. The current behavior is sufficient to implement "retry until success" strategy on the client. (...except maybe something like "shut down the other caller", which ... is a usecase, but probably not one we are interested in)

(this is different for /post, which must not wait for a pending "message read", otherwise canceling messages won't work)

tsusanka commented 4 years ago

Very good points. For me the most important part is:

The low-level locks are a "code safety" measure, as I see it.


So in a summary:

  • two Interrupt_Transfers on the same endpoint should not run concurrently - otherwise presumably libusb crashes or something.

Is achieved (or rather improved) by https://github.com/trezor/trezord-go/pull/183/commits/9ca09919502aef29b2a1e509c7a432b9cf8acf5b.

  • two "message writes" and "message reads" (what /post and /read does, respectively) should not run concurrently - otherwise we lose packet ordering.

Will be done by re-introducing https://github.com/trezor/trezord-go/commit/8e97c7d4c71c82b2f3767a912e154c0adbb490ad.

Two /calls should not run concurrently - otherwise the "message writes" and "message reads" could interleave and the second caller would read the first response.

This is solved by the fact that no two /calls are allowed at the same time - see: https://github.com/trezor/trezord-go/blob/05991cea5900d18bcc6ece5ae5e319d138fc5551/core/core.go#L557

This should be enough. What do you think?

matejcik commented 4 years ago

agreed on all counts

matejcik commented 4 years ago

As discussed in a side-channel: We could also add a separate mutex for /call to strenghten the protection. We probably won't though.

For one, this would change behavior. Instead of rejecting the /call, bridge would wait until the previous caller is done. This is potentially a long time, because it is waiting for a response from Trezor. (that is also a point for rejecting simultaneous calls to /call and /read: makes it harder to write incorrect clients that don't keep track of their requests)

For two, this isn't a feature we would be interested in. Sequential /calls would allow for message batching, but (1) we don't need that for performance, because call latency isn't the bottleneck; and furthermore the messages would get stuck in the USB buffers anyway; (2) it doesn't match Trezor communication model too well, most operations are interactive; (3) if a client really wants batching, nothing is stopping them from implementing it with /post and /read.

tsusanka commented 4 years ago

Done as discussed. Please check the comment in ba94c79 especially I am semi-happy about it.

tsusanka commented 4 years ago

I will merge this after Szymon tests this (I've tested it myself, but just to make sure).

tsusanka commented 4 years ago

Merged.