trezor / trezord-go

:smiling_imp: Trezor Communication Daemon (written in Go)
GNU Lesser General Public License v3.0
244 stars 146 forks source link

Q: `Release` interaction with `Call` #234

Open RealKeyboardWarrior opened 2 years ago

RealKeyboardWarrior commented 2 years ago

I do not see the Release function acquiring locks for these mutexes so it seems like we'd be able to release a session whilst it's busy reading and writing? Perhaps I'm missing some libusb magic locking mechanisms here. https://github.com/trezor/trezord-go/blob/106e5e9af3573ac2b27ebf2082bbee91650949bf/core/core.go#L354-L372

There are locks on the session maps, but those only apply to adding and removing values within the map.

prusnak commented 2 years ago

Maybe @matejcik and @szymonlesisz can comment on this? Is this desirable to be able to call Release while a Call is still running?

RealKeyboardWarrior commented 2 years ago

Hi,

The readMutex is being acquired here through finishReadQueue: https://github.com/trezor/trezord-go/blob/106e5e9af3573ac2b27ebf2082bbee91650949bf/usb/libusb.go#L435-L456

But it seems to try interrupting the sending without acquiring a mutex.

matejcik commented 2 years ago

I wouldn't rely on any libusb magic, it's generally not cross-OS reliable.

Releasing while a call is in progress seems like asking for trouble, but an usecase I can envision goes like this: App A is waiting for user interaction on Trezor, or got otherwise stuck, and it's not visible to the user, so it's not apparent what they should do. App B is sitting at the "Another app is using Trezor screen, steal focus?" and wants to be able to shut out App A.

If we disallow releasing while a call is in progress, the user can only resolve this by reconnecting the Trezor.

Also how would this interact with using Post/Read instead of Call ?

RealKeyboardWarrior commented 2 years ago

I've been trying to simulate two applications attempting to race the bridge at the same time, one is racing to GetXPUB for 44'/0'/0' and another is for GetXPUB 44'/0'/1', and it attempts to detect a case where an unexpected XPub message is returned, indicating that there might be stale data being sent back to us.

It seems that the bridge doesn't handle multiple applications well when they are aggressively racing to acquire a handle to the device. If two applications are aggressively calling the bridge they will return either session not found or wrong previous session. Neither application will be able to use the device if one of them is retrying instantly.

The bridge version that I'm using is 2.0.27 downloaded from https://suite.trezor.io/web/bridge/, on Debian 10.

Additionally to the behavior above, I've managed to produce two strange messages that might indicate funky stuff going on.

Use before acquiring or after releasing

It seems like I've managed to produce a scenario where the bridge attempts to use the device before having claimed the device, or perhaps after it had released its claim.

Dec 17 11:45:09 dev kernel: usb 1-3: usbfs: process 8090 (trezord) did not claim interface 0 before use

Bigger log:

Dec 17 11:45:07 dev trezord[8090]: POST /call/468
Dec 17 11:45:07 dev trezord[8090]: POST /acquire/1/468
Dec 17 11:45:07 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /call/468 HTTP/1.1" 400 46
Dec 17 11:45:07 dev trezord[8090]: POST /acquire/1/null
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /acquire/1/468 HTTP/1.1" 200 18
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /acquire/1/null HTTP/1.1" 400 35
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:08 dev trezord[8090]: POST /call/469
Dec 17 11:45:08 dev trezord[8090]: POST /listen
Dec 17 11:45:08 dev trezord[8090]: POST /acquire/1/null
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /acquire/1/null HTTP/1.1" 400 35
Dec 17 11:45:08 dev trezord[8090]: POST /listen
Dec 17 11:45:08 dev trezord[8090]: POST /acquire/1/469
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /acquire/1/469 HTTP/1.1" 200 18
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:09 dev kernel: usb 1-3: usbfs: process 8090 (trezord) did not claim interface 0 before use
Dec 17 11:45:09 dev trezord[8090]: POST /listen
Dec 17 11:45:09 dev trezord[8090]: POST /call/470
Dec 17 11:45:09 dev trezord[8090]: POST /listen
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /call/469 HTTP/1.1" 400 26
Dec 17 11:45:09 dev trezord[8090]: POST /acquire/1/470
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /call/470 HTTP/1.1" 400 46
Dec 17 11:45:09 dev trezord[8090]: POST /acquire/1/470
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /acquire/1/470 HTTP/1.1" 400 35
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /acquire/1/470 HTTP/1.1" 200 18
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:10 dev trezord[8090]: POST /call/471
Dec 17 11:45:10 dev trezord[8090]: POST /acquire/1/470
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /acquire/1/470 HTTP/1.1" 400 35
Dec 17 11:45:10 dev trezord[8090]: POST /listen
Dec 17 11:45:10 dev trezord[8090]: POST /listen
Dec 17 11:45:10 dev trezord[8090]: POST /acquire/1/471
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /call/471 HTTP/1.1" 400 46
Dec 17 11:45:10 dev trezord[8090]: POST /acquire/1/471
Dec 17 11:45:11 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /acquire/1/471 HTTP/1.1" 200 18
Dec 17 11:45:11 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /acquire/1/471 HTTP/1.1" 400 35

Strange disconnect war

Application 1 racing to get 44'/0'/0', the first few attempts work fine when the other application isn't racing. This might have happened at the same time as the previous issue above but I can't know because I forgot to get a detailed dump.

BIP39 passphrase: 
44'/0'/0': attempt 0
44'/0'/0': attempt 1
44'/0'/0': attempt 2
44'/0'/0': attempt 3
44'/0'/0': attempt 4
44'/0'/0': attempt 5
Error: session not found
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
44'/0'/0': attempt 31
44'/0'/0': attempt 32
44'/0'/0': attempt 33

Application 2 racing to get 44'/0'/1'

connection detected, setting descriptor:
{
  path: '1',
  session: '15',
  debugSession: null,
  product: 21441,
  vendor: 4617,
  debug: false
}
undefined
Error: session not found
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: session not found
Error: wrong previous session
Error: session not found
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session

Lots and lots of USB resetting going on when fighting over sessions

Lots and lots of resetting of the USB device..

Dec 17 11:00:00 dev kernel: usbcore: registered new interface driver usbfs
Dec 17 11:00:00 dev kernel: usbcore: registered new interface driver hub
Dec 17 11:00:00 dev kernel: usbcore: registered new device driver usb
Dec 17 11:00:00 dev kernel: usb usb1: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 4.19
Dec 17 11:00:00 dev kernel: usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Dec 17 11:00:00 dev kernel: usb usb1: Product: OHCI PCI host controller
Dec 17 11:00:00 dev kernel: usb usb1: Manufacturer: Linux 4.19.0-17-amd64 ohci_hcd
Dec 17 11:00:00 dev kernel: usb usb1: SerialNumber: 0000:00:06.0
Dec 17 11:00:00 dev kernel: usb 1-1: new full-speed USB device number 2 using ohci-pci
Dec 17 11:00:00 dev kernel: usb 1-1: New USB device found, idVendor=80ee, idProduct=0021, bcdDevice= 1.00
Dec 17 11:00:00 dev kernel: usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=0
Dec 17 11:00:00 dev kernel: usb 1-1: Product: USB Tablet
Dec 17 11:00:00 dev kernel: usb 1-1: Manufacturer: VirtualBox
Dec 17 11:00:00 dev kernel: usbcore: registered new interface driver usbhid
Dec 17 11:00:00 dev kernel: usbhid: USB HID core driver
Dec 17 11:00:00 dev kernel: input: VirtualBox USB Tablet as /devices/pci0000:00/0000:00:06.0/usb1/1-1/1-1:1.0/0003:80EE:0021.0001/input/input6
Dec 17 11:00:00 dev kernel: hid-generic 0003:80EE:0021.0001: input,hidraw0: USB HID v1.10 Mouse [VirtualBox USB Tablet] on usb-0000:00:06.0-1/input0
Dec 17 11:00:00 dev kernel: usb 1-2: new full-speed USB device number 3 using ohci-pci
Dec 17 11:00:00 dev kernel: usb 1-2: New USB device found, idVendor=27c6, idProduct=533c, bcdDevice= 1.00
Dec 17 11:00:00 dev kernel: usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Dec 17 11:00:00 dev kernel: usb 1-2: Product: FingerPrint
Dec 17 11:00:00 dev kernel: usb 1-2: Manufacturer: Goodix
Dec 17 11:00:00 dev mtp-probe[419]: checking bus 1, device 2: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-1"
Dec 17 11:00:00 dev mtp-probe[412]: checking bus 1, device 3: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-2"
Dec 17 11:03:39 dev kernel: usb 1-3: new full-speed USB device number 4 using ohci-pci
Dec 17 11:03:40 dev kernel: usb 1-3: New USB device found, idVendor=1209, idProduct=53c1, bcdDevice= 2.00
Dec 17 11:03:40 dev kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Dec 17 11:03:40 dev kernel: usb 1-3: Product: TREZOR
Dec 17 11:03:40 dev kernel: usb 1-3: Manufacturer: SatoshiLabs
Dec 17 11:03:40 dev kernel: usb 1-3: SerialNumber: 8F552B0716DC9AA5F89A8CA4
Dec 17 11:03:40 dev kernel: hid-generic 0003:1209:53C1.0002: hiddev0,hidraw1: USB HID v1.11 Device [SatoshiLabs TREZOR] on usb-0000:00:06.0-3/input1
Dec 17 11:03:40 dev mtp-probe[4069]: checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-3"
Dec 17 11:03:40 dev mtp-probe[4080]: checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-3"
Dec 17 11:03:40 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:47 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:49 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:50 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:52 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
...
Dec 17 11:45:05 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:06 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:07 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:08 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:09 dev kernel: usb 1-3: usbfs: process 8090 (trezord) did not claim interface 0 before use
matejcik commented 2 years ago

Resetting is expected, bridge needs to reset the device at acquire time. I don't exactly remember why :) but some things go bad when we omit the reset.

It seems that the bridge doesn't handle multiple applications well when they are aggressively racing to acquire a handle to the device. If two applications are aggressively calling the bridge they will return either session not found or wrong previous session. Neither application will be able to use the device if one of them is retrying instantly.

This seems like expected behavior? Bridge is not exactly trying to multiplex access to device -- that doesn't work well with the device itself, we'd have to put Initialize management into the Bridge itself and generally be much more strict about things. The racey apps are "misbehaving" -- when you encounter "wrong previous session" or "session not found", you're supposed to throw a modal like "the device is being used elsewhere. click to use in this app.", and /listen for the other app releasing.