paypal / gatt

Gatt is a Go package for building Bluetooth Low Energy peripherals
BSD 3-Clause "New" or "Revised" License
1.13k stars 284 forks source link

Increasing buffer size on responses channel #33

Closed demon-xxi closed 9 years ago

demon-xxi commented 9 years ago

Because of "Dequeue request loop" returning message into the channel (req.rspc <- r) this may deadlock if another response has been already pushed by other routine to the channel.
It does locks consistently with devices that send a lot of notifications. Adding extra buffer in channel reserves space for that extra command to be returned.

roylee17 commented 9 years ago

Per spec, within a connection the requests and responses are strictly serialized. A request/command is only sent if we got the response from the previous request or it was a command, which doesn't require a response. So the unbuffered channel and the loop here is for serialization purpose. Otherwise the loop can be eliminated, and each call to the sendReq() could wait the response within. (And this would leave the serialization to upper layer though.)

If the upper layer queues a lot of request, for example, we would expect the stack to send them with the following sequence: req1, rsp1, req2, rsp2, req3, rsp3, req4, rsp4. Increase the buffer might resulting something like: req1, req2, rsp1, req3, rsp2, rsp3, req4, rsp4.

Do you have a sample sequence that could cause deadlock? Screenshot of sniffer capture, or adding some log message when the deadlock happens.

Current code seems only deadlock if we missed an expected response. If that's the case, we should gracefully terminate the connection with time out. (An important TODO...)

demon-xxi commented 9 years ago

I share your concern regarding sequence of req/resp. I do a simple scenario of discovering sensortag and writing two characteristics to enable sensor and then subscribe to notifications. Once notifications start coming locks in both loops:

The one on peripherial_linux.go on line 374 and hci.go on line 382

but the root cause for me happens on line 351

it locks there as most likely req.rspc already has data in channel which is not being pulled by anything. Consequently loop stops and other loops that rely on channels being pulled lock as well.

I have saved log and dump using hcidump tool. Hope this will be useful.

roylee17 commented 9 years ago

The "log" seems only have the device initialization part, is it truncated?

Are they "notifications" or "indications"? In case of indications, the early return at https://github.com/paypal/gatt/blob/master/peripheral_linux.go#L373 will be the bug as it prevent the acknowledge from being sent (at the bottom of the loop).

p.s. I might not be able to get my environment setup to debug until next week. So I need to trouble you to peak and poke around :-)

demon-xxi commented 9 years ago

Roy, thank you for the comment it helped to investigate issue further. The root cause that we found it unrelated to my proposed change so I'm closing this issue. But please look at PR #34 form my colleague with the right fix.