zsmartsystems / com.zsmartsystems.zigbee

ZigBee Cluster Library Java framework supporting multiple dongles
Eclipse Public License 1.0
143 stars 88 forks source link

Ember: Race conditions assigning sequence numbers in EzspFrameRequest #791

Closed sesuncedu closed 4 years ago

sesuncedu commented 5 years ago

EZSP frame sequence numbers are integers in the range [1,254].

EzspFrameRequest stores the value of the next sequence number to assign in an AtomicLong.

https://github.com/zsmartsystems/com.zsmartsystems.zigbee/blob/d7398dd5edcfb5f78a3e367a82146ea7d73d95a2/com.zsmartsystems.zigbee.dongle.ember/src/main/java/com/zsmartsystems/zigbee/dongle/ember/ezsp/EzspFrameRequest.java#L43

Sequence numbers are assigned in the constructor

https://github.com/zsmartsystems/com.zsmartsystems.zigbee/blob/d7398dd5edcfb5f78a3e367a82146ea7d73d95a2/com.zsmartsystems.zigbee.dongle.ember/src/main/java/com/zsmartsystems/zigbee/dongle/ember/ezsp/EzspFrameRequest.java#L48-L53

This code is subject can return invalid values due to the non-atomic manipulation of the sequence when the value overflows. For example, if a thread is assigned sequence number 254, but other threads increment the value before the first thread has had a chance to reset it to 1, the those threads can be assigned sequence numbers of 255 or higher.

The fix is to get and update the sequence in a single atomic operation. e.g.

    private final static AtomicInteger sequence = new AtomicInteger(1);

    /**
     * Constructor used to create an outgoing frame
     */
    protected EzspFrameRequest() {
        sequenceNumber = sequence.getAndUpdate(n -> n <254 ? n+1 : 1);
    }

I need to stash some stuff so I can create a PR with just this change.

cdjackson commented 5 years ago

Thanks. As commented on the PR, I’d suggest to just synchronise the existing code. We can probably change to using an AtomicInteger - I think in the past I needed to keep track of the total number of frames sent which is why it is done the way it is, but that’s not used now, but we can’t use streaming.

On 11 Oct 2019, at 00:19, Simon Spero notifications@github.com wrote:

EZSP frame sequence numbers are integers in the range [1,254].

EzspFrameRequest stores the value of the next sequence number to assign in an AtomicLong.

https://github.com/zsmartsystems/com.zsmartsystems.zigbee/blob/d7398dd5edcfb5f78a3e367a82146ea7d73d95a2/com.zsmartsystems.zigbee.dongle.ember/src/main/java/com/zsmartsystems/zigbee/dongle/ember/ezsp/EzspFrameRequest.java#L43 https://github.com/zsmartsystems/com.zsmartsystems.zigbee/blob/d7398dd5edcfb5f78a3e367a82146ea7d73d95a2/com.zsmartsystems.zigbee.dongle.ember/src/main/java/com/zsmartsystems/zigbee/dongle/ember/ezsp/EzspFrameRequest.java#L43 Sequence numbers are assigned in the constructor

https://github.com/zsmartsystems/com.zsmartsystems.zigbee/blob/d7398dd5edcfb5f78a3e367a82146ea7d73d95a2/com.zsmartsystems.zigbee.dongle.ember/src/main/java/com/zsmartsystems/zigbee/dongle/ember/ezsp/EzspFrameRequest.java#L48-L53 https://github.com/zsmartsystems/com.zsmartsystems.zigbee/blob/d7398dd5edcfb5f78a3e367a82146ea7d73d95a2/com.zsmartsystems.zigbee.dongle.ember/src/main/java/com/zsmartsystems/zigbee/dongle/ember/ezsp/EzspFrameRequest.java#L48-L53 This code is subject can return invalid values due to the non-atomic manipulation of the sequence when the value overflows. For example, if a thread is assigned sequence number 254, but other threads increment the value before the first thread has had a chance to reset it to 1, the those threads can be assigned sequence numbers of 255 or higher.

The fix is to get and update the sequence in a single atomic operation. e.g.

private final static AtomicInteger sequence = new AtomicInteger(1);

/**
 * Constructor used to create an outgoing frame
 */
protected EzspFrameRequest() {
    sequenceNumber = sequence.getAndUpdate(n -> n <254 ? n+1 : 1);
}

I need to stash some stuff so I can create a PR with just this change.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/zsmartsystems/com.zsmartsystems.zigbee/issues/791?email_source=notifications&email_token=AAH6IQ3YS6PCW2E6SVSY4QLQN62AFA5CNFSM4I7TDNHKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HRCGHFA, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH6IQYVCEQSHXE2K5VA52TQN62AFANCNFSM4I7TDNHA.

sesuncedu commented 5 years ago

Using getAndUpdate is faster and less typing, so what's not to like :)

[Being able to use streaming again would be nice if setSourceRoute is used.]

cdjackson commented 5 years ago

Unfortunately streaming is not possible due to Android limitations.

On 11 Oct 2019, at 17:58, Simon Spero notifications@github.com wrote:

Using getAndUpdate is faster and less typing, so what's not to like :)

[Being able to use streaming again would be nice if setSourceRoute is used.]

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/zsmartsystems/com.zsmartsystems.zigbee/issues/791?email_source=notifications&email_token=AAH6IQ4MHB6PZGF5YTAB4JLQOCWBXA5CNFSM4I7TDNHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBATDOY#issuecomment-541143483, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH6IQ764J6KVGOLSXAGPYTQOCWBXANCNFSM4I7TDNHA.

sesuncedu commented 5 years ago

I'm confused about android. Is this an issue with android usbserial drivers? AtomicInteger::getAndUpdate was added to Android in API 24 (Nougat).

sesuncedu commented 5 years ago

https://github.com/felHR85/UsbSerial is aysnc and supports flow control for most of the relevant devices

cdjackson commented 5 years ago

I need to support older Android, and the streaming API is not available. This can't currently be changed - sorry.

I'm not sure that I understand the reference to the serial driver for Android?

sesuncedu commented 5 years ago

I was confused by your reference to streaming. getAndUpdate doesn't use streams. It does take an argument that is an instance of IntUnaryOperator, which in the code above is a lambda. Lambda is available for old android because it is desugared. getAndUpdate isn't until 24.

I thought you were talking about streaming ezsp+ash from android; some older usbserial code for Android didn't do flow control; the library I linked to does (for the most popular usb-uart bridges).

If you have specific android api constraints it might be setting up animal sniffer?

cdjackson commented 5 years ago

Yes, I know - I think I said that somewhere earlier that I misread the code (don’t try and review code on your phone ;) ).

I’ve not heard of “animal sniffer” before (I’’m not really an Android user), but yes, this does seem a good idea to catch issues - thanks.

On 12 Oct 2019, at 04:01, Simon Spero notifications@github.com wrote:

I was confused by your reference to streaming. getAndUpdate doesn't use streams. It does take an argument that is an instance of IntUnaryOperator, which in the code above is a lambda. Lambda is available for old android because it is desugared. getAndUpdate isn't until 24.

I thought you were talking about streaming ezsp+ash from android; some older usbserial code for Android didn't do flow control; the library I linked to does (for the most popular usb-uart bridges).

If you have specific android api constraints it might be setting up animal sniffer?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/zsmartsystems/com.zsmartsystems.zigbee/issues/791?email_source=notifications&email_token=AAH6IQZTLCGEP4L236FJHY3QOE4ZZA5CNFSM4I7TDNHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBBT3OY#issuecomment-541277627, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH6IQYZ4CUZ4VPAKDTMV63QOE4ZZANCNFSM4I7TDNHA.

sesuncedu commented 5 years ago

Animalsniffer is maven ; it checks for api usage. Obviously it's not a an android thing because it isn't gradle. Hmm- say... Also there would be kotlin... Hmm-say 🤔

cdjackson commented 5 years ago

@sesuncedu is this still relevant since you have closed the PR?

cdjackson commented 5 years ago

@sesuncedu do we now close this issue or is it still something you are looking at? You've closed the PR, so I'm assuming it's now not an issue?

cdjackson commented 5 years ago

@sesuncedu please can you comment on the status of this? You raised the issue, and opened a PR that you have now closed - can you update us on what is happening and why you closed the PR please?

cdjackson commented 5 years ago

@sesuncedu where are we with this and the PR you closed?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.