zsmartsystems / com.zsmartsystems.zigbee

ZigBee Cluster Library Java framework supporting multiple dongles
Eclipse Public License 1.0
142 stars 88 forks source link

java.lang.IllegalStateException in AshReceiveParserThread #1318

Closed TomTravaglino closed 2 years ago

TomTravaglino commented 2 years ago

Occasionally we observe such an Exception

2022-04-01T16:44:01.050+0200    5 ERROR [AshReceiveParserThread,com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler] AshReceiveParserThread Exception: 
java.lang.IllegalStateException: Queue full
    at java.util.AbstractQueue.add(AbstractQueue.java:98)
    at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:312)
    at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler.handleIncomingFrame(AshFrameHandler.java:239)
    at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler.access$700(AshFrameHandler.java:59)
    at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler$AshReceiveParserThread.run(AshFrameHandler.java:314)

It occurs where the frame is put into an ArrayBlockingQueue which has a size of (only) 10. Since we observed this on slow machines it is possible that AshReceiveProcessorThread can't process the entries fast enough and so the queue runs out of capacity. Do you think it makes sense to increase the size? To be honest, I don't know what size would be good.

TomTravaglino commented 2 years ago

Hello @cdjackson . Did you see this :wink:? Would it make sense to change the size of the ArrayBlockingQueue? Or do you think it might have side effects?

cdjackson commented 2 years ago

Sorry @TomTravaglino I thought I had replied to this - maybe it was by email and it got lost - in any case, it's not here :(

My thoughts were that yes, we could increase it, but depending on how long packets are staying in the queue, it might be just as well to silently drop the frames and let the consequences be picked up through other mechanisms. Eg if this is an incoming frame, and it's in the queue for a second or so, then it's likely that the remote will resend - so it may have the effect of filling the queue even more.

I don't have a huge problem with changing the queue size in general, but it will likely not solve the problem unless it is made really large. It will of course reduce the possibility - if we make it 20 for example, it will go from happening "occasionally", to "very occasionally 😉, but it will likely still happen.

Is it normally just one exception that occurs, or does it occur in a burst (indicating that we'd need to increase to a much larger number?

Does dropping a frame tend to cause a problem, or does the system resolve this with a higher layer action (retries for example)?

I'd be happy to increase the queue to (say) 15, or 20. I assume (??) that this doesn't cause the ASH receive thread to exit as it requires 10 exceptions without a valid frame, and it looks like this counter is reset before the queue add, so it will always be 0?

TomTravaglino commented 2 years ago

Hi @cdjackson, thanks for the reply. Let me try to answer your questions. Unfortunately, I have not experienced the problem (if it really is one) myself, so I have to rely on the information I have been provided.

Is it normally just one exception that occurs, or does it occur in a burst ... ?

In the logs I saw just single exceptions. And as I said, it occurs only occasionally.

Does dropping a frame tend to cause a problem ... ?

The reporters mentioned that their systems were still running stable. The issue was reported because the exception was seen when inspecting the logs and it was logged on error.

So if it's okay to increase the queue to 20, I'll create an MR for that.

cdjackson commented 2 years ago

Thanks @TomTravaglino - yes, I'm fine with that.