twitter-archive / cloudhopper-smpp

Efficient, scalable, and flexible Java implementation of the Short Messaging Peer to Peer Protocol (SMPP)

Threads get stuck at DefaultSmppSession.sendRequestPd:530 #114

Open mthyman opened 8 years ago

mthyman commented 8 years ago

We have noticed that when there are problems with the network between client and server, threads get stuck on line 530 in DefaultSmppSession. The code is:

// write the pdu out & wait timeout amount of time
ChannelFuture channelFuture = this.channel.write(buffer).await();

I think that the fix should be something like this:

// write the pdu out & wait at most timeoutMillis
ChannelFuture channelFuture = this.channel.write(buffer);
channelFuture.await(timeoutMillis);

This is on version 5.0.7 and 5.0.8
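
To illustrate the difference between the two calls, here is a minimal sketch that uses java.util.concurrent.CountDownLatch as a stand-in for Netty's ChannelFuture. That substitution is an assumption made so the example runs without Netty; it is not the cloudhopper-smpp code, and the class and method names are hypothetical. It only shows that a bounded wait hands control back to the caller when the completion signal never arrives, whereas an unbounded wait would not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: CountDownLatch stands in for a write-completion future.
public class TimedAwaitSketch {

    /**
     * Waits for the write-completion signal.
     * Returns true if it arrived, false if the bounded wait timed out.
     */
    public static boolean awaitWrite(CountDownLatch writeDone, long timeoutMillis)
            throws InterruptedException {
        if (timeoutMillis > 0) {
            // Bounded wait: the caller is released after timeoutMillis at the latest.
            return writeDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        // Unbounded wait: blocks until the signal arrives, possibly forever.
        writeDone.await();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates a write that never completes, e.g. after a network failure.
        CountDownLatch neverCompletes = new CountDownLatch(1);
        System.out.println("completed: " + awaitWrite(neverCompletes, 100));

        // Simulates a write that has already completed.
        CountDownLatch alreadyDone = new CountDownLatch(0);
        System.out.println("completed: " + awaitWrite(alreadyDone, 100));
    }
}
```

The caller still has to check the returned flag: a false result means the wait gave up, not that the write succeeded, so it should probably be treated as a failed send.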

jjlauer commented 8 years ago

You probably are onto something. Gladly will accept a PR that makes the change.

mthyman commented 8 years ago

Pull request added: https://github.com/twitter/cloudhopper-smpp/pull/115

olegagafonov commented 8 years ago

Hi! Here is the corresponding piece of code from the 'netty4' branch:

// write the pdu out & wait timeout amount of time
ChannelFuture channelFuture = this.channel.writeAndFlush(buffer);
if (configuration.getWriteTimeout() > 0){
    channelFuture.await(configuration.getWriteTimeout());
} else {
    channelFuture.await();
}

The writeAndFlush method won't throw an exception if the channel is closed by the other side. So if 'writeTimeout' is left at its default and the session listener doesn't handle 'fireChannelClosed', we'll get the same lock. Please let me know whether @mthyman's PR affects the netty4 branch.

mthyman commented 8 years ago

@olegagafonov I haven't tried the netty4 branch myself, so I haven't seen the problem there, but looking at your code snippet, setting a positive write timeout in your config should avoid the problem if it's there.

On a side note, we've been running a special build with my PR in production for almost two months now without seeing any threads get stuck so far.

olegagafonov commented 8 years ago

@mthyman You're right! A positive timeout is a solution. But the default timeout is a delayed deadlock. Your PR and my code snippet are the same code in different branches. However, I don't know the lifecycle and merge strategy of the two branches. @jjlauer what do you think?