What steps will reproduce the problem?
1. Check out source for either 1.6.0.FINAL or 1.7.0-SNAPSHOT
2. Build source with mvn clean install
3. Build testsuite with mvn clean install
What is the expected output? What do you see instead?
All test suite tests to pass. Instead of this I observe occasional non
deterministic failures.
What version of the product are you using? On what operating system?
1.6.0.FINAL or 1.7.0-SNAPSHOT
Please provide any additional information below.
The test suite tests do not always pass successfully every time. I investigated
why this was and in one case have traced it to a bug in the TCP version of the
Network guard
(core/jdiameter/impl/src/main/java/org/jdiameter/server/impl/io/tcp/NetworkGuard
.java).
In this class the constructor creates a GuardTask object to open the TCP server
socket to listen for connections from other diameter peers. The GuardTask
implements the Runnable interface and is run in a separate thread (it can be
executed with a configurable delay). The problem is that the GuardTask
cleanTask() member (which is called by the NetworkGuard destroy() member) does
not correctly check that the Thread that opens the server socket has completed.
It simply checks if the server socket variable is not null therefore there is a
race condition between the thread creating the server socket and the thread
destroying the NetworkGuard. This can result in the socket being opened after
the NetworkGuard task destroy() member has been called.
In the teas suite tests this can manifest itself with a test not destroying its
server socket which in turn means the next test fails because it cannot it self
open the server socket (address already in use exception).
I have attached a modified version of the NetworkGuard class where I have tried
to fix this issue. In this version I maintain a handle to the Guard Task:
private class GuardTask implements Runnable {
private Thread thread;
private Selector selector;
private ServerSocket serverSocket;
ScheduledFuture<?> handle = null; // HANDLE TO GUARD TASK
private final ScheduledExecutorService binder = Executors.newSingleThreadScheduledExecutor();
public GuardTask(final InetSocketAddress addr) throws IOException {
if(bindDelay > 0) {
logger.info("Socket binding will be delayed by {}ms...", bindDelay);
}
Runnable task = new Runnable() {
public void run() {
try {
logger.debug("Binding {} after delaying {}ms...", addr, bindDelay);
final ServerSocketChannel ssc = ServerSocketChannel.open();
ssc.configureBlocking(false);
serverSocket = ssc.socket();
serverSocket.bind(addr);
selector = Selector.open();
ssc.register(selector, SelectionKey.OP_ACCEPT, addr);
logger.info("Open server socket {} ", serverSocket);
}
catch (IOException e)
{
logger.error("Error opening server socket", e);
throw new RuntimeException(e);
}
}
};
// KEEP HANDLE TO GUARD TASK THAT OPENS SERVER SOCKET
//---------------------------------------------------
handle = binder.schedule(task, bindDelay, TimeUnit.MILLISECONDS);
}
Then later in the clean task I use the handle to check the server socket has
been setup:
public void cleanTask() {
try {
if (thread != null) {
thread.join(2000);
if (thread.isAlive()) {
// FIXME: remove ASAP
thread.interrupt();
}
thread = null;
}
}
catch (InterruptedException e) {
logger.debug("Can not stop thread", e);
}
// USE SERVER SOCKET CREATION HANDLE TO MAKE SURE THE
// SOCKET HAS BEEN OPENED. THE SERVER SOCKET MAY BE IN THE
// PROCESS OF BEING CREATED...
if (selector == null)
{
logger.debug("Selector is null... checking that it is not in the process of being started");
if (handle != null)
{
try
{
handle.get(10, TimeUnit.SECONDS);^M
}
catch (Exception e)
{
logger.error("Exception thrown waiting for GuardTask to complete", e);
}
}
else
{
logger.debug("Handle is null");^M
}
}
As you can see in this version the code waits for 10 seconds to make sure the
server socket has been opened. This is okay for when the GuardTask has been
start without any delay which is the case for all the tests in the test suite.
I have not considered the case where the server socket is opened after a
configurable delay.
Note that in the attached modified file I also added a number of logger trace
statements whilst I was debugging this issue.
Original issue reported on code.google.com by dpalmer...@gmail.com on 24 Mar 2015 at 10:42
Original issue reported on code.google.com by
dpalmer...@gmail.com
on 24 Mar 2015 at 10:42Attachments: