zhoujian2030 / jdiameter

Automatically exported from code.google.com/p/jdiameter
0 stars 0 forks source link

Race condition in TCP Network Guard can leave socket still listening for incoming connection after destroy() #74

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Check out source for either 1.6.0.FINAL or 1.7.0-SNAPSHOT
2. Build source with mvn clean install
3. Build testsuite with mvn clean install

What is the expected output? What do you see instead?
All test suite tests to pass. Instead of this I observe occasional non 
deterministic failures.

What version of the product are you using? On what operating system?
1.6.0.FINAL or 1.7.0-SNAPSHOT

Please provide any additional information below.

The test suite tests do not always pass successfully every time. I investigated 
why this was and in one case have traced it to a bug in the TCP version of the 
Network guard 
(core/jdiameter/impl/src/main/java/org/jdiameter/server/impl/io/tcp/NetworkGuard
.java).

In this class the constructor creates a GuardTask object to open the TCP server 
socket to listen for connections from other diameter peers. The GuardTask 
implements the Runnable interface and is run in a separate thread (it can be 
executed with a configurable delay). The problem is that the GuardTask 
cleanTask() member (which is called by the NetworkGuard destroy() member) does 
not correctly check that the Thread that opens the server socket has completed. 
It simply checks if the server socket variable is not null therefore there is a 
race condition between the thread creating the server socket and the thread 
destroying the NetworkGuard. This can result in the socket being opened after 
the NetworkGuard task destroy() member has been called.

In the teas suite tests this can manifest itself with a test not destroying its 
server socket which in turn means the next test fails because it cannot it self 
open the server socket (address already in use exception). 

I have attached a modified version of the NetworkGuard class where I have tried 
to fix this issue. In this version I maintain a handle to the Guard Task:

  private class GuardTask implements Runnable {
    private Thread thread;
    private Selector selector;
    private ServerSocket serverSocket;
    ScheduledFuture<?> handle = null;         // HANDLE TO GUARD TASK

    private final ScheduledExecutorService binder = Executors.newSingleThreadScheduledExecutor();

    public GuardTask(final InetSocketAddress addr) throws IOException {

      if(bindDelay > 0) {
        logger.info("Socket binding will be delayed by {}ms...", bindDelay);
      }

      Runnable task = new Runnable() {
        public void run() {
          try {
            logger.debug("Binding {} after delaying {}ms...", addr, bindDelay);
            final ServerSocketChannel ssc = ServerSocketChannel.open();
            ssc.configureBlocking(false);
            serverSocket = ssc.socket();
            serverSocket.bind(addr);
            selector = Selector.open();
            ssc.register(selector, SelectionKey.OP_ACCEPT, addr);
            logger.info("Open server socket {} ", serverSocket);
          }
          catch (IOException e)
          {
            logger.error("Error opening server socket", e);
            throw new RuntimeException(e);
          }
        }
      };

      // KEEP HANDLE TO GUARD TASK THAT OPENS SERVER SOCKET
      //---------------------------------------------------
      handle = binder.schedule(task, bindDelay, TimeUnit.MILLISECONDS);
    }

Then later in the clean task I use the handle to check the server socket has 
been setup:

    public void cleanTask() {

      try {
        if (thread != null) {
          thread.join(2000);
          if (thread.isAlive()) {
            // FIXME: remove ASAP
            thread.interrupt();
          }
          thread = null;
        }
      }
      catch (InterruptedException e) {
        logger.debug("Can not stop thread", e);
      }

      // USE SERVER SOCKET CREATION HANDLE TO MAKE SURE THE
      // SOCKET HAS BEEN OPENED. THE SERVER SOCKET MAY BE IN THE
      // PROCESS OF BEING CREATED...
      if (selector == null)
      {
        logger.debug("Selector is null... checking that it is not in the process of being started");
        if (handle != null)
        {
             try
             {
                  handle.get(10, TimeUnit.SECONDS);^M
             }
             catch (Exception e)
             {
                  logger.error("Exception thrown waiting for GuardTask to complete", e);
             }
        }
        else
        {
            logger.debug("Handle is null");^M
        }
      }

As you can see in this version the code waits for 10 seconds to make sure the 
server socket has been opened. This is okay for when the GuardTask has been 
start without any delay which is the case for all the tests in the test suite. 
I have not considered the case where the server socket is opened after a 
configurable delay. 

Note that in the attached modified file I also added a number of logger trace 
statements whilst I was debugging this issue.

Original issue reported on code.google.com by dpalmer...@gmail.com on 24 Mar 2015 at 10:42

Attachments: