xinbc / jdiameter

Automatically exported from code.google.com/p/jdiameter
0 stars 0 forks source link

Diameter stack peer statistics not thread safe #37

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Apologies only found this error now.

With the new multi-threaded FSM code the creating and removing of peer 
statistics is not thread safe and causes a race condition where a request can 
fail.

The attached patch is very simple and only affects 
org.jdiameter.common.impl.controller.AbstractPeer.java

It puts a concurrent Re-entrant lock around adding and removing peer statistics.

PS Also have not managed to test latest jdiameter with SCTP.  Unfortunately our 
only SCTP Diameter client (MME) is our vendor equipment and only currently 
available in our live networks!  As soon as we get a laboratory environment 
(possibly only in next few months) I will test all the latest and migrate 
everything over to the latest ASAP.

Original issue reported on code.google.com by richard....@smilecoms.com on 6 Feb 2013 at 11:59

Attachments:

GoogleCodeExporter commented 9 years ago
Hi

I found a new issue with this.

If you get in a situation where the connection flaps (repeatedly reconnects) 
you can get into a situation where the statistic is null even though the peer 
is created - processing messages then fails as statistic is NULL.

I'm not sure how this happens as a connection flap is difficult to replicate 
but I suggest to avoid it (as it is a disaster if requests can not be 
processed) to add the following to the 
org.jdiameter.common.impl.controller.AbstractPeer method: createPeerStatistics()

In the finally block at the end:

} finally {
+            if(this.statistic == null){
+       logger.warn("Failed to create statistic, making dummy statistic and setting 
to disabled");
+       this.statistic = statisticFactory.newStatistic("local", 
IStatistic.Groups.Peer);
+       this.statistic.enable(false);
+       }
        statisticsLock.unlock();

        }

This will create a disabled statistic if anything does go wrong during creation 
and make sure we don't have a disaster where requests can't be processed.

Original comment by richard....@smilecoms.com on 19 Dec 2013 at 2:27

GoogleCodeExporter commented 9 years ago
Richard,

Sorry for only merging this now. Do you recall why would you check twice for 
null (both on create and remove), first before lock and then after lock.

I'll merge just with the check inside the lock, as it seems to suffice, but let 
me know if I missed anything.

Original comment by brainslog on 28 Oct 2014 at 1:28

GoogleCodeExporter commented 9 years ago
This issue was fixed by revision e90a9c62341a.

Original comment by brainslog on 28 Oct 2014 at 1:35