splitio / java-client

Java SDK client for Split Software
https://split.io
Other
25 stars 18 forks source link

Deadlock issue when tracking impressions #473

Closed ryaneorth closed 6 months ago

ryaneorth commented 6 months ago

We have a high throughput Spring-based web application that frequently encounters a deadlock scenario in the Split SDK, ultimately requiring restarting the web application to resolve.

Specifically, all threads get stuck with the following stack trace:

deployment.redrock.ear//io.split.client.impressions.UniqueKeysTrackerImp.track(UniqueKeysTrackerImp.java:59)
deployment.redrock.ear//io.split.client.impressions.strategy.ProcessImpressionNone.process(ProcessImpressionNone.java:28)
deployment.redrock.ear//io.split.client.impressions.ImpressionsManagerImpl.track(ImpressionsManagerImpl.java:116)
deployment.redrock.ear//io.split.client.SplitClientImpl.recordStats(SplitClientImpl.java:342)
deployment.redrock.ear//io.split.client.SplitClientImpl.getTreatmentWithConfigInternal(SplitClientImpl.java:258)
deployment.redrock.ear//io.split.client.SplitClientImpl.getTreatmentWithConfig(SplitClientImpl.java:99)

We encountered this a few months before and decided to switch to use None as our impressionsMode assuming that would solve the issue (we don't care about impressions anyway). However, digging into the code it appears the ProcessImpressionsNone.java still uses the UniqueKeysTracker which is backed by a ConcurrentHashMap, ultimately deadlocking on the call to put.

To solve this issue, could we this issue be resolved in one of the two following ways:

  1. Provide us with a true "no-op" impressions tracker that disables handling impressions entirely (e.g. the process method is empty)
  2. Fix the deadlock issue within the UniqueKeysTracker - either by not using a ConcurrentHashMap or wrap the put call on the ConcurrentHashMap with some kind of timeout?

As mentioned, impressions are not incredibly important to us and the application not deadlocking is our biggest goal.

For what its worth, we are using version 4.7.0 of the client and looking at the changes since then I do not see any changes that would lead me to believe this issue has been addressed in the latest version (4.11.0).

ryaneorth commented 6 months ago

I took a closer look at this and since we are running version 4.7.0 of the client, the place where the threads are stuck:

deployment.redrock.ear//io.split.client.impressions.UniqueKeysTrackerImp.track(UniqueKeysTrackerImp.java:59)

actually aligns with the first line of the UniqueKeysTrackerImp.track method which tells me other threads are stuck trying to get into this method because it is synchronized.

I'm not sure what could be blocking in this method, but does it really need to be synchronized?

In the meantime, I've opened this PR to completely disable processing impressions (basically implementing option 1 in my original post). Please review at your earliest convenience.

https://github.com/splitio/java-client/pull/474

mmelograno commented 6 months ago

Hi @ryaneorth ,

Sorry for the inconvenience on experiencing deadlock issues. We have released JAVA 4.11.1 which is a revamp of the None process logic.

I will proceed on closing this issue, feel free to reopen if you are experiencing some issue again.

Thanks, Matias