Thanks @graben, I don't think it's that big an issue, however.
There are `resources` and `recoverables` maps in the 3 TM wrappers, but these are touched only during registration of a connection manager for a given XA resource - and that is usually done by Blueprint, SCR, or when you simply register an XADataSource/XAConnectionFactory.
There is also the `transactions` map in AbstractTransactionManagerWrapper, but this map is keyed by javax.transaction.Transaction obtained directly from the underlying TM, which should take care of synchronization - each thread gets a different transaction anyway.
@gnodet what do you think - maybe I missed something?
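For context, here is a simplified sketch of the getTransaction() path being discussed. This is not the actual pax-transx source; the field names and the use of WeakHashMap/computeIfAbsent are inferred from the map descriptions above and the thread dump quoted below. The point is that although each thread passes its own Transaction key, every call still mutates the same shared, unsynchronized map.

```java
import java.util.Map;
import java.util.WeakHashMap;
import javax.transaction.Transaction;
import javax.transaction.TransactionManager;

// Simplified sketch, NOT the actual pax-transx source: names and types are
// inferred from the discussion and the thread dump in this issue.
public abstract class AbstractTransactionManagerWrapperSketch {

    protected final TransactionManager tm;

    // Keyed by the Transaction obtained from the underlying TM; weak keys let
    // entries disappear once the TM no longer references the transaction.
    // A plain WeakHashMap is NOT thread-safe, yet many pool threads call
    // getTransaction() concurrently, so puts and resizes race on this one instance.
    protected final Map<Transaction, TransactionWrapper> transactions = new WeakHashMap<>();

    protected AbstractTransactionManagerWrapperSketch(TransactionManager tm) {
        this.tm = tm;
    }

    public TransactionWrapper getTransaction() throws Exception {
        Transaction tx = tm.getTransaction();   // thread-bound, different per thread
        if (tx == null) {
            return null;
        }
        // Each thread uses a different key, but computeIfAbsent still mutates the
        // shared bucket table (put + resize), which is where the corruption happens.
        return transactions.computeIfAbsent(tx, TransactionWrapper::new);
    }

    protected static class TransactionWrapper {
        final Transaction transaction;
        TransactionWrapper(Transaction transaction) {
            this.transaction = transaction;
        }
    }
}
```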
This is not a VIRTUAL or THEORETICAL issue. Actually, we have a real production thread stuck on the transactions Map (see the original pull request for the stack trace). Reproducible 100%. The transactions map changes without any synchronization - on a VM with only 2 cores and a very commodity load.
@grgrzybek resources and recoverables are accessed from many different threads. This leads to strange, hardly reproducible bugs. HOWEVER, synchronization for these collections does not have any measurable performance impact, because it is used only on resource registration/unregistration.
Wouldn't it be better to add explicit synchronization instead of relying on a synchronized map? I find synchronized maps less helpful because they only address the problem of HashMap corruption but do not solve the problem of data consistency. @vladimirfx If you say it is 100% reproducible, can you point out the code path from the thread dump that causes the problem? @grgrzybek I think transactions are a critical part of the system and we should address any issue (small or big ;-) )
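To illustrate the difference being pointed at here (a generic example, not pax-transx code): Collections.synchronizedMap makes each individual call atomic, which is enough to prevent structural corruption of the map, but a compound check-then-act sequence still needs an explicit lock around the whole sequence if data consistency matters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapVsExplicitLock {

    private final Map<String, Integer> counters =
            Collections.synchronizedMap(new HashMap<>());

    // Each call below is individually atomic, so the map structure stays intact...
    void unsafeIncrement(String key) {
        Integer old = counters.get(key);               // step 1
        counters.put(key, old == null ? 1 : old + 1);  // step 2 - another thread may
    }                                                  // have updated the key in between

    // ...but keeping the data consistent requires an explicit lock spanning the
    // whole read-modify-write, which is what explicit synchronization buys you.
    void safeIncrement(String key) {
        synchronized (counters) {
            Integer old = counters.get(key);
            counters.put(key, old == null ? 1 : old + 1);
        }
    }
}
```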
"schedulerSyncChanges-1" Id=162 in RUNNABLE at java.util.WeakHashMap.transfer(WeakHashMap.java:528) at java.util.WeakHashMap.resize(WeakHashMap.java:493) at java.util.WeakHashMap.put(WeakHashMap.java:466) at java.util.Map.computeIfAbsent(Map.java:958) at org.ops4j.pax.transx.tm.impl.AbstractTransactionManagerWrapper.getTransaction(AbstractTransactionManagerWrapper.java:49) at org.ops4j.pax.transx.connector.impl.GenericConnectionManager.getMci(GenericConnectionManager.java:186) at org.ops4j.pax.transx.connector.impl.GenericConnectionManager.allocateConnection(GenericConnectionManager.java:171) at org.ops4j.pax.transx.connector.impl.GenericConnectionManager.allocateConnection(GenericConnectionManager.java:167) at org.ops4j.pax.transx.jdbc.impl.TransxDataSource.getConnection(TransxDataSource.java:68) at org.ops4j.pax.transx.jdbc.impl.TransxDataSource.getConnection(TransxDataSource.java:59) at sun.reflect.GeneratedMethodAccessor328.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498)
For the transactions map, synchronization on some particular mutex is not needed because a transaction is thread-bound. There is no real concurrency - just HashMap corruption. But for resource registration - I agree, synchronizing the reg/unreg methods on some mutex is reasonable. I can update my pull request.
Another question that was never asked: what are the maps in the wrappers for? It seems they are not used anywhere in the code, or am I missing something?
Maybe we should remove them (the resource/recover maps) altogether and only synchronize the getTransaction() method in AbstractTransactionManagerWrapper.
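A minimal sketch of the two options on the table, assuming the map should keep weak keys so entries can still be collected (this is a suggestion only, not the merged change; the wrapper value type is a stand-in):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import javax.transaction.Transaction;

// Sketch of the two options discussed above, not the actual patch.
class TransactionsMapOptions {

    // Option 1: keep weak keys, delegate locking to the synchronized wrapper.
    private final Map<Transaction, Object> transactions =
            Collections.synchronizedMap(new WeakHashMap<>());

    Object getTransactionViaWrapper(Transaction tx) {
        // computeIfAbsent on a synchronizedMap holds the map's mutex for the whole call.
        return transactions.computeIfAbsent(tx, t -> new Object());
    }

    // Option 2: plain WeakHashMap plus an explicitly synchronized accessor,
    // i.e. "only synchronize getTransaction()".
    private final Map<Transaction, Object> plainTransactions = new WeakHashMap<>();

    synchronized Object getTransactionSynchronized(Transaction tx) {
        return plainTransactions.computeIfAbsent(tx, t -> new Object());
    }
}
```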
Thanks for the feedback. I did some detailed race/lock analysis in other projects (Karaf, Aries) before, but here I haven't yet had the chance. I believe this may be the issue, and I can merge this PR without problems.
If you have more issues, I'll be happy to review them - however, I'll be back from holidays in ~2 weeks. Today is my last day ;)
Well, it would be nice to have feedback from @gnodet about my last statement since I actually see no usage/need for most of the Maps. Let us talk about that in two weeks after your holiday. Maybe we can optimize that before releasing 0.4. :)
Maybe prepare the 0.4.0 release and optimize in 0.5? JB can then include this fix and #5 in Karaf 4.2.1.
@gnodet @jbonofre may I ask you to review this as well? Thanks!
@grgrzybek: Please have a look, because this looks like a valid bug under heavy concurrent use.