Avoid locking around get operators that don't return a value

paulaustin-automutatio commented 6 days ago

I'm submitting a ...

[*] bug report
[ ] feature request

Describe the issue

I have a process that is hung in the QueryExecutorBase for 12+ hours.

In the org.postgresql.core.QueryExecutorBase it is using reentrant locks to protect fields and operations. However it is also locking simple get operations that are atomic by definition. Is there some reason that this is needed? If a caller (e.g. borrowObject) needs to make sure it is using the same values for those fields then it should do the lock, not the method to get the value.

For example

  public boolean getStandardConformingStrings() {
    try (ResourceLock ignore = lock.obtain()) {
      return standardConformingStrings;
    }
  }

Should be

  public boolean getStandardConformingStrings() {
      return standardConformingStrings;
  }

This applies to all of the get methods that just return values.

Driver Version?

4.7.2

Java Version?

21.0.4

OS Version?

N/A

PostgreSQL Version?

16

To Reproduce Steps to reproduce the behaviour:

Unable to directly reproduce as it is a locking race condition.

Expected behaviour

Code to not block

Logs

Here is a stack trace that is blocked. There are no other threads currently using an postgresql classes.

java.base/java.lang.VirtualThread.park(VirtualThread.java:596)
java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
org.postgresql.jdbc.ResourceLock.obtain(ResourceLock.java:28)
org.postgresql.core.QueryExecutorBase.getStandardConformingStrings(QueryExecutorBase.java:293)
org.postgresql.core.CachedQueryCreateAction.create(CachedQueryCreateAction.java:43)
org.postgresql.core.CachedQueryCreateAction.create(CachedQueryCreateAction.java:19)
org.postgresql.util.LruCache.borrow(LruCache.java:130)
org.postgresql.core.QueryExecutorBase.borrowQuery(QueryExecutorBase.java:326)
org.postgresql.jdbc.PgConnection.borrowQuery(PgConnection.java:222)
org.postgresql.jdbc.PgPreparedStatement.<init>(PgPreparedStatement.java:91)
org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1411)
org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1858)
org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:534)

davecramer commented 6 days ago

Fair enough, but I'm curious why this would be blocked ?

paulaustin-automutatio commented 6 days ago

Dave,

Unfortunately I couldn't find an answer to that question. There didn't seem to be anything else that was using the postgresql classes at the time . I have code which can dump all the stack traces for all threads, including virtual threads that I create (normally the JVM doesn't list virtual threads via the MX Bean).

I looked through that class to see where you were using the lock and as far as I can tell it was doing so using a try with resources block so it should be releasing the locks correctly.

Paul

davecramer commented 6 days ago

seems bizarre, we didn't think the lock would really effect anything like that. I'm OK removing them but somethings amiss here.

svendiedrichsen commented 6 days ago

Is a single connection used by multiple threads?

paulaustin-automutatio commented 6 days ago

At any one given time the connection should only be used by a single thread. However that connection is in a connection pool, and over time it might be used by different threads. Which shouldn't be using it in 2 threads concurrently.

vlsi commented 6 days ago

It might be that the overhead of ReentrantLock is much higher with the virtual threads. It looks like we benchmarked regular threads only.

See https://github.com/pgjdbc/pgjdbc/issues/1951#issuecomment-1253521413

@rbygrave, did you benchmark virtual threads + ReentrantLock?

By the way, there's https://openjdk.org/jeps/491, we synchronize would no longer pin the thread, so we might want reverting back to simple synchronization.

davecramer commented 6 days ago

It might be that the overhead of ReentrantLock is much higher with the virtual threads. It looks like we benchmarked regular threads only.

That wouldn't cause it to block though. I presume you are just suggesting we may want to just use synchronized instead

paulaustin-automutatio commented 6 days ago

I would stick with the ReentrantLock, but check to see if there is a requirement for a lock to occur at all. In this case it shouldn't be on getting the value. I'd also confirm why the lock would be needed for setting the values. If it it so the values don't change while you were performing a task I would get those values at the begining of the task as variables and use those.

I did notice some code is repeatedly calling getStandardConformingStrings in the same method. So you have the overhead of that lock multiple times. For example in org.postgresql.core.CachedQueryCreateAction.create(Object). Put all of those into variables at the start of a method to avoid that extra overhead.

I can put this in a separate ticket. But you might want to look at java.util.concurrent.ConcurrentHashMap<K, V> and the computeIfAbsent method for LruCache. I've had good success using that instead of locking for managing a cache of values.

paulaustin-automutatio commented 6 days ago

jep 491 won't be around until JDK 24, so I'd avoid synchronized for now

vlsi commented 6 days ago

I presume you are just suggesting we may want to just use synchronized instead

Based on #1951 benchmarks for the regular threads we concluded that the difference between synchronized and ReentrantLock was negligible, so we went ahead and replaced all the synchronized with ReentrantLock to be sure there's no virtual thread pinning.

If it turns out the overhead is significant, then we could reiterate. For the simple cases like get/set, we could use synchronized even now. The problematic case for the virtual threads is when we call IO while holding synchronized monitor. As long as we are not doing IO, we could use synchronized just fine.

However, before we make any replacements, I would like to get some benchmarks to quantify the change.

vlsi commented 6 days ago

But you might want to look at java.util.concurrent.ConcurrentHashMap<K, V> and the computeIfAbsent method for LruCache

Currently, LruCache is not shared across threads, and different threads should avoid the concurrent use of the same Connection. So I do not see a reason to use ConcurrentHashMap there. There should be absolutely no contention on LruCache right now.

paulaustin-automutatio commented 5 days ago

But you might want to look at java.util.concurrent.ConcurrentHashMap<K, V> and the computeIfAbsent method for LruCache

Currently, LruCache is not shared across threads, and different threads should avoid the concurrent use of the same Connection. So I do not see a reason to use ConcurrentHashMap there. There should be absolutely no contention on LruCache right now.

The only reason I brought it up was that the LruCache is using a ReentrantLock.

pgjdbc / pgjdbc

Avoid locking around get operators that don't return a value #3431