Open gberche-orange opened 4 years ago
Bump to https://github.com/cloudfoundry/cf-java-client/releases/tag/v4.16.0.RELEASE (see https://github.com/cloudfoundry/cf-java-client/pull/1118):
Previously when a 401 would occur, we would throw the InvalidTokenException which in turn would trigger a retry. The retry would allow the TokenProvider to fetch a new, possibly valid token. When this happened, you could get into a state where reactor-netty was waiting for the library to finish using the connection (either consume the body or dispose it). Since we never did that, you could accumulate connections in the ESTABLISHED state. This code change will read and throw away the body when there's a 401, which signals to reactor-netty that it can reuse the connection.
No evidence yet that we received a 401, though.
Expected behavior
As an osb-cmdb operator
Observed behavior
reactor assembly trace thread contention
In the stack trace available at https://gist.github.com/gberche-orange/efcbedcd9f5c715dcd5e1505eb5503cd ,
46 out of 47 threads are waiting on the reactor assembly at https://github.com/reactor/reactor-core/blob/ef5d9c0d0f1b65b6669273a1ac651d2d6a019c15/reactor-core/src/main/java/reactor/core/publisher/FluxOnAssembly.java#L262 with stack trace:
Surprisingly, no thread is observed to currently hold the lock on the LinkedList.
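The "no visible lock owner" observation can be cross-checked from inside the JVM rather than from a text dump. Below is a minimal sketch using only the JDK's `ThreadMXBean`: it groups threads in the `BLOCKED` state by the monitor they are waiting for and prints the owner the JVM reports. The class and method names (`BlockedThreadReport`, `blockedByLock`, `demoBlockedCount`) are hypothetical and are not part of osb-cmdb, reactor-core, or cf-java-client; the sketch also assumes the contended threads are blocked on a plain Java monitor.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, for illustration only.
final class BlockedThreadReport {

    /** Maps "lock (owner: name)" to the number of threads BLOCKED on that lock. */
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // (false, false): do not collect the monitors/synchronizers each thread holds;
        // the lock a thread is blocked on and its owner are still reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            String owner = info.getLockOwnerName() == null
                    ? "none observed" : info.getLockOwnerName();
            counts.merge(info.getLockName() + " (owner: " + owner + ")", 1, Integer::sum);
        }
        return counts;
    }

    /** Demo: one thread holds a monitor while `waiters` threads block on it; returns the waiters found. */
    static int demoBlockedCount(int waiters) {
        final Object monitor = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                held.countDown();
                try {
                    release.await(); // keep owning the monitor until the demo is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "demo-holder");
        holder.setDaemon(true);
        holder.start();
        List<Thread> blocked = new ArrayList<>();
        try {
            held.await();
            for (int i = 0; i < waiters; i++) {
                Thread t = new Thread(() -> { synchronized (monitor) { } }, "demo-waiter-" + i);
                t.setDaemon(true);
                t.start();
                blocked.add(t);
            }
            // Wait until every waiter has actually reached the BLOCKED state.
            for (Thread t : blocked) {
                while (t.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(5);
                }
            }
            int count = 0;
            for (Map.Entry<String, Integer> e : blockedByLock().entrySet()) {
                if (e.getKey().contains("owner: demo-holder")) {
                    count += e.getValue();
                }
            }
            return count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            release.countDown(); // let holder and waiters finish
        }
    }
}
```

Calling `BlockedThreadReport.blockedByLock()` periodically (for example from a diagnostic endpoint) would show whether the owner of the contended monitor changes between samples. An owner of `none observed` only means the JVM reported no owner at sampling time, which may be consistent with a lock that is handed over very quickly between many waiters.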
cf-java-client high contention & completion times
Some cf-java-client calls take more than 60 minutes to complete; see the queries below, taken before and after JVM restarts
reactor stack traces keep thrashing
Apparent signs of reactive loops when handling UAA errors; see the fragment below, and the larger fragment in the gist:
https://gist.github.com/gberche-orange/3282249f8d5d08a1ab35b7bc5a5488a0
Need to further study https://projectreactor.io/docs/core/release/reference/#debug-activate
logs are lacking thread names
making it hard to distinguish concurrent threads dumping the same entries such as:
Affected release
Reproduced on version 1.2.0