vert-x3 / vertx-ignite

Apache License 2.0
35 stars 28 forks source link

Ignite 2.11.0 IgniteFuture never completes - AsyncMap operations never complete #121

Closed DemonicTutor closed 2 years ago

DemonicTutor commented 2 years ago

Version

VertX 4.2.0

Context

After upgrading our microservices from VertX 4.1.0 to 4.2.0 we discovered in our clustered services HTTP calls which require map.rxGet() never complete. There are no exceptions or any other hints.

The offending change is in Ignite 2.11.0 - they no longer execute the cache operations (futures processing the cache return) on their internal stripe threadpool but by default its now on the JDK's ForkJoinPool. as stated here: https://ignite.apache.org/docs/latest/key-value-api/basic-cache-operations#asynchronous-execution

If running VertX in a environment with 1/2 CPU's this causes the Futures to never complete.

I could not yet fully trace down why it happens but switching Ignite back to the previous behaviour fixed it for us.

Do you have a reproducer?

https://github.com/DemonicTutor/vertx-ignite-deadlock

Steps to reproduce

  1. Limit CPUs (eg docker --cpus 1)
  2. First clustered VertX instance putting a value
  3. Second clustered VertX instance getting the value

Extra

DemonicTutor commented 2 years ago

its a JDK bug that the ForkJoin pool never creates a worker: https://github.com/adoptium/jdk17u/commit/328358a4e57dc0e3fec4d518d444bd91269f198e

but still i think its a good idea to keep the cache asyncs on the ignite threadpool and avoid submitting to ForkJoinPool

zyclonite commented 2 years ago

nice find, thank you

seems this change was not mentioned in the release notes for 2.11.0 -> https://github.com/apache/ignite/commit/38d279afd18f21e6d26a6f3730999600372e039f

in your #122, you will fall back on the ignite striped pool which might run into they same issue the ignite guys tried to fix... do you think it would make more sense to use the vertx threadpool? e.g. cfg.setAsyncContinuationExecutor(vertx.getContext().workerPool().executor());

DemonicTutor commented 2 years ago

Yes i checked the release notes and didnt find it there. a coworker actually pointed out this page in the ignite documentation to me.

Calling synchronous cache and compute operations from inside the callback may lead to a deadlock due to pools starvation.

From a point of view from VertX (vertx-ignite specifically) it is only about providing a implementation for the Clustered interface providing specific capabilities. None of these provide a way to go from the ignite cache into a clustered computation.

Therefore i would argue avoiding an additional threadpool and instead stick with ignite is appropriate here.

Also: Ignite uses Unsafe.park in their Future implementation to wait for the result if it has to be moved from a other node. I think this is not a good idea to do this on any of Vertx's threads.

alternatively it would be an option to add this to the IgniteOptions.

zyclonite commented 2 years ago

right, in this case switching back to the previous behavior makes sense