Closed ikwattro closed 1 year ago
:+1:
I can confirm the issue. The error returned, after the timeout, seems to be random. I created unit test, with several different ways how the query can be run, with implicit and explicit transaction. Very often the query doesn't fail with expected Neo.ClientError.Transaction.TransactionTimedOut
, but with different error e.g. Neo.ClientError.Request.Invalid
with message e.g. Message 'PULL Map{qid -> Long(1), n -> Long(700)}' cannot be handled by a session in the READY state.
@Slf4j
public class TimeoutInTxTest {
private static final String URL = "neo4j://........";
private static final String DATABASE = ".......";
private static final AuthToken AUTH = AuthTokens.basic(System.getenv("NEO4J_USER"), System.getenv("NEO4J_PASSWORD"));
private static final int TIMEOUT = 3_000;
private static final String TIMEOUT_ERROR_CODE = "Neo.ClientError.Transaction.TransactionTimedOut";
// Test fails almost every time with error code: Neo.ClientError.Request.Invalid
@RepeatedTest(value = 7)
public void timeout_for_explicit_tx_1() {
try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
TransactionConfig config = TransactionConfig.builder()
.withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
.build();
//session.run("create (p:Test) return p");
try (Transaction tx = session.beginTransaction(config)) {
tx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
tx.commit();
}
Assertions.fail("Should fail with timeout");
}
} catch (ClientException ex) {
log.error("{}, {}", ex.code(), ex.getMessage());
assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
}
}
// This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
@RepeatedTest(value = 7)
public void timeout_for_explicit_tx_2() {
try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
TransactionConfig config = TransactionConfig.builder()
.withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
.build();
session.run("create (p:Test) return p");
try (Transaction tx = session.beginTransaction(config)) {
tx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
tx.commit();
}
Assertions.fail("Should fail with timeout");
}
} catch (ClientException ex) {
log.error("{}, {}", ex.code(), ex.getMessage());
assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
}
}
// This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
@RepeatedTest(value = 7)
public void timeout_for_explicit_tx_3() {
try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
TransactionConfig config = TransactionConfig.builder()
.withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
.build();
session.run("create (p:Test) return p");
try (Transaction tx = session.beginTransaction(config)) {
tx.run("UNWIND range(0,1000000) AS i RETURN i").stream().forEach(r -> {});
tx.commit();
}
Assertions.fail("Should fail with timeout");
}
} catch (ClientException ex) {
log.error("{}, {}", ex.code(), ex.getMessage());
assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
}
}
// This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
@RepeatedTest(value = 7)
public void timeout_for_explicit_tx_4() {
try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
TransactionConfig config = TransactionConfig.builder()
.withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
.build();
try (Transaction tx = session.beginTransaction(config)) {
tx.run("create (p:Test) return p");
tx.run("UNWIND range(0,1000000) AS i RETURN i").stream().forEach(r -> {});
tx.commit();
}
Assertions.fail("Should fail with timeout");
}
} catch (ClientException ex) {
log.error("{}, {}", ex.code(), ex.getMessage());
assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
}
}
// This test always fail with Neo.ClientError.Request.Invalid
@RepeatedTest(value = 7)
public void timeout_for_implicit_tx_1() {
try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
TransactionConfig config = TransactionConfig.builder()
.withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
.build();
// session.run("create (p:Test) return p");
session.executeWriteWithoutResult(ctx -> {
ctx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
}, config);
Assertions.fail("Should fail with timeout");
}
} catch (ClientException ex) {
log.error("{}, {}", ex.code(), ex.getMessage());
assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
}
}
// This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
@RepeatedTest(value = 7)
public void timeout_for_implicit_tx_2() {
try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
TransactionConfig config = TransactionConfig.builder()
.withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
.build();
session.run("create (p:Test) return p");
session.executeWriteWithoutResult(ctx -> {
ctx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
}, config);
Assertions.fail("Should fail with timeout");
}
} catch (ClientException ex) {
log.error("{}, {}", ex.code(), ex.getMessage());
assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
}
}
}
Do you think it may be an issue with bolt protocol?
@ikwattro, thanks for reporting the issue. Unfortunately, I am struggling to reproduce the following error:
org.neo4j.driver.exceptions.ClientException: Message 'PULL Map{qid -> Long(0), n -> Long(1000)}' cannot be handled by a session in the READY state.
So far, I have setup a 3 node cluster using the neo4j:4.4.14-enterprise
Docker image with SSR enabled. I tried hitting both the writer and the readers. It feels like I might be missing some crucial detail to reproduce.
@grzegorz-aniol, thanks for the tests. 👍
For some reason, I am getting different results:
timeout_for_explicit_tx_1
- all passedtimeout_for_explicit_tx_2
- all passedtimeout_for_explicit_tx_3
- all passedtimeout_for_explicit_tx_4
- all passedtimeout_for_implicit_tx_1
- all passedtimeout_for_implicit_tx_2
- all passedIt is probably worth noting I had console logging enabled as I was hoping to catch some network exchanges.
It would be good to establish how the Message 'PULL Map{qid -> Long(0), n -> Long(1000)}' cannot be handled
error happens. If it is easy enough to reproduce, could you enable the driver debug logging (example: Config.builder().withLogging(Logging.console(Level.FINE)).build()
) and share the log please?
I've been running these days exactly same unit tests, few times, and I cannot reproduce this issue anymore. What is strange, the cluster version is the same. I also tried with different driver versions (as described by @ikwattro), with same result.
Lets close this one. Please open a new issue or re-open this one if it becomes reproducible again.
We have the following piece of driver usage :
One time on two, we receive different error messages
One message which makes sense :
The second is a bit more cryptic
The query is executed against a replica or a core ( result is same ) of a Neo4j 4.4.14 causal cluster ( 3 cores + 1 replica with SSR enabled ).
The following
neo4j-java-driver
versions produce the same result :Is it something expected or is our piece of logic wrong ?