Closed conker84 closed 3 years ago
There are two problems here: the system database (systemdb) can have its LEADER on a different instance from the one where the call to apoc.trigger.add actually happened. We have a couple of options here, but I need to discuss them with @fbiville, @JMHReif & @jexp; this will be managed via #2194.
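To make the leader mismatch concrete, here is a minimal Python sketch that picks out the member currently acting as LEADER for the system database. It assumes rows shaped like the 4.x output of dbms.cluster.overview() (an `addresses` list and a `databases` map of database name to role); the host names and ids in the sample data are hypothetical, not taken from the attached logs.

```python
from typing import Iterable, Mapping, Optional


def find_leader_bolt_address(
    rows: Iterable[Mapping], database: str = "system"
) -> Optional[str]:
    """Return the bolt:// address of the member that is LEADER for `database`.

    `rows` is assumed to look like the records returned by
    CALL dbms.cluster.overview() on a 4.x cluster.
    """
    for row in rows:
        if row.get("databases", {}).get(database) == "LEADER":
            for address in row.get("addresses", []):
                if address.startswith("bolt://"):
                    return address
    return None


# Hypothetical three-core cluster: the user db and the system db
# have their leaders on different instances.
overview = [
    {"id": "core-1",
     "addresses": ["bolt://core1:7687", "http://core1:7474"],
     "databases": {"neo4j": "LEADER", "system": "FOLLOWER"}},
    {"id": "core-2",
     "addresses": ["bolt://core2:7687", "http://core2:7474"],
     "databases": {"neo4j": "FOLLOWER", "system": "LEADER"}},
    {"id": "core-3",
     "addresses": ["bolt://core3:7687", "http://core3:7474"],
     "databases": {"neo4j": "FOLLOWER", "system": "FOLLOWER"}},
]

system_leader = find_leader_bolt_address(overview, "system")
user_db_leader = find_leader_bolt_address(overview, "neo4j")
print(system_leader, user_db_leader)
```

In this sample, a write routed to the leader of the user database lands on core1, while the system database leader is core2, which is the situation described above.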
Pasted from internal Trello card
Neo4j Enterprise 4.2.9 APOC 4.2.0.2
A customer reported, and I have since reproduced, what seems to be a failure to replicate triggers created via apoc.trigger.add(). I’m aware of a bug in APOC that throws a "server no longer accepts writes" error unless you connect directly via bolt:// to the system db leader and add the trigger there (related cards and GH issue attached). But now that trigger doesn’t appear anywhere else on the cluster, apart from the one core I created it on. Is Raft somehow being selective about which transactions it replicates to members? If I repeat the exercise with a simple create/merge of a node (again connected directly via bolt://, to a user db), the merged node appears on all cores straight away. I don’t think this is down to replication of system db updates in general, since other updates to system are replicated just fine.
So the question is: what could possibly be preventing that trigger from appearing on nodes other than the one it was created on directly (the system leader) using bolt://?
Interestingly, the trigger appears just fine on all cores of a 3.5.29 cluster, which adds to my suspicion that system db updates in 4.x are replicated differently from how graph.db updates were handled in 3.5.x.
Attaching logs and relevant files from both test clusters (4.2.9 and 3.5.29). Please let me know if any additional details are required.
Repro steps:
1- Connect via neo4j:// to a 4.x cluster (any 4.x version). Beforehand, make sure the appropriate APOC plugin is in the /plugins directory and that apoc is whitelisted as follows in neo4j.conf:
2- run
CALL apoc.trigger.add("trigger1", "MERGE (p:Person {name: 'invalid'})",{phase:'before'})
Unless one connects to the instance that happens to be the system db leader at the time, the following error is thrown:
Server at mydomainl:7687 no longer accepts writes
This is a known bug with apoc, reported on the attached cards/GH.
3- CALL dbms.cluster.overview() and identify the leader for the system db. Then connect via bolt:// directly (either via browser or cypher-shell) to that instance.
4- run
CALL apoc.trigger.add("trigger1", "MERGE (p:Person {name: 'invalid'})",{phase:'before'})
This succeeds, and one can view trigger1 as a listed trigger via CALL apoc.trigger.list().
5- Connect via bolt:// to any other core that is NOT the leader for the system db and execute
CALL apoc.trigger.list()
Result: trigger1 is NOT listed there, or on any other cluster member. It appears just fine on a 3.5.x cluster, but not on 4.x.
Thanks
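The comparison in the last repro step can be sketched in Python as follows. This is only an illustration, not part of the original report: `list_trigger_names` assumes the official `neo4j` Python driver is installed and that the credentials are valid, and the host names in the sample data are hypothetical. The comparison itself is a pure function, so it can be tried without a cluster.

```python
from typing import Dict, List, Set


def cores_missing_trigger(
    triggers_by_core: Dict[str, Set[str]], trigger_name: str
) -> List[str]:
    """Return the cores (sorted) on which `trigger_name` is not listed."""
    return sorted(
        core for core, names in triggers_by_core.items()
        if trigger_name not in names
    )


def list_trigger_names(bolt_uri: str, user: str, password: str) -> Set[str]:
    """Connect directly to one core and collect its trigger names.

    Assumption: the `neo4j` Python driver is available; it is imported
    lazily so the comparison above works without it.
    """
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(bolt_uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(
                "CALL apoc.trigger.list() YIELD name RETURN name"
            )
            return {record["name"] for record in result}
    finally:
        driver.close()


# Hypothetical observation matching the report: only the system db
# leader (core2 here) lists the trigger.
observed = {
    "bolt://core1:7687": set(),
    "bolt://core2:7687": {"trigger1"},
    "bolt://core3:7687": set(),
}

missing = cores_missing_trigger(observed, "trigger1")
print(missing)
```

On a cluster where the trigger replicates correctly, `cores_missing_trigger` would return an empty list; in the behaviour reported for 4.x it returns every core except the system db leader.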