OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
No proper High availability on a distributed setup #9311

Closed. bulanan closed 3 years ago

bulanan commented 4 years ago

OrientDB Version: 3.0.30

Java Version: 1.8

OS: official Docker containers running on CentOS Linux

Expected behavior

I run a 3-node master-master setup with newNodeStrategy set to dynamic, and our application uses this database. When I stop one of the containers (the one with the most connections) and bring it back up, I expect the application to continue working as normal and the live query connections to remain.

Actual behavior

The application gets a lot of "live query disconnection" exceptions and cannot recover from them. I also get the impression that the data is inconsistent.

I'd like to know whether something is wrong with the configuration or whether this is a bug. If it is neither, how would you define "high availability"?

Steps to reproduce

  1. Set up 3 nodes (configuration + docker-compose attached below).
  2. Run an app (using the Java API) that connects to the cluster, loads data and performs live queries on it.
  3. Stop one node.
  4. Start the node again.




andrii0lomakin commented 4 years ago

Hi @bulanan, thank you for the report. Could you provide the Java application you used to test the system?

bulanan commented 4 years ago

Hi @laa, the application code is very large and complicated, so I can't provide it here. Basically, it is a web application that:

  1. Inserts/updates/deletes records every second.
  2. Opens live query sessions with queries over the objects in the system (there can be several live queries at a time, around 20) and observes them for as long as a user is visiting a screen.
  3. If a user leaves the screen, the live query of that screen unsubscribes.

I can try to create a simpler test to reproduce it.
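
The per-screen live query lifecycle described above (subscribe while a user is on a screen, unsubscribe when they leave) could look roughly like the following. This is a minimal sketch in plain Java, not the application's code: `LiveSubscription` and `ScreenSubscriptions` are hypothetical names, and `LiveSubscription` stands in for whatever wraps the real live query handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a live query handle; not an OrientDB type.
interface LiveSubscription {
    void unsubscribe();
}

// Hypothetical registry: groups live subscriptions by the screen that opened them.
class ScreenSubscriptions {
    private final Map<String, List<LiveSubscription>> byScreen = new HashMap<>();

    // Record a subscription opened on behalf of the given screen.
    synchronized void register(String screenId, LiveSubscription sub) {
        byScreen.computeIfAbsent(screenId, k -> new ArrayList<>()).add(sub);
    }

    // Unsubscribe everything the screen opened; returns how many were released.
    synchronized int leave(String screenId) {
        List<LiveSubscription> subs = byScreen.remove(screenId);
        if (subs == null) {
            return 0;
        }
        for (LiveSubscription sub : subs) {
            sub.unsubscribe();
        }
        return subs.size();
    }

    // Number of subscriptions still open across all screens.
    synchronized int activeCount() {
        return byScreen.values().stream().mapToInt(List::size).sum();
    }
}
```

With around 20 live queries open at a time, a registry like this keeps the unsubscribe step in one place, which also makes it easier to see how many subscriptions are open when a node goes down.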

bulanan commented 4 years ago

Hi @laa

The test below reproduces the problem. A short description of the test: it runs for around 15 minutes. Every second it creates a new element in the MyClass table. At the same time there is a live query on this table, and an observable (we use reactive Java, RxJava 2) that emits each time the live query emits a value. I wrapped the record creation in a try-catch. While the test is running, I run "docker-compose stop odb1" followed by "docker-compose start odb1"; I chose the node with the most connections. Please see the output after restarting the node further down.

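    import com.orientechnologies.common.exception.OException;
    import com.orientechnologies.orient.core.db.*;
    import com.orientechnologies.orient.core.db.document.ODatabaseDocument;
    import com.orientechnologies.orient.core.metadata.schema.OClass;
    import com.orientechnologies.orient.core.metadata.schema.OType;
    import com.orientechnologies.orient.core.record.OElement;
    import com.orientechnologies.orient.core.sql.executor.OResult;
    import io.reactivex.Observable;
    import io.reactivex.observers.BaseTestConsumer;
    import io.reactivex.observers.TestObserver;

    // Test class fragment: dbClient, dbUrlRemote and dbName are fields declared elsewhere.
    // Lines marked "assumed" are not in the posted snippet; they follow the description and the output.
    public void setUp() {
        if (dbClient != null) {
            dbClient.close();  // assumed: release a previous client
        }
        dbClient = new OrientDB(dbUrlRemote, "root", "root", OrientDBConfig.defaultConfig());
        if (!dbClient.exists(dbName)) {
            dbClient.createIfNotExists(dbName, ODatabaseType.PLOCAL);
        }
    }

    public void tearDown() {
        if (dbClient != null) {
            System.out.println("dropping db: " + dbName);
            dbClient.drop(dbName);  // assumed
            dbClient.close();       // assumed
            dbClient = null;
        }
    }

    public void testLiveQuery() throws Exception {
        try (ODatabaseSession session = dbClient.open(dbName, "admin", "admin")) {
            OClass oClass = session.createClass("MyClass");
            oClass.createProperty("num", OType.LONG);
            // Bridge the live query to an RxJava observable.
            Observable<OElement> elements = Observable.create(emitter -> {
                OLiveQueryMonitor monitor = session.live("select from MyClass", new OLiveQueryResultListener() {
                    public void onCreate(ODatabaseDocument database, OResult data) {
                        emitter.onNext(data.toElement());  // assumed
                    }
                    public void onUpdate(ODatabaseDocument database, OResult before, OResult after) {
                    }
                    public void onDelete(ODatabaseDocument database, OResult data) {
                    }
                    public void onError(ODatabaseDocument database, OException exception) {
                        emitter.onError(exception);  // assumed
                    }
                    public void onEnd(ODatabaseDocument database) {
                        emitter.onComplete();  // assumed
                    }
                });
                emitter.setCancellable(monitor::unSubscribe);  // assumed
            });
            TestObserver<OElement> testObserver = elements
                    .doOnNext(x -> System.out.println("live query emitted value: " + x))
                    .doOnError(x -> System.out.println("got error in live query: " + x))
                    .test();  // assumed
            for (int i = 1; i < 1002; i++) {
                try {
                    OElement element = session.newElement("MyClass");
                    element.setProperty("num", i);
                    element.save();  // assumed
                    System.out.println("saved element");
                } catch (Exception e) {
                    System.out.println(e.getMessage());  // assumed: matches the output shown
                }
                Thread.sleep(1000);  // assumed: one record per second, per the description
            }
            testObserver.awaitCount(1000, BaseTestConsumer.TestWaitStrategy.SLEEP_10MS, 1000 * 60 * 15)
                    .assertValueAt(0, e -> e.getProperty("num").equals(1L));
        }
    }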


output before restarting the node:

Output after restarting the node. We can see that saving a record failed and that the live query stops emitting values:

    Jun 28, 2020 11:19:55 AM com.orientechnologies.common.log.OLogManager log
    INFO: Caught Network I/O errors on, trying an automatic reconnection... (error: null)
    Error during saving of record with rid #-1:-1
        DB name="testDb"
    got error in live query: com.orientechnologies.orient.core.exception.ODatabaseException: Live query disconnection
    Error during saving of record with rid #-1:-1
        DB name="testDb"
    com.orientechnologies.common.io.OIOException: Error on connecting to
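
The output above shows the live query ending with a "Live query disconnection" error and never resuming. One way a client could react is to re-subscribe with a growing delay once that error arrives. The following is a minimal sketch in plain Java under that assumption: `Resubscriber` and `subscribeWithRetry` are hypothetical names, the `Supplier` stands in for the call that opens the live query, and nothing in this thread confirms that re-subscribing is enough to recover in this setup.

```java
import java.util.function.Supplier;

// Hypothetical helper; not part of OrientDB or RxJava.
class Resubscriber {
    // Calls subscribe up to maxAttempts times, doubling the wait after each failure.
    // Returns the first successful result, or rethrows the last failure.
    static <T> T subscribeWithRetry(Supplier<T> subscribe, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return subscribe.get();
            } catch (RuntimeException e) {
                last = e;  // remember the failure and fall through to the wait
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to re-subscribe", ie);
                }
                delayMs *= 2;
            }
        }
        throw last;
    }
}
```

In the test above, the supplier would be the code that calls `session.live(...)` again, invoked from the point where the disconnection error is observed. Records created while the subscription was down would still be missed by the listener, so this does not address the data consistency concern.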

bulanan commented 4 years ago

Hi @laa, any update on this?