bulanan closed this issue 3 years ago
Hi @bulanan, thank you for the report. Could you provide the Java application you used to test the system?
Hi @laa, the application code is very large and complicated, so I can't provide it here. Basically, it's a web application.
I can try to create a simpler test to reproduce it.
Hi @laa
The test below reproduces the problem. A short description of the test: it runs for around 15 minutes. Every second it creates a new element in the MyClass class. At the same time there is a live query on this class, and an observable (we use reactive Java, RxJava 2) that emits each time the live query emits a value. I wrapped the record-creation part in a try-catch. While the test was running, I ran "docker-compose stop odb1" followed by "docker-compose start odb1"; I chose the node with the most connections. Please see the output after restarting the node further below.
import com.orientechnologies.common.exception.OException;
import com.orientechnologies.common.serialization.types.OBinaryTypeSerializer;
import com.orientechnologies.orient.core.command.OCommandExecutor;
import com.orientechnologies.orient.core.command.OCommandRequestText;
import com.orientechnologies.orient.core.db.*;
import com.orientechnologies.orient.core.db.document.ODatabaseDocument;
import com.orientechnologies.orient.core.index.ORuntimeKeyIndexDefinition;
import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.metadata.schema.OType;
import com.orientechnologies.orient.core.metadata.sequence.OSequence;
import com.orientechnologies.orient.core.record.OElement;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.executor.OResult;
import io.reactivex.Completable;
import io.reactivex.Observable;
import io.reactivex.functions.Consumer;
import io.reactivex.observers.BaseTestConsumer;
import io.reactivex.observers.TestObserver;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
// The class declaration and fields were missing from the original snippet.
// The class name and the URL below are placeholders (assumptions).
public class LiveQueryTest {
    private final String dbUrlRemote = "remote:localhost"; // placeholder: address of the cluster
    private final String dbName = "testDb"; // database name as it appears in the log output
    private OrientDB dbClient;

    @Before
    public void setUp() {
        if (dbClient != null) {
            dbClient.close();
        }
        dbClient = new OrientDB(dbUrlRemote, "root", "root", OrientDBConfig.defaultConfig());
        if (!dbClient.exists(dbName)) {
            dbClient.createIfNotExists(dbName, ODatabaseType.PLOCAL);
        }
    }

    @After
    public void tearDown() {
        if (dbClient != null) {
            System.out.println("dropping db: " + dbName);
            dbClient.drop(dbName);
            dbClient.close();
            dbClient = null;
        }
    }

    @Test
    public void testLiveQuery() throws Exception {
        try (ODatabaseSession session = dbClient.open(dbName, "admin", "admin")) {
            OClass oClass = session.createClass("MyClass");
            oClass.createProperty("num", OType.LONG);

            // Bridge the live query callbacks into an RxJava observable.
            Observable<OElement> elements = Observable.create(emitter -> {
                OLiveQueryMonitor monitor = session.live("select from MyClass", new OLiveQueryResultListener() {
                    @Override
                    public void onCreate(ODatabaseDocument database, OResult data) {
                        emitter.onNext(data.toElement());
                    }

                    @Override
                    public void onUpdate(ODatabaseDocument database, OResult before, OResult after) {
                        emitter.onNext(after.toElement());
                    }

                    @Override
                    public void onDelete(ODatabaseDocument database, OResult data) {
                    }

                    @Override
                    public void onError(ODatabaseDocument database, OException exception) {
                        emitter.onError(exception);
                    }

                    @Override
                    public void onEnd(ODatabaseDocument database) {
                        emitter.onComplete();
                    }
                });
                // Unsubscribe from the live query when the observer is disposed.
                emitter.setCancellable(monitor::unSubscribe);
            });

            TestObserver<OElement> testObserver = elements
                    .doOnNext((x) -> System.out.println("live query emitted value: " + x))
                    .doOnError((x) -> System.out.println("got error in live query: " + x))
                    .test();

            session.begin();
            // Create one record per second; failures are printed and the loop continues.
            for (int i = 1; i < 1002; i++) {
                try {
                    OElement element = session.newElement("MyClass");
                    element.setProperty("num", i);
                    element.save();
                    session.commit();
                    Thread.sleep(1000);
                    System.out.println("saved element");
                } catch (Exception e) {
                    System.out.println(e.getMessage());
                    System.out.println(e.getCause());
                }
            }

            // Wait up to 15 minutes for 1000 live query notifications.
            testObserver.awaitCount(1000, BaseTestConsumer.TestWaitStrategy.SLEEP_10MS, 1000 * 60 * 15)
                    .assertValueCount(1000)
                    .assertValueAt(0, e -> e.getProperty("num").equals(1L));
            Completable.fromAction(testObserver::dispose)
                    .blockingAwait();
        }
    }
}
Output before restarting the node:
saved element
live query emitted value: MyClass#28:15{num:46} v1
saved element
live query emitted value: MyClass#30:15{num:47} v1
saved element
live query emitted value: MyClass#31:15{num:48} v1
saved element
live query emitted value: MyClass#28:16{num:49} v1
saved element
live query emitted value: MyClass#30:16{num:50} v1
saved element
live query emitted value: MyClass#31:16{num:51} v1
saved element
live query emitted value: MyClass#28:17{num:52} v1
saved element
live query emitted value: MyClass#30:17{num:53} v1
Output after restarting the node. We can see that there was a problem saving a record and that the live query stops emitting values:
Jun 28, 2020 11:19:55 AM com.orientechnologies.common.log.OLogManager log
INFO: Caught Network I/O errors on 10.55.136.177:2424/testDb, trying an automatic reconnection... (error: null)
Error during saving of record with rid #-1:-1
DB name="testDb"
com.orientechnologies.common.io.OIOException
got error in live query: com.orientechnologies.orient.core.exception.ODatabaseException: Live query disconnection
Error during saving of record with rid #-1:-1
DB name="testDb"
com.orientechnologies.common.io.OIOException: Error on connecting to 172.25.0.2:2424/testDb
saved element
saved element
saved element
saved element
saved element
saved element
saved element
saved element
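The output above shows the live query failing with "Live query disconnection" and never resuming, even though later saves succeed. One client-side mitigation that could be tried is to subscribe again after a disconnect, with a growing delay between attempts. The following is only a sketch in plain Java: `Resubscriber`, `subscribeWithRetry`, and `backoffMillis` are hypothetical names, not OrientDB or RxJava API, and whether re-subscribing is enough on 3.0.30 is not confirmed in this thread.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (an assumption for illustration, not an OrientDB class). */
final class Resubscriber {

    /** Delay before retry number `attempt` (1-based): doubles each time, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 1; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /**
     * Calls `subscribe` until it returns normally or maxAttempts is exhausted.
     * Sleeps with capped exponential backoff between failed attempts and
     * rethrows the last failure if every attempt fails.
     */
    static <T> T subscribeWithRetry(Callable<T> subscribe, int maxAttempts,
                                    long baseMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return subscribe.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

In the test above, the `subscribe` callable would wrap the `session.live(...)` call (possibly on a freshly opened session) and be invoked from `onError`. With RxJava 2, `retryWhen` on the `elements` observable might express the same idea.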
Hi @laa, any update on this?
OrientDB Version: 3.0.30
Java Version: 1.8
OS: official Docker containers running on Linux CentOS
Expected behavior
We run a setup of 3 nodes (master-master, newNodeStrategy dynamic) and our application uses this database. When I stop one of the containers (the one with the most connections) and bring it back up, I expect the application to continue working as normal and the live query connection to remain.
Actual behavior
The application gets a lot of "live query disconnection" exceptions and cannot recover from them. I also get the impression that the data is inconsistent.
I'd like to know whether there is something wrong with the configuration. Is it a bug? If not, how would you define "high availability"?
Steps to reproduce
1. Set up 3 nodes (configuration + docker-compose attached below).
2. Run an app (using the Java API) that connects to the cluster, loads data, and performs live queries on it.
3. Stop one node.
4. Start the node again.
[default-distributed-db-config.txt.txt](https://github.com/orientechnologies/orientdb/files/4825707/default-distributed-db-config.txt.txt)
docker-compose.zip
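Steps 3 and 4 can also be driven from the test itself, so that the restart happens at a repeatable point in the run. This is a sketch, assuming `docker-compose` is on the PATH, the process runs in the directory containing the compose file, and the service is named `odb1` as in the commands quoted earlier. `NodeRestart` is a hypothetical helper name, and the 120-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that stops and starts one docker-compose service. */
final class NodeRestart {

    /** Builds e.g. ["docker-compose", "stop", "odb1"]. */
    static List<String> command(String action, String service) {
        return Arrays.asList("docker-compose", action, service);
    }

    /** Runs the command, waits for it to finish, and fails on timeout or non-zero exit. */
    static void run(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("timed out: " + String.join(" ", cmd));
        }
        if (process.exitValue() != 0) {
            throw new IOException("exit code " + process.exitValue()
                    + " from: " + String.join(" ", cmd));
        }
    }

    /** Stops the service, pauses, then starts it again. */
    static void restart(String service, long pauseMillis)
            throws IOException, InterruptedException {
        run(command("stop", service), 120);   // 120 s timeout is an assumption
        Thread.sleep(pauseMillis);
        run(command("start", service), 120);
    }
}
```

Calling `NodeRestart.restart("odb1", 5000)` from a separate thread partway through the write loop would mirror the manual stop/start used in the report.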