orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

Errors updating vertex having EMBEDDEDSET or EMBEDDEDLIST in distributed configuration #7354

Closed lucafrosini closed 3 years ago

lucafrosini commented 7 years ago

OrientDB Version: 2.2.17 - 2.2.18 - 2.2.19 - 2.2.21

Java Version: 1.8.0_121-b13 on both client and server

OS: Ubuntu 17.04 on the client and Ubuntu 14.04.5 LTS on the server

Expected behavior

I'm running a distributed cluster composed of 3 nodes with the following configuration:

$ cat orientdb-community/config/default-distributed-db-config.json 
{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": "all",
  "executionMode": "undefined",
  "readYourWrites": true,
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
$ cat orientdb-community/config/orientdb-server-config.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<orient-server>
    <handlers>
        <handler class="com.orientechnologies.orient.graph.handler.OGraphServerHandler">
            <parameters>
                <parameter value="true" name="enabled"/>
                <parameter value="50" name="graph.pool.max"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
            <parameters>
                <parameter value="true" name="enabled"/>
                <parameter value="orientdb03-d-d4s" name="nodeName"/>
                <parameter value="${ORIENTDB_HOME}/config/default-distributed-db-config.json" name="configuration.db.default"/>
                <parameter value="${ORIENTDB_HOME}/config/hazelcast.xml" name="configuration.hazelcast"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.handler.OJMXPlugin">
            <parameters>
                <parameter value="false" name="enabled"/>
                <parameter value="true" name="profilerManaged"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.handler.OAutomaticBackup">
            <parameters>
                <parameter value="true" name="enabled"/>
                <parameter value="${ORIENTDB_HOME}/config/automatic-backup.json" name="config"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.handler.OServerSideScriptInterpreter">
            <parameters>
                <parameter value="true" name="enabled"/>
                <parameter value="SQL" name="allowedLanguages"/>
            </parameters>
        </handler>
    </handlers>
    <hooks>
<!--  Hooks mainly set lastUpdateTime on Vertex and Edge -->
        <hook class="org.gcube.informationsystem.orientdb.hooks.HeaderHook" position="REGULAR"/>
        <hook class="org.gcube.informationsystem.orientdb.hooks.ConsistsOfHook" position="REGULAR"/>
        <hook class="org.gcube.informationsystem.orientdb.hooks.IsRelatedToHook" position="REGULAR"/>
    </hooks>
    <network>
        <sockets>
            <socket implementation="com.orientechnologies.orient.server.network.OServerTLSSocketFactory" name="ssl">
                <parameters>
                    <parameter value="false" name="network.ssl.clientAuth"/>
                    <parameter value="/etc/pki/jdk/orientdb.jks" name="network.ssl.keyStore"/>
                    <parameter value="changeit" name="network.ssl.keyStorePassword"/>
                    <parameter value="/etc/pki/jdk/orientdb.jks" name="network.ssl.trustStore"/>
                    <parameter value="changeit" name="network.ssl.trustStorePassword"/>
                </parameters>
            </socket>
            <socket implementation="com.orientechnologies.orient.server.network.OServerTLSSocketFactory" name="https">
                <parameters>
                    <parameter value="false" name="network.ssl.clientAuth"/>
                    <parameter value="/etc/pki/jdk/orientdb.jks" name="network.ssl.keyStore"/>
                    <parameter value="changeit" name="network.ssl.keyStorePassword"/>
                    <parameter value="/etc/pki/jdk/orientdb.jks" name="network.ssl.trustStore"/>
                    <parameter value="changeit" name="network.ssl.trustStorePassword"/>
                </parameters>
            </socket>
        </sockets>
        <protocols>
            <protocol implementation="com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary" name="binary"/>
            <protocol implementation="com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpDb" name="http"/>
        </protocols>
        <listeners>
            <listener protocol="binary" socket="default" port-range="2424-2430" ip-address="0.0.0.0"/>
            <listener protocol="binary" socket="ssl" port-range="2434-2440" ip-address="0.0.0.0"/>
            <listener protocol="http" socket="default" port-range="2480-2490" ip-address="0.0.0.0">
                <commands>
                    <command implementation="com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetStaticContent" pattern="GET|www GET|studio/ GET| GET|*.htm GET|*.html GET|*.xml GET|*.jpeg GET|*.jpg GET|*.png GET|*.gif GET|*.js GET|*.css GET|*.swf GET|*.ico GET|*.txt GET|*.otf GET|*.pjs GET|*.svg GET|*.json GET|*.woff GET|*.woff2 GET|*.ttf GET|*.svgz" stateful="false">
                        <parameters>
                            <entry value="Cache-Control: no-cache, no-store, max-age=0, must-revalidate\r\nPragma: no-cache" name="http.cache:*.htm *.html"/>
                            <entry value="Cache-Control: max-age=120" name="http.cache:default"/>
                        </parameters>
                    </command>
                    <command implementation="com.orientechnologies.orient.graph.server.command.OServerCommandGetGephi" pattern="GET|gephi/*" stateful="false"/>
                </commands>
                <parameters>
                    <parameter value="utf-8" name="network.http.charset"/>
                    <parameter value="true" name="network.http.jsonResponseError"/>
                </parameters>
            </listener>
        </listeners>
    </network>
    <storages/>
    <users>
        <user resources="*" password="HIDDEN" name="root"/>
    </users>
    <properties>
        <entry value="1" name="db.pool.min"/>
        <entry value="50" name="db.pool.max"/>
        <entry value="true" name="profiler.enabled"/>
        <entry value="0" name="distributed.autoRemoveOfflineServers"/>
        <entry value="/home/orientdb/databases" name="server.database.path"/>
    </properties>
    <isAfterFirstTime>true</isAfterFirstTime>
</orient-server>
$ cat orientdb-community/config/hazelcast.xml
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast
    xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.3.xsd"
    xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
        <name>HIDDEN_NAME</name>
        <password>HIDDEN</password>
    </group>
    <network>
        <port auto-increment="true">2434</port>
        <join>
            <multicast enabled="true">
                <multicast-group>XXX.XXX.XXX.XXX</multicast-group>
                <multicast-port>XXXX</multicast-port>
            </multicast>
        </join>
        <symmetric-encryption enabled="true">
            <algorithm>Blowfish</algorithm>
            <salt>HIDDEN</salt>
            <password>HIDDEN</password>
            <iteration-count>19</iteration-count>
        </symmetric-encryption>
    </network>
    <executor-service>
        <pool-size>16</pool-size>
    </executor-service>
</hazelcast>
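With `"writeQuorum": "all"` every node has to return the same result before a write is acknowledged, so the required quorum grows with the number of nodes. The sketch below is a hypothetical helper (not OrientDB code) showing how the accepted values, "all", "majority" or an explicit number, translate into a node count; with three masters "all" gives 3, which is consistent with the `Quorum 3 not reached` message reported below, and with two nodes it gives 2.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not OrientDB code): turns a writeQuorum setting such as
 * "all", "majority" or a plain number into the count of nodes that must agree.
 */
final class WriteQuorumSketch {

    static int resolve(String setting, int availableNodes) {
        String s = setting.trim().toLowerCase(Locale.ROOT);
        switch (s) {
            case "all":
                return availableNodes;              // every node must answer identically
            case "majority":
                return availableNodes / 2 + 1;      // strictly more than half of the nodes
            default:
                return Integer.parseInt(s);         // explicit node count
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("all", 3));      // 3
        System.out.println(resolve("all", 2));      // 2
        System.out.println(resolve("majority", 3)); // 2
    }
}
```

A numeric quorum (or "majority") tolerates one diverging node, while "all" turns any single mismatch into a failed write.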

Please note that I use a partitioned graph, so V extends ORestricted.

I get unexpected behaviour when updating elements that contain an EMBEDDEDLIST or EMBEDDEDSET property, both when the property is defined in the schema (with a class registered) and when it is an instance property not defined in the vertex schema.

The code I use to set/update the property is:

import java.util.List;
import java.util.Set;

import com.orientechnologies.orient.core.metadata.schema.OType;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientElement;

Vertex element = null;
// some code to retrieve the vertex

String key = "<property key>"; // placeholder for the actual property name

List<ODocument> array = getArray(); // getArray() creates an ArrayList<ODocument>
((OrientElement) element).setProperty(key, array, OType.EMBEDDEDLIST);
// or
Set<ODocument> set = getSet(); // getSet() creates a HashSet<ODocument>
((OrientElement) element).setProperty(key, set, OType.EMBEDDEDSET);

// I also experienced the same behaviour with a property defined in the vertex schema.
// In that case the ArrayList contains ODocument instances created with the proper class,
// e.g. new ODocument("MyClass") for an EMBEDDEDLIST whose linked class is 'MyClass'.
((OrientElement) element).setProperty(key, array);

The code works fine on a single node.
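When an explicit OType is passed to setProperty, it is safest to keep it consistent with the Java collection being stored: a List with EMBEDDEDLIST, a Set with EMBEDDEDSET. The plain-Java check below is a hypothetical helper (the name embeddedTypeFor is invented, and it is not OrientDB's own type detection) that could be called before setProperty to rule out a List/Set versus OType mismatch as a contributing factor.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: picks the embedded collection type name that matches the
 * Java collection about to be stored. It only illustrates the List -> EMBEDDEDLIST,
 * Set -> EMBEDDEDSET correspondence; it is not OrientDB's type-detection code.
 */
final class EmbeddedTypeSketch {

    static String embeddedTypeFor(Collection<?> value) {
        if (value instanceof List) {
            return "EMBEDDEDLIST";   // ordered, duplicates allowed
        }
        if (value instanceof Set) {
            return "EMBEDDEDSET";    // unordered, no duplicates
        }
        throw new IllegalArgumentException("Unsupported collection: " + value.getClass());
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();
        System.out.println(embeddedTypeFor(list)); // EMBEDDEDLIST
        System.out.println(embeddedTypeFor(set));  // EMBEDDEDSET
    }
}
```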

Actual behaviour

The exception I get is always similar to this:

com.orientechnologies.orient.server.distributed.task.ODistributedOperationException: Quorum 3 not reached for request (id=0.140135 task=tx[5]{record_update(#121:12957 v.1),record_update(#163:15658 v.1),record_update(#95:18352 v.1),record_update(#160:24474 v.1),record_update(#90:24682 v.1)} user=#5:7). Elapsed=7ms. Servers in timeout/conflict are:\n - orientdb01-d-d4s: TX[5]{2,2,2,2,2}\nReceived: \n - orientdb01-d-d4s: TX[5]{2,2,2,2,2}\n - orientdb02-d-d4s: TX[5]{2,2,2,2,1}\n - orientdb03-d-d4s: TX[5]{2,2,2,2,1}\r\n\tDB name=\"gcube\"\r\n\tDB name=\"gcube\""

Please note that the #90:2468 instance in cluster #90 contains an embedded set.
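The exception lists the per-node results of the five record_update tasks: orientdb01-d-d4s returned {2,2,2,2,2}, while the other two nodes returned {2,2,2,2,1}. The results differ only in the fifth entry, which presumably corresponds to the fifth task, record_update(#90:24682 v.1). The sketch below is a simplified, hypothetical model of a quorum check (group identical responses, then compare the largest group with the quorum), not OrientDB's actual response-handling code; it shows why quorum 3 fails while a quorum of 2 would have been met.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not OrientDB's implementation): groups the per-node
 * transaction results and checks whether enough nodes returned an identical
 * result to satisfy the write quorum.
 */
final class QuorumCheckSketch {

    /** Size of the largest set of nodes that returned exactly the same per-record results. */
    static int largestAgreeingGroup(Map<String, List<Integer>> responsesByNode) {
        Map<List<Integer>, Integer> groups = new LinkedHashMap<>();
        for (List<Integer> response : responsesByNode.values()) {
            groups.merge(response, 1, Integer::sum); // lists are equal when all elements match
        }
        int largest = 0;
        for (int count : groups.values()) {
            largest = Math.max(largest, count);
        }
        return largest;
    }

    static boolean quorumReached(Map<String, List<Integer>> responsesByNode, int quorum) {
        return largestAgreeingGroup(responsesByNode) >= quorum;
    }

    public static void main(String[] args) {
        // The TX[5]{...} results from the exception, one entry per record_update task.
        Map<String, List<Integer>> responses = new LinkedHashMap<>();
        responses.put("orientdb01-d-d4s", Arrays.asList(2, 2, 2, 2, 2));
        responses.put("orientdb02-d-d4s", Arrays.asList(2, 2, 2, 2, 1));
        responses.put("orientdb03-d-d4s", Arrays.asList(2, 2, 2, 2, 1));

        System.out.println(largestAgreeingGroup(responses)); // 2
        System.out.println(quorumReached(responses, 3));     // false ("all" on 3 nodes)
        System.out.println(quorumReached(responses, 2));     // true (a majority of 3)
    }
}
```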

Please note that I have just one client writing to the DB.

Please also note that I get the same exception even when I don't modify the property but only grant rights to another role.
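Because V extends ORestricted, every vertex carries the `_allow`, `_allowRead`, `_allowUpdate` and `_allowDelete` fields, which hold sets of users/roles. Granting rights to another role therefore edits one of those sets, and saving it is a record update that presumably goes through the same replicated update path as a property change. The class below is a hypothetical in-memory model (only `_allow` and `_allowRead` are represented, as role names rather than links to OUser/ORole records), meant to show that a grant changes the record and bumps its version.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of an ORestricted record (not OrientDB code): only the
 * _allow/_allowRead idea and the record version are represented.
 */
final class RestrictedRecordSketch {

    private final Set<String> allow = new HashSet<>();     // stands in for _allow (full access)
    private final Set<String> allowRead = new HashSet<>(); // stands in for _allowRead (read only)
    private int version = 1;                               // record version, bumped on every change

    RestrictedRecordSketch(String owner) {
        allow.add(owner); // sketch assumption: the creator gets full access
    }

    /** Adds a role to the read set; an actual change counts as a record update. */
    boolean grantRead(String role) {
        boolean changed = allowRead.add(role);
        if (changed) {
            version++;
        }
        return changed;
    }

    /** A caller can read if any of its identities appears in _allow or _allowRead. */
    boolean canRead(Set<String> identities) {
        for (String identity : identities) {
            if (allow.contains(identity) || allowRead.contains(identity)) {
                return true;
            }
        }
        return false;
    }

    int version() {
        return version;
    }

    public static void main(String[] args) {
        RestrictedRecordSketch vertex = new RestrictedRecordSketch("writer");
        Set<String> reader = Collections.singleton("reader");
        System.out.println(vertex.canRead(reader)); // false
        vertex.grantRead("reader");
        System.out.println(vertex.canRead(reader)); // true
        System.out.println(vertex.version());       // 2
    }
}
```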

lucafrosini commented 7 years ago

I have created a project which contains a class to reproduce the issue.

You can find the project at https://github.com/lucafrosini/orientdb-bug-reports

Please note that in the class ReproduceBug7354 you have to change

private static final String HOST = "remote:node01.acme.org;node02.acme.org;node03.acme.org";

and

private static final String ROOT_PASSWORD = "ROOT_PWD";

with the correct values for your test environment.
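Instead of editing the constants, the values could also be supplied at launch time. The helper below is a hypothetical sketch (it is not part of the reproduction project, and the property names `orientdb.host` and `orientdb.rootPassword` are invented): it reads the settings from JVM system properties, falling back to the placeholders above, and splits a multi-server `remote:` URL, where servers are separated by `;`, into its hosts so the list can be checked before connecting.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: reads test settings from JVM system properties instead
 * of hard-coded constants and splits a multi-server "remote:" URL into hosts.
 */
final class TestSettingsSketch {

    private static final String REMOTE_PREFIX = "remote:";

    /** Returns the system property {@code name}, or {@code fallback} when it is unset. */
    static String setting(String name, String fallback) {
        return System.getProperty(name, fallback);
    }

    /** Splits "remote:hostA;hostB[/db]" into ["hostA", "hostB"]. */
    static List<String> hosts(String url) {
        if (!url.startsWith(REMOTE_PREFIX)) {
            throw new IllegalArgumentException("Not a remote URL: " + url);
        }
        String servers = url.substring(REMOTE_PREFIX.length());
        int slash = servers.indexOf('/');
        if (slash >= 0) {
            servers = servers.substring(0, slash); // ignore an optional "/database" suffix
        }
        return Arrays.asList(servers.split(";"));
    }

    public static void main(String[] args) {
        // Hypothetical property names, e.g. java -Dorientdb.host="remote:a;b;c" ...
        String host = setting("orientdb.host",
                "remote:node01.acme.org;node02.acme.org;node03.acme.org");
        String rootPassword = setting("orientdb.rootPassword", "ROOT_PWD");
        // With no -D flags: [node01.acme.org, node02.acme.org, node03.acme.org]
        System.out.println(hosts(host));
        System.out.println(rootPassword.isEmpty() ? "no password set" : "password configured");
    }
}
```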

In the resources folder you can find the file Bug7354_StackTrace.txt containing the exception stack trace.

Please note that I changed the node names to obfuscate my server names:

 - node03: TX[1]{1}
Received: 
 - node01: TX[1]{2}
 - node02: TX[1]{2}
 - node03: TX[1]{1}

Hope this helps

lucafrosini commented 7 years ago

I upgraded to version 2.2.19 and it seems that this version is not affected by this behaviour, or at least the test I provided does not reveal it.

lvca commented 7 years ago

Cool, closing the issue then.

lucafrosini commented 7 years ago

Unfortunately I'm still hitting this issue in my real scenario. I'll run more tests and keep you informed.

lucafrosini commented 7 years ago

When I first ran the test I provided, the servers had just been started and there was no load on them. Now I have run the tests with the servers under load (working on another database) and the behaviour appears again.

So the test is still valid, but the servers need to be under stress to reproduce the issue.
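Since the failure only shows up under load, the reproduction could generate that load itself rather than depend on a separate workload. The harness below is a hypothetical sketch using only java.util.concurrent (the names runUnderLoad and UnderLoadSketch, the worker count and the placeholder load are illustrative): it starts background workers, waits until each has completed at least one load step, runs the test, then stops the workers. In practice the load step would be writes against another database, as described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical harness: runs a reproduction while background workers generate load. */
final class UnderLoadSketch {

    /** Runs {@code test} while {@code workers} threads repeatedly execute {@code loadStep}. */
    static long runUnderLoad(Runnable test, Runnable loadStep, int workers) throws InterruptedException {
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicLong loadSteps = new AtomicLong();
        CountDownLatch started = new CountDownLatch(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        loadStep.run();               // first step, so the load is already active
                        loadSteps.incrementAndGet();
                    } finally {
                        started.countDown();
                    }
                    while (running.get()) {
                        loadStep.run();               // e.g. writes against another database
                        loadSteps.incrementAndGet();
                    }
                });
            }
            started.await(30, TimeUnit.SECONDS);      // wait until every worker has done one step
            test.run();                               // the reproduction, executed under load
        } finally {
            running.set(false);                       // signal the workers to stop
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.SECONDS);
        }
        return loadSteps.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long steps = runUnderLoad(
                () -> System.out.println("reproduction ran"),
                () -> Math.sqrt(Math.random()),       // placeholder load; replace with real DB writes
                4);
        System.out.println("load steps completed: " + steps);
    }
}
```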

Please note that I spent 3 weeks isolating the problem and realized that I always get this behaviour with EMBEDDEDLIST. As a counter-proof, I ran my real scenario with the code that generates the EMBEDDEDLIST commented out, and even under stress I never got the issue.

lucafrosini commented 7 years ago

When I inspect in Studio the vertex I created with an EMBEDDEDLIST property, the property type is shown as EMBEDDED. If I then change the property type to EMBEDDEDLIST and reload the page, the type is shown as EMBEDDEDSET.

I don't know if this is related, but it may help. [Screenshots: "screenshot from 2017-04-27 17-29-07", "screenshot from 2017-04-27 17-31-13"]

lucafrosini commented 7 years ago

Have you run any tests?

lucafrosini commented 7 years ago

I also ran some tests with only two nodes and got the same behaviour.

lucafrosini commented 7 years ago

To avoid confusion, can you please remove the "Cannot Replicate" tag? I provided the tests and the conditions to reproduce it.

lvca commented 7 years ago

@lucafrosini we fixed many HA issues in 2.2.21. Could you please retry with the latest version? If the issue is still there, the team will be assigned to it tomorrow. Thanks.

lucafrosini commented 7 years ago

Sorry @lvca, I missed your message. I'm going to test it. Thanks a lot.

lucafrosini commented 7 years ago

The simple tests I provided (upgraded to 2.2.21) at https://github.com/lucafrosini/orientdb-bug-reports now do not fail. Unfortunately I still get the error in my real environment (both client and server upgraded to 2.2.21).

The bug happens every time a transaction updates several vertices and edges, including the element that has an EMBEDDEDLIST or EMBEDDEDSET property.

I'm attaching the exception I get on my OrientDB client: client_exception.txt

and the WARNING I see in orientdb.log:

server_warning.txt

I hope this helps.

In the coming days I'll try to update the simple test at https://github.com/lucafrosini/orientdb-bug-reports