orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

OrientDB loses data when we restart the cluster after some nodes crashed #6999

Closed pavlop-asg closed 7 years ago

pavlop-asg commented 7 years ago

OrientDB Version, operating system, or hardware.

Operating System

Expected behavior and actual behavior

Here is an example. We have nodes A and B, and we write value "1" to a document while both A and B are up. We shut down node B and write value "2", then we shut down node A as well. When we start node B and then node A, the cluster ends up with the old value "1" instead of "2"; it looks like there is a bug in how OrientDB merges the records. NOTE: when we start node A first and then node B, everything works as expected.
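For clarity, the same sequence expressed as a minimal sketch with the OrientDB 2.2 Java client; the ConfigEntry class, the mobius database name, the host names and the credentials are placeholders, and the node shutdowns/restarts in between are done by hand:

// Minimal sketch of the reproduction, assuming a placeholder class "ConfigEntry",
// placeholder hosts/credentials, and manual stop/start of the nodes where noted.
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

import java.util.List;

public class ReproSketch {
  public static void main(String[] args) {
    // 1) Both nodes up: write value "1" through node A.
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:nodeA:2424/mobius").open("admin", "admin");
    db.save(new ODocument("ConfigEntry").field("value", "1"));
    db.close();

    // 2) Manually stop node B, then write value "2" while only node A is up.
    db = new ODatabaseDocumentTx("remote:nodeA:2424/mobius").open("admin", "admin");
    List<ODocument> docs = db.query(new OSQLSynchQuery<ODocument>("select from ConfigEntry"));
    docs.get(0).field("value", "2").save();
    db.close();

    // 3) Manually stop node A, start node B, then start node A.
    //    Reading back now returns the stale value "1" instead of "2".
    db = new ODatabaseDocumentTx("remote:nodeB:2424/mobius").open("admin", "admin");
    List<ODocument> after = db.query(new OSQLSynchQuery<ODocument>("select from ConfigEntry"));
    System.out.println(after.get(0).field("value")); // prints "1" when the bug occurs
    db.close();
  }
}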

Steps to reproduce the problem

Start two nodes with the following distributed configuration:

{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "undefined",
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
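Since writeQuorum is 1, a write is acknowledged as soon as a single server has applied it, so each node keeps accepting writes while the other is down; that is what allows the two copies to diverge in the scenario above. The temp paths in the logs further down suggest the servers run embedded (under Tomcat); a minimal sketch of starting one node embedded in a JVM, assuming the Hazelcast/distributed plugin and the configuration above are referenced from a server XML at a placeholder path:

// Minimal sketch of starting one OrientDB node embedded in a JVM. The config path is a
// placeholder; distributed mode is assumed to be enabled via the Hazelcast plugin in that XML.
import com.orientechnologies.orient.server.OServer;
import com.orientechnologies.orient.server.OServerMain;

import java.io.File;

public class EmbeddedNode {
  public static void main(String[] args) throws Exception {
    OServer server = OServerMain.create();
    server.startup(new File("config/orientdb-server-config.xml")); // placeholder path
    server.activate(); // the node joins the cluster once the Hazelcast plugin starts

    // Keep the node running; stopping the JVM shuts it down, as in the crash/restart test.
    Runtime.getRuntime().addShutdownHook(new Thread(server::shutdown));
  }
}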

This issue applies to other configurations as well, for example:

lvca commented 7 years ago

I suggest you upgrade to the latest 2.2.13, where many distributed bugs have already been fixed.

pavlop-asg commented 7 years ago

Sorry, that was a typo, we use v2.2.13.

lvca commented 7 years ago

We improved the merge strategy in the 2.2.x branch. There are still some corner cases where the merge is not done, but we will finalize it in the following days. However, if you could try 2.2.x, your feedback would be valuable before we release the next hotfix.

pavlop-asg commented 7 years ago

I have tested with the latest v2.2.13 version. May I clarify: is this an OrientDB issue, or can I change the configuration somehow to make it work?

pavlop-asg commented 7 years ago

Is this issue going to be resolved in the next hotfix? I just want to set expectations, because we have a release very soon and it is critical for us.

lvca commented 7 years ago

This issue has been fixed in the latest 2.2.x. Any chance you could test it?

pavlop-asg commented 7 years ago

Which version should I test, the latest snapshot?

lvca commented 7 years ago

You can build it on your own or use the latest snapshot, just published: https://oss.sonatype.org/content/repositories/snapshots/com/orientechnologies/orientdb-community/2.2.14-SNAPSHOT/orientdb-community-2.2.14-20161221.214344-76.tar.gz

pavlop-asg commented 7 years ago

I have tested the 2.2.14 version and the bug is still there.

lvca commented 7 years ago

If you tried the same with 2.2.14, could you please post here the content of the two distributed-sync.json files?

pavlop-asg commented 7 years ago
{
    "@type": "d",
    "@version": 0,
    "version": 30,
    "autoDeploy": true,
    "readQuorum": 1,
    "writeQuorum": 1,
    "executionMode": "undefined",
    "readYourWrites": true,
    "newNodeStrategy": "dynamic",
    "servers": {
        "@type": "d",
        "@version": 0,
        "*": "master"
    },
    "clusters": {
        "@type": "d",
        "@version": 0,
        "internal": {
            "@type": "d",
            "@version": 0
        },
        "*": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-1",
                "node-2",
                "<NEW_NODE>"
            ]
        },
        "odocumentwrapper_2": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "odocumentwrapper": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "aclentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "aclentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "ofunction_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "configurationentry_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "configurationentry": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "v": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "v_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "principalentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "principalentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "roleentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "roleentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "rulemodelentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "rulemodelentity_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "favoriteentity_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "favoriteentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "extractfilterentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "extractfilterentity_2": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "e": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "e_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "osequence_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "orole_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "savedsearchentity_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "savedsearchentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "ouser_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "oschedule_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        }
    }
}
lvca commented 7 years ago

Not the distributed-config.json, but the distributed-sync.json.

pavlop-asg commented 7 years ago
{
    "@type": "d",
    "@version": 0,
    "version": 57,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 8726,
        "@fieldTypes": "segment=l,position=l"
    },
    "lastOperationTimeStamp": 1482505984716,
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 3552,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
}
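For reference, the per-node segment/position pairs in this file appear to be write-ahead-log LSNs recorded for each server; they are what the delta sync later uses as its starting point (compare the LSN{segment=0, position=227} requests in the logs further down). A small, purely illustrative helper to dump them, using OrientDB's own ODocument JSON parser:

// Illustrative helper: print the LSN (segment/position) recorded per node in a
// distributed-sync.json file passed as the first argument.
import com.orientechnologies.orient.core.record.impl.ODocument;

import java.nio.file.Files;
import java.nio.file.Paths;

public class DumpSyncLsn {
  public static void main(String[] args) throws Exception {
    String json = new String(Files.readAllBytes(Paths.get(args[0])));
    ODocument sync = new ODocument().fromJSON(json);
    for (String field : sync.fieldNames()) {
      Object value = sync.field(field);
      if (value instanceof ODocument) { // per-node entries carry segment/position
        ODocument lsn = (ODocument) value;
        System.out.printf("%s -> LSN{segment=%s, position=%s}%n",
            field, lsn.field("segment"), lsn.field("position"));
      }
    }
  }
}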
lvca commented 7 years ago

Could you please provide the files from other servers too, so I can compare what's wrong?

pavlop-asg commented 7 years ago

I have repeated the test and attached distributed-sync.json files and logs.

  1. Start node-1. distributed-sync.json files: node-1:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483285391341,
    "version": 2,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 102,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }

    node-2: empty

  2. Start node-2. distributed-sync.json files: node-1: hasn't changed; node-2:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483285599260,
    "version": 8,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 102,
        "@fieldTypes": "segment=l,position=l"
    },
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 86,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }
  3. Write value "1" -> it's replicated and accessible on both nodes. distributed-sync.json files: node-1:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286104089,
    "version": 6,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 959,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }

    node-2:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286104177,
    "version": 11,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 227,
        "@fieldTypes": "segment=l,position=l"
    },
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 86,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }
  4. Shutdown node-1. distributed-sync.json files: node-1:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286104089,
    "version": 8,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 959,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }

    node-2:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286104177,
    "version": 13,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 227,
        "@fieldTypes": "segment=l,position=l"
    },
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 86,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }
  5. Write value "2" -> it's saved only on node-1 since node-2 is down, node-2 has value "1" in the file system. distributed-sync.json files: node-1: hasn't changed; node-2:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286551982,
    "version": 15,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 227,
        "@fieldTypes": "segment=l,position=l"
    },
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 1691,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }
  6. Shutdown node-2. distributed-sync.json files: node-1: hasn't changed; node-2:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286551982,
    "version": 17,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 227,
        "@fieldTypes": "segment=l,position=l"
    },
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 1691,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }
  7. Start node-1. distributed-sync.json files: node-1:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286746497,
    "version": 10,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 1102,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }

    node-2: hasn't changed

  8. Start node-2 -> here data is lost, value "2" is getting overwritten by value "1". distributed-sync.json files: node-1: hasn't changed; node-2:

    {
    "@type": "d",
    "@version": 0,
    "lastOperationTimeStamp": 1483286551982,
    "version": 20,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 227,
        "@fieldTypes": "segment=l,position=l"
    },
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 2485,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
    }

Log files when we start node-2 and data synchronization happens: node-1:

2017-01-01 18:07:50:045 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Accepting socket connection from /10.129.235.254:62227 [SocketAcceptorThread]
2017-01-01 18:07:50:056 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Established socket connection between /10.129.235.254:2434 and /10.129.235.254:62227 [TcpIpConnectionManager]
2017-01-01 18:07:57:036 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] 

Members [2] {
    Member [10.129.235.254]:2434 - f3c0d1fd-3451-4d98-a186-0db750fa1145 this
    Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73
}
 [ClusterService]
2017-01-01 18:07:57:037 WARNI [node-1] Added new node id=Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73 name=ext:325e4601-3095-4798-9bc3-7b0dad6a4f73 [OHazelcastPlugin]
2017-01-01 18:07:57:302 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Re-partitioning cluster data... Migration queue size: 271 [MigrationManager]
2017-01-01 18:07:57:511 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Remaining migration tasks in queue => 236 [InternalPartitionService]Finish processing task[3]
2017-01-01 18:07:58:857 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] All migration tasks have been completed, queues are empty. [MigrationThread]
2017-01-01 18:07:59:196 INFO  [node-1]<-[ext:325e4601-3095-4798-9bc3-7b0dad6a4f73] Received updated about registered nodes [OHazelcastPlugin]
2017-01-01 18:07:59:530 INFO  [node-1]<-[node-2] Added node configuration id=Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73 name=node-2, now 2 nodes are configured [OHazelcastPlugin]
2017-01-01 18:07:59:531 INFO  [node-1] Distributed servers status:

+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name   |Status  |Databases             |Conns|StartedOn|Binary             |HTTP               |UsedMemory              |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1*|ONLINE  |mobius=ONLINE (MASTER)|4    |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|789.26MB/3.53GB (21.84%)|
|node-2 |STARTING|                      |0    |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|323.56MB/3.53GB (8.95%) |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
 [OHazelcastPlugin]
2017-01-01 18:07:59:830 INFO  [node-1] Setting new distributed configuration for database: OSystem (version=1)

CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+
|CLUSTER |writeQuorum|readQuorum|
+--------+-----------+----------+
|*       |     1     |    1     |
|internal|     1     |    1     |
+--------+-----------+----------+

 [ODistributedStorage]
2017-01-01 18:07:59:907 INFO  [node-1]->[node-2] Deploying database 'mobius' with delta of changes... [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:907 INFO  [node-1] Received new status node-2.mobius=SYNCHRONIZING [OHazelcastPlugin]
2017-01-01 18:07:59:907 INFO  [node-1]->[node-2] Creating delta backup of database 'mobius' (startLSN=LSN{segment=0, position=227}) in directory: C:\Tomcats\tomcat_2443\temp\orientdb\backup_node-2_mobius.zip... [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:976 INFO  Exporting records after LSN=LSN{segment=0, position=227}. Found 1 records [OLocalPaginatedStorage]
2017-01-01 18:07:59:979 INFO  [node-1]->[node-2] Delta backup of database 'mobius' completed. range=LSN{segment=0, position=227}-LSN{segment=0, position=1102} [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:980 INFO  [node-1]->[node-2] Deploy delta database task completed [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:983 INFO  [node-1]->[node-2] - transferring chunk #1 offset=0 size=430... [OSyncDatabaseDeltaTask]
2017-01-01 18:08:00:718 INFO  [node-1]<-[node-2] Received updated status node-2.mobius=ONLINE [OHazelcastPlugin]
2017-01-01 18:08:06:022 INFO  [node-1] Distributed servers status:

+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name   |Status|Databases             |Conns|StartedOn|Binary             |HTTP               |UsedMemory              |
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1*|ONLINE|mobius=ONLINE (MASTER)|6    |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|839.83MB/3.53GB (23.23%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER)|2    |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|502.55MB/3.53GB (13.90%)|
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+

node-2:

2017-01-01 18:07:44:973 INFO  Loading configuration from input stream [OServerConfigurationLoaderXml]
2017-01-01 18:07:45:452 INFO  OrientDB Server v2.2.14 is starting up... [OServer]
2017-01-01 18:07:45:464 INFO  Databases directory: C:\Users\node-2\asg\mobius\data\orientdb\databases [OServer]
2017-01-01 18:07:45:532 INFO  OrientDB auto-config DISKCACHE=3,614MB (heap=3,614MB direct=3,614MB os=16,260MB), assuming maximum direct memory size equals to maximum JVM heap size [orientechnologies]
2017-01-01 18:07:45:533 WARNI MaxDirectMemorySize JVM option is not set or has invalid value, that may cause out of memory errors. Please set the -XX:MaxDirectMemorySize=16260m option when you start the JVM. [Class]
2017-01-01 18:07:45:682 WARNI Port 0.0.0.0:2424 busy, trying the next available... [OServerNetworkListener]
2017-01-01 18:07:45:682 INFO  Listening binary connections on 0.0.0.0:2425 (protocol v.36, socket=default) [OServerNetworkListener]
2017-01-01 18:07:45:688 WARNI Port 0.0.0.0:2480 busy, trying the next available... [OServerNetworkListener]
2017-01-01 18:07:45:689 INFO  Listening http connections on 0.0.0.0:2481 (protocol v.10, socket=default) [OServerNetworkListener]
2017-01-01 18:07:45:699 INFO  Installing dynamic plugin 'studio-2.2.zip'... [OServerPluginManager]
2017-01-01 18:07:45:707 INFO  ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2017-01-01 18:07:45:709 INFO  OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2017-01-01 18:07:45:711 INFO  OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2017-01-01 18:07:45:722 WARNI GREMLIN language not available (not in classpath) [OGremlinHelper]
2017-01-01 18:07:47:005 WARNI Assigning distributed node name: node-2 [OHazelcastPlugin]
2017-01-01 18:07:47:052 INFO  Configuring Hazelcast from 'C:\Users\node-2\asg\mobius\config\orientdb\hazelcast.xml'. [FileSystemXmlConfig]
2017-01-01 18:07:47:300 WARNI Name of the hazelcast schema location incorrect using default [AbstractXmlConfigHelper]
2017-01-01 18:07:47:733 INFO  [LOCAL] [orientdb] [3.7.3] Prefer IPv4 stack is true. [DefaultAddressPicker]
2017-01-01 18:07:47:823 INFO  [LOCAL] [orientdb] [3.7.3] Picked [10.129.235.254]:2435, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=2435], bind any local is true [DefaultAddressPicker]
2017-01-01 18:07:47:852 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Hazelcast 3.7.3 (20161117 - 8166eea) starting at [10.129.235.254]:2435 [system]
2017-01-01 18:07:47:852 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Copyright (c) 2008-2016, Hazelcast, Inc. All Rights Reserved. [system]
2017-01-01 18:07:47:852 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Configured Hazelcast Serialization version : 1 [system]
2017-01-01 18:07:48:259 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Backpressure is disabled [BackpressureRegulator]
2017-01-01 18:07:49:630 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Creating MulticastJoiner [Node]
2017-01-01 18:07:49:637 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] [10.129.235.254]:2435 is STARTING [LifecycleService]
2017-01-01 18:07:49:845 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Starting 1 partition threads [OperationExecutorImpl]
2017-01-01 18:07:49:846 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Starting 2 generic threads (1 dedicated for priority tasks) [OperationExecutorImpl]
2017-01-01 18:07:49:850 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] TcpIpConnectionManager configured with Non Blocking IO-threading model: 3 input threads and 3 output threads [NonBlockingIOThreadingModel]
2017-01-01 18:07:50:016 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Trying to join to discovered node: [10.129.235.254]:2434 [MulticastJoiner]
2017-01-01 18:07:50:027 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Connecting to /10.129.235.254:2434, timeout: 0, bind-any: true [InitConnectionTask]
2017-01-01 18:07:50:049 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] Established socket connection between /10.129.235.254:62227 and /10.129.235.254:2434 [TcpIpConnectionManager]
2017-01-01 18:07:57:063 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] 

Members [2] {
    Member [10.129.235.254]:2434 - f3c0d1fd-3451-4d98-a186-0db750fa1145
    Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73 this
}
 [ClusterService]
2017-01-01 18:07:59:098 INFO  [10.129.235.254]:2435 [orientdb] [3.7.3] [10.129.235.254]:2435 is STARTED [LifecycleService]
2017-01-01 18:07:59:098 INFO  Starting distributed server 'node-2' (hzID=325e4601-3095-4798-9bc3-7b0dad6a4f73)... [OHazelcastPlugin]
2017-01-01 18:07:59:098 INFO  Distributed cluster time=Sun Jan 01 18:07:59 EET 2017 (delta from local node=0)... [OHazelcastPlugin]
2017-01-01 18:07:59:206 INFO  [node-2] Loaded configuration for database 'OSystem' from disk: C:\Users\node-2\asg\mobius\config\orientdb\default-distributed-db-config.json [ODistributedStorage]
2017-01-01 18:07:59:215 WARNI [node-2] Servers in cluster: [node-1, node-2] [OHazelcastPlugin]
2017-01-01 18:07:59:791 INFO  [node-2] Loaded configuration for database 'OSystem' from disk: C:\Users\node-2\asg\mobius\data\orientdb\databases\OSystem\distributed-config.json [ODistributedStorage]
2017-01-01 18:07:59:824 INFO  [node-2] Distributed servers status:

+-------+--------+---------+-----+---------+-------------------+-------------------+------------------------+
|Name   |Status  |Databases|Conns|StartedOn|Binary             |HTTP               |UsedMemory              |
+-------+--------+---------+-----+---------+-------------------+-------------------+------------------------+
|node-1 |ONLINE  |         |4    |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|789.26MB/3.53GB (21.84%)|
|node-2*|STARTING|         |1    |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|329.45MB/3.53GB (9.11%) |
+-------+--------+---------+-----+---------+-------------------+-------------------+------------------------+
 [OHazelcastPlugin]
2017-01-01 18:07:59:852 INFO  [node-2] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-01 18:07:59:864 INFO  [node-2] Setting new distributed configuration for database: mobius (version=1)

CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
...
2017-01-01 18:07:59:893 INFO  [node-2] Current node is a MASTER for database 'mobius' [ODistributedAbstractPlugin$3]
2017-01-01 18:07:59:898 WARNI [node-2]->[[node-1]] requesting delta database sync for 'mobius' on local server... [OHazelcastPlugin]
2017-01-01 18:07:59:901 INFO  [node-2]->[node-1] Requesting database delta sync for 'mobius' LSN=LSN{segment=0, position=227}... [OHazelcastPlugin]
2017-01-01 18:07:59:987 INFO  [node-2]<-[node-1] Copying remote database 'mobius' to: C:/Tomcats/tomcat_7443/temp/orientdb/install_mobius.zip [OHazelcastPlugin]
2017-01-01 18:07:59:992 INFO  [node-2]<-[node-1] Installing database 'mobius' to: C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius... [OHazelcastPlugin]
2017-01-01 18:07:59:993 INFO  [node-2] - writing chunk #1 offset=0 size=430b [OHazelcastPlugin]
2017-01-01 18:07:59:994 INFO  [node-2] Database copied correctly, size=430b [ODistributedAbstractPlugin$4]
2017-01-01 18:08:00:388 INFO  [node-2]<-[node-1] Started import of delta for database 'mobius' [OIncrementalServerSync$1]
2017-01-01 18:08:00:438 INFO  [node-2]<-[node-1] Installed database delta for 'mobius'. 1 total entries: 0 created, 1 updated, 0 deleted, 0 holes, 0 skipped [OIncrementalServerSync$1]
2017-01-01 18:08:00:458 INFO  [node-2] Installed database 'mobius' (LSN=LSN{segment=0, position=2485}) [OHazelcastPlugin]
2017-01-01 18:08:00:698 INFO  [node-2] Reassigning ownership of clusters for database mobius... [OHazelcastPlugin]
2017-01-01 18:08:00:708 INFO  [node-2] Reassignment of clusters for database 'mobius' completed (classes=21) [OHazelcastPlugin]
2017-01-01 18:08:00:708 INFO  [node-2] Publishing ONLINE status for database node-2.mobius... [ODistributedDatabaseImpl]
2017-01-01 18:08:00:718 INFO  [node-2]<-[node-1] Installed delta of database 'mobius'... [OHazelcastPlugin]
2017-01-01 18:08:00:738 WARNI [node-2] Updated node status to 'ONLINE' [OHazelcastPlugin]
2017-01-01 18:08:00:738 INFO  [node-2] Distributed servers status:

+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name   |Status  |Databases             |Conns|StartedOn|Binary             |HTTP               |UsedMemory              |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1 |ONLINE  |mobius=ONLINE (MASTER)|4    |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|789.26MB/3.53GB (21.84%)|
|node-2*|STARTING|                      |1    |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|329.45MB/3.53GB (9.11%) |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
 [OHazelcastPlugin]
2017-01-01 18:08:00:738 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2017-01-01 18:08:00:868 INFO  OrientDB Studio available at http://169.254.205.69:2481/studio/index.html [OServer]
2017-01-01 18:08:00:868 INFO  OrientDB Server is active v2.2.14. [OServer]
2017-01-01 18:08:06:022 INFO  [node-2] Distributed servers status:

+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name   |Status|Databases             |Conns|StartedOn|Binary             |HTTP               |UsedMemory              |
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1 |ONLINE|mobius=ONLINE (MASTER)|6    |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|839.83MB/3.53GB (23.23%)|
|node-2*|ONLINE|mobius=ONLINE (MASTER)|2    |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|502.55MB/3.53GB (13.90%)|
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
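Reading the two logs together, the mechanism becomes visible: node-2's last LSN received from node-1 is still LSN{segment=0, position=227} (recorded before node-1 was shut down), so on rejoin it requests a delta from that point; node-1 answers with the range 227..1102, which still contains the old value "1", and installing that delta overwrites node-2's newer local copy written around position 1691. A simplified sketch of that comparison, with the LSN values taken from the logs above (illustrative only, not OrientDB's actual synchronization code):

// Illustrative only -- NOT OrientDB's real synchronization code. It just replays the
// LSN values from the logs above to show why the delta re-applies stale data.
public class LsnSketch {

  static final class Lsn implements Comparable<Lsn> {
    final long segment, position;
    Lsn(long segment, long position) { this.segment = segment; this.position = position; }
    @Override public int compareTo(Lsn o) {
      int c = Long.compare(segment, o.segment);
      return c != 0 ? c : Long.compare(position, o.position);
    }
    @Override public String toString() { return "LSN{segment=" + segment + ", position=" + position + "}"; }
  }

  public static void main(String[] args) {
    Lsn node2LastSeenFromNode1 = new Lsn(0, 227);  // what node-2 recorded for node-1 before the outage
    Lsn node1Head              = new Lsn(0, 1102); // node-1's own head after restart (old value "1")
    Lsn node2LocalHead         = new Lsn(0, 1691); // node-2's own head (holds the newer value "2")

    if (node2LastSeenFromNode1.compareTo(node1Head) < 0) {
      // node-2 asks node-1 for everything after 227; node-1 ships 227..1102 and the
      // installed update wins over node-2's newer local record -- the value "2" is lost.
      System.out.println("delta " + node2LastSeenFromNode1 + " .. " + node1Head
          + " installed on a node whose own head is " + node2LocalHead);
    }
  }
}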
lvca commented 7 years ago

@pavlop-asg thanks for the super detailed sequence. Just to be sure about step (5): you wrote "Write value "2" -> it's saved only on node-1 since node-2 is down", but you meant to say "it's saved only on node-2 since node-1 is down", right?

lvca commented 7 years ago

Reproduced the problem and fixed in v.2.2.x. Thanks @pavlop-asg!

pavlop-asg commented 7 years ago

About step 5, you are right; sorry, that was a typo.

pavlop-asg commented 7 years ago

Is it already fixed in master?

lvca commented 7 years ago

We use master only when we release, but you can use 2.2.x. If you can confirm the fix, we could release 2.2.15 soon.

pavlop-asg commented 7 years ago

Thank you

pavlop-asg commented 7 years ago

I have built the 2.2.x branch and tested it; it looks like the issue remains.

pavlop-asg commented 7 years ago

This is the log from node-2, which was started later but had a newer version of some data (there are some warnings):

2017-01-05 18:09:30:409 INFO  [node-2] Current node is a MASTER for database 'mobius' [ODistributedAbstractPlugin$3]
2017-01-05 18:09:30:413 WARNI [node-2]->[[node-1]] requesting delta database sync for 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:09:30:430 INFO  [node-2]->[node-1] Requesting database delta sync for 'mobius' LSN=LSN{segment=0, position=1760037}... [OHazelcastPlugin]
2017-01-05 18:09:30:439 WARNI [node-2]<-[node-1] Error on installing database delta for 'mobius' (err=Requested delta sync with LSN=LSN{segment=0, position=1760037} but found the following error: Cannot execute delta-sync because a new file has been added. Filename: search.owner.irs(id=319368562630197421)
    DB name="mobius") [OHazelcastPlugin]
2017-01-05 18:09:30:439 WARNI [node-2]<-[node-1] Requesting full database 'mobius' sync... [OHazelcastPlugin]
2017-01-05 18:09:30:440 SEVER [node-2]->[node-1] Error on asking delta backup of database 'mobius' (err=Requested delta sync with LSN=LSN{segment=0, position=1760037} but found the following error: Cannot execute delta-sync because a new file has been added. Filename: search.owner.irs(id=319368562630197421)
    DB name="mobius") [OHazelcastPlugin]
2017-01-05 18:09:30:441 WARNI [node-2]->[[node-1]] Requesting deploy of database 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:09:30:442 INFO  [node-2] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-05 18:09:30:446 INFO  [node-2] Setting new distributed configuration for database: mobius (version=1)
...
2017-01-05 18:09:32:582 WARNI [node-2] Moving existent database 'mobius' in 'C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius' to 'C:/Users/node-2/asg/mobius/data/orientdb/databases//../backup/databases/mobius' and get a fresh copy from a remote node... [OHazelcastPlugin]
2017-01-05 18:09:32:583 SEVER [node-2] Error on moving existent database 'mobius' located in 'C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius' to 'C:\Users\node-2\asg\mobius\data\orientdb\databases\..\backup\databases\mobius'. Deleting old database... [OHazelcastPlugin]
2017-01-05 18:09:32:850 INFO  [node-2]<-[node-1] Copying remote database 'mobius' to: C:/Tomcats/tomcat_7443/temp/orientdb/install_mobius.zip [OHazelcastPlugin]
2017-01-05 18:09:32:859 INFO  [node-2]<-[node-1] Installing database 'mobius' to: C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius... [OHazelcastPlugin]
2017-01-05 18:09:32:864 INFO  [node-2] - writing chunk #1 offset=0 size=383.51KB [OHazelcastPlugin]
2017-01-05 18:09:32:866 INFO  [node-2] Database copied correctly, size=383.51KB [ODistributedAbstractPlugin$4]
2017-01-05 18:09:35:202 WARNI {db=mobius} Storage 'mobius' was not closed properly. Will try to recover from write ahead log [OLocalPaginatedStorage]
2017-01-05 18:09:35:202 SEVER {db=mobius} Restore is not possible because write ahead log is empty. [OLocalPaginatedStorage]
2017-01-05 18:09:35:202 INFO  {db=mobius} Storage data recover was completed [OLocalPaginatedStorage]
2017-01-05 18:09:36:073 INFO  {db=mobius} [node-2] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-05 18:09:36:076 INFO  {db=mobius} [node-2] Setting new distributed configuration for database: mobius (version=1)
lvca commented 7 years ago

Do you have the full log? What happened next? Was it able to download the database?

pavlop-asg commented 7 years ago

One more thing: with version 2.2.15-SNAPSHOT I sometimes get the following error when I try to work in a cluster:

Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
    DB name="mobius"
    DB name="mobius"
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_60]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_60]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_60]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_60]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:442) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:393) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:275) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2170) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote$27.execute(OStorageRemote.java:1162) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:196) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:237) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:193) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:204) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1138) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:78) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery.run(OSQLAsynchQuery.java:74) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:85) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:754) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.object.db.ODatabasePojoAbstract.query(ODatabasePojoAbstract.java:244) ~[orientdb-object-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$load$2(OrientDbConfigurationStorage.java:96) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at com.asg.mobius.orientdb.OrientDbObjectStorageBase.executeAndReturn(OrientDbObjectStorageBase.java:88) ~[orientdb-support-7.0.0-SNAPSHOT.jar:?]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.load(OrientDbConfigurationStorage.java:95) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$loadAll$3(OrientDbConfigurationStorage.java:111) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_60]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.loadAll(OrientDbConfigurationStorage.java:110) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at com.hazelcast.map.impl.MapStoreWrapper.loadAll(MapStoreWrapper.java:143) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.mapstore.AbstractMapDataStore.loadAll(AbstractMapDataStore.java:55) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadAndGet(BasicRecordStoreLoader.java:161) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.doBatchLoad(BasicRecordStoreLoader.java:134) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadValuesInternal(BasicRecordStoreLoader.java:120) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.access$100(BasicRecordStoreLoader.java:54) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader$GivenKeysLoaderTask.call(BasicRecordStoreLoader.java:107) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:67) ~[hazelcast-3.7.3.jar:3.7.3]
    ... 6 more
2017-01-05 18:32:52.024 [ERROR]    [hz._hzInstance_2_dev.partition-operation.thread-2] com.hazelcast.map.impl.operation.GetOperation:log - [10.129.235.254]:5701 [dev] [3.7.3] Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
    DB name="mobius"
    DB name="mobius"
com.orientechnologies.orient.core.exception.ODatabaseException: Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
    DB name="mobius"
    DB name="mobius"
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_60]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_60]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_60]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_60]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:442) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:393) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:275) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2170) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote$27.execute(OStorageRemote.java:1162) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:196) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:237) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:193) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:204) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1138) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:78) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery.run(OSQLAsynchQuery.java:74) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:85) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:754) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.orientechnologies.orient.object.db.ODatabasePojoAbstract.query(ODatabasePojoAbstract.java:244) ~[orientdb-object-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$load$2(OrientDbConfigurationStorage.java:96) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at com.asg.mobius.orientdb.OrientDbObjectStorageBase.executeAndReturn(OrientDbObjectStorageBase.java:88) ~[orientdb-support-7.0.0-SNAPSHOT.jar:?]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.load(OrientDbConfigurationStorage.java:95) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$loadAll$3(OrientDbConfigurationStorage.java:111) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_60]
    at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.loadAll(OrientDbConfigurationStorage.java:110) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
    at com.hazelcast.map.impl.MapStoreWrapper.loadAll(MapStoreWrapper.java:143) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.mapstore.AbstractMapDataStore.loadAll(AbstractMapDataStore.java:55) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadAndGet(BasicRecordStoreLoader.java:161) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.doBatchLoad(BasicRecordStoreLoader.java:134) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadValuesInternal(BasicRecordStoreLoader.java:120) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.access$100(BasicRecordStoreLoader.java:54) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader$GivenKeysLoaderTask.call(BasicRecordStoreLoader.java:107) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:67) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:212) ~[hazelcast-3.7.3.jar:3.7.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_60]
    at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[hazelcast-3.7.3.jar:3.7.3]
    at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92) ~[hazelcast-3.7.3.jar:3.7.3]
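For context on where this exception surfaces: judging from the stack trace, the application loads Hazelcast map entries from OrientDB through a MapStore/MapLoader (OrientDbConfigurationStorage), so when the underlying database is closed or dropped during the re-deploy, the failure propagates out of Hazelcast's loadAll. A minimal, hypothetical loader of that shape (the ConfigurationEntry class, its fields and the connection URL are placeholders, not the application's real code):

// Hypothetical sketch, not the application's actual OrientDbConfigurationStorage: a
// Hazelcast MapLoader that reads entries from OrientDB. While the server closes or
// re-deploys the database, the query below throws ODatabaseException, which is what
// bubbles up through MapStoreWrapper.loadAll in the stack trace above.
import com.hazelcast.core.MapLoader;
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfigMapLoader implements MapLoader<String, String> {

  private final String dbUrl = "remote:localhost:2424/mobius"; // placeholder URL

  @Override
  public String load(String key) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx(dbUrl).open("admin", "admin");
    try {
      List<ODocument> result = db.query(
          new OSQLSynchQuery<ODocument>("select from ConfigurationEntry where key = ?"), key);
      return result.isEmpty() ? null : (String) result.get(0).field("value");
    } finally {
      db.close();
    }
  }

  @Override
  public Map<String, String> loadAll(Collection<String> keys) {
    Map<String, String> all = new HashMap<>();
    for (String key : keys) {
      String value = load(key);
      if (value != null) all.put(key, value);
    }
    return all;
  }

  @Override
  public Iterable<String> loadAllKeys() {
    return null; // no eager pre-load in this sketch
  }
}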
pavlop-asg commented 7 years ago

Sorry, I do not have the full log right now; I was trying to repeat the test and ran into the error above.

pavlop-asg commented 7 years ago

The reason for that exception is the following: after node-2 joined the cluster, this happened:

2017-01-05 18:28:58:293 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Accepting socket connection from /10.129.235.254:61063 [SocketAcceptorThread]
2017-01-05 18:28:58:306 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Established socket connection between /10.129.235.254:2434 and /10.129.235.254:61063 [TcpIpConnectionManager]
2017-01-05 18:29:05:285 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] 

Members [2] {
    Member [10.129.235.254]:2434 - be3473c4-1407-48c7-a772-5ec20ff8946a this
    Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e
}
 [ClusterService]
2017-01-05 18:29:05:285 WARNI [node-1] Added new node id=Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e name=ext:9f08c22b-351d-4884-a616-ed61a69f0b8e [OHazelcastPlugin]
2017-01-05 18:29:05:603 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Re-partitioning cluster data... Migration queue size: 271 [MigrationManager]
2017-01-05 18:29:07:311 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] All migration tasks have been completed, queues are empty. [MigrationThread]
2017-01-05 18:29:07:458 INFO  [node-1]<-[ext:9f08c22b-351d-4884-a616-ed61a69f0b8e] Received updated about registered nodes [OHazelcastPlugin]
2017-01-05 18:29:07:845 INFO  [node-1]<-[node-2] Added node configuration id=Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e name=node-2, now 2 nodes are configured [OHazelcastPlugin]
2017-01-05 18:29:07:846 INFO  [node-1] Distributed servers status:

+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status  |Databases             |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE  |mobius=ONLINE (MASTER)|4    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|768.49MB/3.53GB (21.26%)|
|node-2 |STARTING|                      |0    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|333.52MB/3.53GB (9.23%) |
+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:08:181 INFO  [node-1] Setting new distributed configuration for database: OSystem (version=1)

CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+
|CLUSTER |writeQuorum|readQuorum|
+--------+-----------+----------+
|*       |     1     |    1     |
|internal|     1     |    1     |
+--------+-----------+----------+

 [ODistributedStorage]
2017-01-05 18:29:08:229 SEVER [node-1]--[node-2] Skip deploying delta database 'mobius' because the requesting server has a most recent database (requester LastOperationOn=2017-01-05 18:09:37.209 current LastOperationOn=2017-01-05 18:09:32.573) [OSyncDatabaseDeltaTask]
2017-01-05 18:29:08:238 INFO  [node-1]<-[node-2] Received new status node-2.mobius=ONLINE [OHazelcastPlugin]
2017-01-05 18:29:08:240 INFO  [node-1]<-[node-2] Received updated status node-1.mobius=NOT_AVAILABLE [OHazelcastPlugin]
2017-01-05 18:29:08:241 INFO  [node-1] Distributed servers status:

+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status  |Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE  |mobius=NOT_AVAILABLE (MASTER)|4    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|768.49MB/3.53GB (21.26%)|
|node-2 |STARTING|                             |0    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|340.27MB/3.53GB (9.41%) |
+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:13:609 INFO  [node-1] Distributed servers status:

+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status|Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=NOT_AVAILABLE (MASTER)|6    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER)       |2    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:13:886 INFO  [node-1] Trying to recover current server for database 'mobius'... [OClusterHealthChecker]
2017-01-05 18:29:13:889 INFO  [node-1] Current node is a MASTER for database 'mobius' [ODistributedAbstractPlugin$3]
2017-01-05 18:29:13:890 WARNI [node-1]->[[node-2]] requesting delta database sync for 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:29:13:891 INFO  [node-1]->[node-2] Requesting database delta sync for 'mobius' LSN=LSN{segment=0, position=168631}... [OHazelcastPlugin]
2017-01-05 18:29:14:302 INFO  [node-1]<-[node-2] Received updated status node-1.mobius=SYNCHRONIZING [OHazelcastPlugin]
2017-01-05 18:29:14:303 INFO  [node-1] Distributed servers status:

+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status|Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=SYNCHRONIZING (MASTER)|6    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER)       |2    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:14:314 WARNI [node-1]<-[node-2] Error on installing database delta for 'mobius' (err=Requested database delta sync with LSN=LSN{segment=0, position=168631} but not found in database) [OHazelcastPlugin]
2017-01-05 18:29:14:314 WARNI [node-1]<-[node-2] Requesting full database 'mobius' sync... [OHazelcastPlugin]
2017-01-05 18:29:14:315 SEVER [node-1]->[node-2] Error on asking delta backup of database 'mobius' (err=Requested database delta sync with LSN=LSN{segment=0, position=168631} but not found in database) [OHazelcastPlugin]
2017-01-05 18:29:14:509 WARNI [node-1]->[[node-2]] Requesting deploy of database 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:29:14:517 INFO  [node-1] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-05 18:29:14:518 INFO  [node-1] Setting new distributed configuration for database: mobius (version=1)

CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+-------------+-------+
|        |           |          |   MASTER    |MASTER |
|        |           |          |SYNCHRONIZING|ONLINE |
|        |           |          |   dynamic   |dynamic|
+--------+-----------+----------+-------------+-------+
|CLUSTER |writeQuorum|readQuorum|   node-1    |node-2 |
+--------+-----------+----------+-------------+-------+
|*       |     1     |    1     |      X      |   o   |
|internal|     1     |    1     |             |       |
+--------+-----------+----------+-------------+-------+

 [ODistributedStorage]
2017-01-05 18:29:14:537 INFO  [node-1]<-[node-2] Received updated status node-2.mobius=BACKUP [OHazelcastPlugin]
2017-01-05 18:29:14:539 INFO  [node-1] Distributed servers status:

+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status|Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=SYNCHRONIZING (MASTER)|6    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=BACKUP (MASTER)       |2    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]$ANSI{green {db=mobius}} Error during storage close
com.orientechnologies.orient.core.exception.OStorageException: Cannot open the storage 'mobius' because it does not exist in path: C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius
    DB name="mobius"
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:285)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:261)
    at com.orientechnologies.orient.server.OServer.openDatabaseBypassingSecurity(OServer.java:980)
    at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:959)
    at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:932)
    at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:923)
    at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.openDatabase(ONetworkProtocolBinary.java:847)
    at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.handshakeRequest(ONetworkProtocolBinary.java:231)
    at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:196)
    at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)
$ANSI{green {db=mobius}} Error during storage close
pavlop-asg commented 7 years ago

Any comments on that? And could you please tell me an approximate date for the 2.2.15 release?

pavlop-asg commented 7 years ago

Looks like this issue is fixed, but I found one more minor issue related to it: #7073

pavlop-asg commented 7 years ago

I have tested the released 2.2.15 version and the fix seems to have been removed. Was it left out because of bug #7073?

santo-it commented 7 years ago

Hi @pavlop-asg,

Thanks for your comment. I'm sorry to hear about your problems. This difference between 2.2.15-SNAPSHOT and 2.2.15 seems strange to me; as far as I can see, the fix has been included in 2.2.15.

Are you sure you ran the very same test on 2.2.15-SNAPSHOT and 2.2.15? I am reopening the issue for now.

Thanks,

pavlop-asg commented 7 years ago

I have repeated the test with 2.2.15 and it looks like the issue is only partly fixed. Use cases:

lvca commented 7 years ago

I've just tried the 2nd case and everything works. node1 was restarted and recognized that its database was newer. This is the log:

2017-01-25 22:38:24:458 INFO  [node1]->[[node2]] Current copy of database 'demo' is newer then the copy present in the cluster. Use the local copy and force other nodes to download this [ODistributedAbstractPlugin$3]
2017-01-25 22:38:24:468 WARNI [node1] Updated node status to 'ONLINE' [OHazelcastPlugin]
2017-01-25 22:38:24:470 INFO  [node1] Distributed servers status:

+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|Name  |Status  |Databases                  |Conns|StartedOn|Binary         |HTTP           |UsedMemory            |
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE  |demo=NOT_AVAILABLE (MASTER)|1    |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|STARTING|                           |3    |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|35.36MB/8.00GB (0.43%)|
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
 [OHazelcastPlugin]
2017-01-25 22:38:24:472 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2017-01-25 22:38:24:472 INFO   [OEnterpriseAgent]
2017-01-25 22:38:24:472 INFO  ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:472 INFO  *                     ORIENTDB  -  ENTERPRISE EDITION                       * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  * If you are in Production or Test, you must purchase a commercial license. * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  * For more information look at: http://orientdb.com/orientdb-enterprise/    * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO   [OEnterpriseAgent]
2017-01-25 22:38:24:487 INFO  Profiler is recording metrics with configuration: 60,24 [OEnterpriseProfiler]
2017-01-25 22:38:24:677 INFO  Scheduled [FULL_BACKUP] task : fa461098-a6ea-410a-8909-6c974f12e58e. Next execution will be Sun Jan 15 08:10:00 EST 2017 [OBackupTask]
2017-01-25 22:38:24:719 INFO  OrientDB Studio available at http://10.1.1.100:2481/studio/index.html [OServer]
2017-01-25 22:38:24:719 INFO  OrientDB Server is active v2.2.16-SNAPSHOT. [OServer]
2017-01-25 22:38:28:285 INFO  [node1] Received updated status node2.demo=SYNCHRONIZING [OHazelcastPlugin]
2017-01-25 22:38:28:285 INFO  [node1]->[node2] Deploying database 'demo' with delta of changes... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:285 INFO  [node1]->[node2] Creating delta backup of database 'demo' (startLSN=LSN{segment=0, position=484849}) in directory: /var/folders/zc/y34429014c3bt_x1587qblth0000gn/T/orientdb/backup_node2_demo.zip... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:286 INFO  [node1] Distributed servers status:

+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|Name  |Status  |Databases                  |Conns|StartedOn|Binary         |HTTP           |UsedMemory            |
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE  |demo=SYNCHRONIZING (MASTER)|1    |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|STARTING|                           |3    |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|35.36MB/8.00GB (0.43%)|
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
 [OHazelcastPlugin]
2017-01-25 22:38:28:290 INFO  Exporting records after LSN=LSN{segment=0, position=484849}. Found 1 records [OEnterpriseLocalPaginatedStorage]
2017-01-25 22:38:28:291 INFO  [node1]->[node2] Delta backup of database 'demo' completed. range=LSN{segment=0, position=484849}-LSN{segment=0, position=485604} [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:291 INFO  [node1]->[node2] Deploy delta database task completed [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:293 INFO  [node1]->[node2] - transferring chunk #1 offset=0 size=44... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:349 INFO  [node1]<-[node2] Received updated status node2.demo=ONLINE [OHazelcastPlugin]
2017-01-25 22:38:29:480 INFO  [node1] Distributed servers status:

+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
|Name  |Status|Databases           |Conns|StartedOn|Binary         |HTTP           |UsedMemory            |
+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE|demo=ONLINE (MASTER)|1    |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|ONLINE|demo=ONLINE (MASTER)|3    |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|67.84MB/8.00GB (0.83%)|
+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
 [OHazelcastPlugin]
lvca commented 7 years ago

Actually, this works after a fix that I just pushed to 2.2.x. Could you please test both cases? They both work for me.

pavlop-asg commented 7 years ago

Thank you, I will try.

pavlop-asg commented 7 years ago

It is working fine for me after your fix. Thanks a lot!

pavlop-asg commented 7 years ago

Could you tell me how soon 2.2.16 will be released?

santo-it commented 7 years ago

Hi @pavlop-asg,

Glad to hear the issue has been fixed. Thanks for your patience and tests

I'll discuss internally and try to understand when 2.2.16 will be released

Thanks,

santo-it commented 7 years ago

Fixed by ce26174

Release notes updated

No changes in the documentation required

Can be closed