I suggest you upgrade to the latest 2.2.13, where many bugs in distributed mode have already been fixed.
Sorry, that was a typo; we use v2.2.13.
We improved the merge strategy in the 2.2.x branch. There are still some corner cases where the merge is not done, but we will finalize it in the coming days. However, if you could try 2.2.x, your feedback would be valuable before we release the next hotfix.
I have tested with the latest v2.2.13 version. May I clarify: is this an OrientDB issue, or can I somehow change the configuration to make it work?
Is this issue going to be resolved in the next hotfix? I just want to set expectations, because we have a release very soon and this is critical for us.
This issue has been fixed in the latest 2.2.x. Any chance you could test it?
Which version should I test, the latest snapshot?
You can build it on your own or use the latest snapshot just published: https://oss.sonatype.org/content/repositories/snapshots/com/orientechnologies/orientdb-community/2.2.14-SNAPSHOT/orientdb-community-2.2.14-20161221.214344-76.tar.gz
I have tested the 2.2.14 version and the bug is still there.
If you tried the same with 2.2.14, could you please post here the content of the two distributed-sync.json files?
{
"@type": "d",
"@version": 0,
"version": 30,
"autoDeploy": true,
"readQuorum": 1,
"writeQuorum": 1,
"executionMode": "undefined",
"readYourWrites": true,
"newNodeStrategy": "dynamic",
"servers": {
"@type": "d",
"@version": 0,
"*": "master"
},
"clusters": {
"@type": "d",
"@version": 0,
"internal": {
"@type": "d",
"@version": 0
},
"*": {
"@type": "d",
"@version": 0,
"servers": [
"node-1",
"node-2",
"<NEW_NODE>"
]
},
"odocumentwrapper_2": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"odocumentwrapper": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"aclentity": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"aclentity_1": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"ofunction_0": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"configurationentry_3": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"configurationentry": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"v": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"v_3": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"principalentity": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"principalentity_1": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"roleentity": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"roleentity_1": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"rulemodelentity_1": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"rulemodelentity_3": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"favoriteentity_3": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"favoriteentity": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"extractfilterentity_1": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"extractfilterentity_2": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"e": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"e_1": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"osequence_0": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"orole_0": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"savedsearchentity_3": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"savedsearchentity": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"ouser_0": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
},
"oschedule_0": {
"@type": "d",
"@version": 0,
"servers": [
"node-2",
"node-1",
"<NEW_NODE>"
]
}
}
}
Not the distributed-config.json, but the distributed-sync.json.
{
"@type": "d",
"@version": 0,
"version": 57,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 8726,
"@fieldTypes": "segment=l,position=l"
},
"lastOperationTimeStamp": 1482505984716,
"node-2": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 3552,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
Could you please provide the files from other servers too, so I can compare what's wrong?
I have repeated the test and attached distributed-sync.json files and logs.
1. Start node-1
distributed-sync.json files:
node-1:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483285391341,
"version": 2,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 102,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
node-2: empty
2. Start node-2
distributed-sync.json files:
node-1: hasn't changed
node-2:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483285599260,
"version": 8,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 102,
"@fieldTypes": "segment=l,position=l"
},
"node-2": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 86,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
Write value "1" -> it's replicated and accessible on both nodes distributed-sync.json files: node-1:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286104089,
"version": 6,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 959,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
node-2:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286104177,
"version": 11,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 227,
"@fieldTypes": "segment=l,position=l"
},
"node-2": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 86,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
4. Shutdown node-1
distributed-sync.json files:
node-1:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286104089,
"version": 8,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 959,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
node-2:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286104177,
"version": 13,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 227,
"@fieldTypes": "segment=l,position=l"
},
"node-2": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 86,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
Write value "2" -> it's saved only on node-1 since node-2 is down, node-2 has value "1" in the file system distributed-sync.json files: node-1: hasn't changed node-2:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286551982,
"version": 15,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 227,
"@fieldTypes": "segment=l,position=l"
},
"node-2": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 1691,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
6. Shutdown node-2
distributed-sync.json files:
node-1: hasn't changed
node-2:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286551982,
"version": 17,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 227,
"@fieldTypes": "segment=l,position=l"
},
"node-2": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 1691,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
7. Start node-1
distributed-sync.json files:
node-1:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286746497,
"version": 10,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 1102,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
node-2: hasn't changed
8. Start node-2 -> here data is lost, value "2" is getting overwritten by value "1"
distributed-sync.json files:
node-1: hasn't changed
node-2:
{
"@type": "d",
"@version": 0,
"lastOperationTimeStamp": 1483286551982,
"version": 20,
"node-1": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 227,
"@fieldTypes": "segment=l,position=l"
},
"node-2": {
"@type": "d",
"@version": 0,
"segment": 0,
"position": 2485,
"@fieldTypes": "segment=l,position=l"
},
"@fieldTypes": "lastOperationTimeStamp=l"
}
Log files when we start node-2 and data synchronization happens:
node-1:
2017-01-01 18:07:50:045 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] Accepting socket connection from /10.129.235.254:62227 [SocketAcceptorThread]
2017-01-01 18:07:50:056 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] Established socket connection between /10.129.235.254:2434 and /10.129.235.254:62227 [TcpIpConnectionManager]
2017-01-01 18:07:57:036 INFO [10.129.235.254]:2434 [orientdb] [3.7.3]
Members [2] {
Member [10.129.235.254]:2434 - f3c0d1fd-3451-4d98-a186-0db750fa1145 this
Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73
}
[ClusterService]
2017-01-01 18:07:57:037 WARNI [node-1] Added new node id=Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73 name=ext:325e4601-3095-4798-9bc3-7b0dad6a4f73 [OHazelcastPlugin]
2017-01-01 18:07:57:302 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] Re-partitioning cluster data... Migration queue size: 271 [MigrationManager]
2017-01-01 18:07:57:511 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] Remaining migration tasks in queue => 236 [InternalPartitionService]Finish processing task[3]
2017-01-01 18:07:58:857 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] All migration tasks have been completed, queues are empty. [MigrationThread]
2017-01-01 18:07:59:196 INFO [node-1]<-[ext:325e4601-3095-4798-9bc3-7b0dad6a4f73] Received updated about registered nodes [OHazelcastPlugin]
2017-01-01 18:07:59:530 INFO [node-1]<-[node-2] Added node configuration id=Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73 name=node-2, now 2 nodes are configured [OHazelcastPlugin]
2017-01-01 18:07:59:531 INFO [node-1] Distributed servers status:
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name |Status |Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1*|ONLINE |mobius=ONLINE (MASTER)|4 |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|789.26MB/3.53GB (21.84%)|
|node-2 |STARTING| |0 |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|323.56MB/3.53GB (8.95%) |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
[OHazelcastPlugin]
2017-01-01 18:07:59:830 INFO [node-1] Setting new distributed configuration for database: OSystem (version=1)
CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+
|CLUSTER |writeQuorum|readQuorum|
+--------+-----------+----------+
|* | 1 | 1 |
|internal| 1 | 1 |
+--------+-----------+----------+
[ODistributedStorage]
2017-01-01 18:07:59:907 INFO [node-1]->[node-2] Deploying database 'mobius' with delta of changes... [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:907 INFO [node-1] Received new status node-2.mobius=SYNCHRONIZING [OHazelcastPlugin]
2017-01-01 18:07:59:907 INFO [node-1]->[node-2] Creating delta backup of database 'mobius' (startLSN=LSN{segment=0, position=227}) in directory: C:\Tomcats\tomcat_2443\temp\orientdb\backup_node-2_mobius.zip... [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:976 INFO Exporting records after LSN=LSN{segment=0, position=227}. Found 1 records [OLocalPaginatedStorage]
2017-01-01 18:07:59:979 INFO [node-1]->[node-2] Delta backup of database 'mobius' completed. range=LSN{segment=0, position=227}-LSN{segment=0, position=1102} [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:980 INFO [node-1]->[node-2] Deploy delta database task completed [OSyncDatabaseDeltaTask]
2017-01-01 18:07:59:983 INFO [node-1]->[node-2] - transferring chunk #1 offset=0 size=430... [OSyncDatabaseDeltaTask]
2017-01-01 18:08:00:718 INFO [node-1]<-[node-2] Received updated status node-2.mobius=ONLINE [OHazelcastPlugin]
2017-01-01 18:08:06:022 INFO [node-1] Distributed servers status:
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name |Status|Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1*|ONLINE|mobius=ONLINE (MASTER)|6 |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|839.83MB/3.53GB (23.23%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER)|2 |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|502.55MB/3.53GB (13.90%)|
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
node-2:
2017-01-01 18:07:44:973 INFO Loading configuration from input stream [OServerConfigurationLoaderXml]
2017-01-01 18:07:45:452 INFO OrientDB Server v2.2.14 is starting up... [OServer]
2017-01-01 18:07:45:464 INFO Databases directory: C:\Users\node-2\asg\mobius\data\orientdb\databases [OServer]
2017-01-01 18:07:45:532 INFO OrientDB auto-config DISKCACHE=3,614MB (heap=3,614MB direct=3,614MB os=16,260MB), assuming maximum direct memory size equals to maximum JVM heap size [orientechnologies]
2017-01-01 18:07:45:533 WARNI MaxDirectMemorySize JVM option is not set or has invalid value, that may cause out of memory errors. Please set the -XX:MaxDirectMemorySize=16260m option when you start the JVM. [Class]
2017-01-01 18:07:45:682 WARNI Port 0.0.0.0:2424 busy, trying the next available... [OServerNetworkListener]
2017-01-01 18:07:45:682 INFO Listening binary connections on 0.0.0.0:2425 (protocol v.36, socket=default) [OServerNetworkListener]
2017-01-01 18:07:45:688 WARNI Port 0.0.0.0:2480 busy, trying the next available... [OServerNetworkListener]
2017-01-01 18:07:45:689 INFO Listening http connections on 0.0.0.0:2481 (protocol v.10, socket=default) [OServerNetworkListener]
2017-01-01 18:07:45:699 INFO Installing dynamic plugin 'studio-2.2.zip'... [OServerPluginManager]
2017-01-01 18:07:45:707 INFO ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2017-01-01 18:07:45:709 INFO OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2017-01-01 18:07:45:711 INFO OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2017-01-01 18:07:45:722 WARNI GREMLIN language not available (not in classpath) [OGremlinHelper]
2017-01-01 18:07:47:005 WARNI Assigning distributed node name: node-2 [OHazelcastPlugin]
2017-01-01 18:07:47:052 INFO Configuring Hazelcast from 'C:\Users\node-2\asg\mobius\config\orientdb\hazelcast.xml'. [FileSystemXmlConfig]
2017-01-01 18:07:47:300 WARNI Name of the hazelcast schema location incorrect using default [AbstractXmlConfigHelper]
2017-01-01 18:07:47:733 INFO [LOCAL] [orientdb] [3.7.3] Prefer IPv4 stack is true. [DefaultAddressPicker]
2017-01-01 18:07:47:823 INFO [LOCAL] [orientdb] [3.7.3] Picked [10.129.235.254]:2435, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=2435], bind any local is true [DefaultAddressPicker]
2017-01-01 18:07:47:852 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Hazelcast 3.7.3 (20161117 - 8166eea) starting at [10.129.235.254]:2435 [system]
2017-01-01 18:07:47:852 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Copyright (c) 2008-2016, Hazelcast, Inc. All Rights Reserved. [system]
2017-01-01 18:07:47:852 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Configured Hazelcast Serialization version : 1 [system]
2017-01-01 18:07:48:259 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Backpressure is disabled [BackpressureRegulator]
2017-01-01 18:07:49:630 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Creating MulticastJoiner [Node]
2017-01-01 18:07:49:637 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] [10.129.235.254]:2435 is STARTING [LifecycleService]
2017-01-01 18:07:49:845 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Starting 1 partition threads [OperationExecutorImpl]
2017-01-01 18:07:49:846 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Starting 2 generic threads (1 dedicated for priority tasks) [OperationExecutorImpl]
2017-01-01 18:07:49:850 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] TcpIpConnectionManager configured with Non Blocking IO-threading model: 3 input threads and 3 output threads [NonBlockingIOThreadingModel]
2017-01-01 18:07:50:016 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Trying to join to discovered node: [10.129.235.254]:2434 [MulticastJoiner]
2017-01-01 18:07:50:027 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Connecting to /10.129.235.254:2434, timeout: 0, bind-any: true [InitConnectionTask]
2017-01-01 18:07:50:049 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] Established socket connection between /10.129.235.254:62227 and /10.129.235.254:2434 [TcpIpConnectionManager]
2017-01-01 18:07:57:063 INFO [10.129.235.254]:2435 [orientdb] [3.7.3]
Members [2] {
Member [10.129.235.254]:2434 - f3c0d1fd-3451-4d98-a186-0db750fa1145
Member [10.129.235.254]:2435 - 325e4601-3095-4798-9bc3-7b0dad6a4f73 this
}
[ClusterService]
2017-01-01 18:07:59:098 INFO [10.129.235.254]:2435 [orientdb] [3.7.3] [10.129.235.254]:2435 is STARTED [LifecycleService]
2017-01-01 18:07:59:098 INFO Starting distributed server 'node-2' (hzID=325e4601-3095-4798-9bc3-7b0dad6a4f73)... [OHazelcastPlugin]
2017-01-01 18:07:59:098 INFO Distributed cluster time=Sun Jan 01 18:07:59 EET 2017 (delta from local node=0)... [OHazelcastPlugin]
2017-01-01 18:07:59:206 INFO [node-2] Loaded configuration for database 'OSystem' from disk: C:\Users\node-2\asg\mobius\config\orientdb\default-distributed-db-config.json [ODistributedStorage]
2017-01-01 18:07:59:215 WARNI [node-2] Servers in cluster: [node-1, node-2] [OHazelcastPlugin]
2017-01-01 18:07:59:791 INFO [node-2] Loaded configuration for database 'OSystem' from disk: C:\Users\node-2\asg\mobius\data\orientdb\databases\OSystem\distributed-config.json [ODistributedStorage]
2017-01-01 18:07:59:824 INFO [node-2] Distributed servers status:
+-------+--------+---------+-----+---------+-------------------+-------------------+------------------------+
|Name |Status |Databases|Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+--------+---------+-----+---------+-------------------+-------------------+------------------------+
|node-1 |ONLINE | |4 |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|789.26MB/3.53GB (21.84%)|
|node-2*|STARTING| |1 |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|329.45MB/3.53GB (9.11%) |
+-------+--------+---------+-----+---------+-------------------+-------------------+------------------------+
[OHazelcastPlugin]
2017-01-01 18:07:59:852 INFO [node-2] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-01 18:07:59:864 INFO [node-2] Setting new distributed configuration for database: mobius (version=1)
CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
...
2017-01-01 18:07:59:893 INFO [node-2] Current node is a MASTER for database 'mobius' [ODistributedAbstractPlugin$3]
2017-01-01 18:07:59:898 WARNI [node-2]->[[node-1]] requesting delta database sync for 'mobius' on local server... [OHazelcastPlugin]
2017-01-01 18:07:59:901 INFO [node-2]->[node-1] Requesting database delta sync for 'mobius' LSN=LSN{segment=0, position=227}... [OHazelcastPlugin]
2017-01-01 18:07:59:987 INFO [node-2]<-[node-1] Copying remote database 'mobius' to: C:/Tomcats/tomcat_7443/temp/orientdb/install_mobius.zip [OHazelcastPlugin]
2017-01-01 18:07:59:992 INFO [node-2]<-[node-1] Installing database 'mobius' to: C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius... [OHazelcastPlugin]
2017-01-01 18:07:59:993 INFO [node-2] - writing chunk #1 offset=0 size=430b [OHazelcastPlugin]
2017-01-01 18:07:59:994 INFO [node-2] Database copied correctly, size=430b [ODistributedAbstractPlugin$4]
2017-01-01 18:08:00:388 INFO [node-2]<-[node-1] Started import of delta for database 'mobius' [OIncrementalServerSync$1]
2017-01-01 18:08:00:438 INFO [node-2]<-[node-1] Installed database delta for 'mobius'. 1 total entries: 0 created, 1 updated, 0 deleted, 0 holes, 0 skipped [OIncrementalServerSync$1]
2017-01-01 18:08:00:458 INFO [node-2] Installed database 'mobius' (LSN=LSN{segment=0, position=2485}) [OHazelcastPlugin]
2017-01-01 18:08:00:698 INFO [node-2] Reassigning ownership of clusters for database mobius... [OHazelcastPlugin]
2017-01-01 18:08:00:708 INFO [node-2] Reassignment of clusters for database 'mobius' completed (classes=21) [OHazelcastPlugin]
2017-01-01 18:08:00:708 INFO [node-2] Publishing ONLINE status for database node-2.mobius... [ODistributedDatabaseImpl]
2017-01-01 18:08:00:718 INFO [node-2]<-[node-1] Installed delta of database 'mobius'... [OHazelcastPlugin]
2017-01-01 18:08:00:738 WARNI [node-2] Updated node status to 'ONLINE' [OHazelcastPlugin]
2017-01-01 18:08:00:738 INFO [node-2] Distributed servers status:
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name |Status |Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1 |ONLINE |mobius=ONLINE (MASTER)|4 |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|789.26MB/3.53GB (21.84%)|
|node-2*|STARTING| |1 |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|329.45MB/3.53GB (9.11%) |
+-------+--------+----------------------+-----+---------+-------------------+-------------------+------------------------+
[OHazelcastPlugin]
2017-01-01 18:08:00:738 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2017-01-01 18:08:00:868 INFO OrientDB Studio available at http://169.254.205.69:2481/studio/index.html [OServer]
2017-01-01 18:08:00:868 INFO OrientDB Server is active v2.2.14. [OServer]
2017-01-01 18:08:06:022 INFO [node-2] Distributed servers status:
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|Name |Status|Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
|node-1 |ONLINE|mobius=ONLINE (MASTER)|6 |18:05:39 |169.254.205.69:2424|169.254.205.69:2480|839.83MB/3.53GB (23.23%)|
|node-2*|ONLINE|mobius=ONLINE (MASTER)|2 |18:07:45 |169.254.205.69:2425|169.254.205.69:2481|502.55MB/3.53GB (13.90%)|
+-------+------+----------------------+-----+---------+-------------------+-------------------+------------------------+
@pavlop-asg thanks for the super detailed sequence. Just to be sure: in step (5) you wrote "Write value "2" -> it's saved only on node-1 since node-2 is down", but you meant to say "Write value "2" -> it's saved only on node-2 since node-1 is down", right?
Reproduced the problem and fixed in v.2.2.x. Thanks @pavlop-asg!
About step 5, you are right; sorry, that was a typo.
Is it already fixed in master?
We use master only when we release, but you can use 2.2.x. If you can confirm the fix, we could release 2.2.15 soon.
Thank you
I have built the 2.2.x branch and tested it; it looks like the issue remains.
This is the log from node-2, which was started later but had a newer version of some data (there are some warnings):
2017-01-05 18:09:30:409 INFO [node-2] Current node is a MASTER for database 'mobius' [ODistributedAbstractPlugin$3]
2017-01-05 18:09:30:413 WARNI [node-2]->[[node-1]] requesting delta database sync for 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:09:30:430 INFO [node-2]->[node-1] Requesting database delta sync for 'mobius' LSN=LSN{segment=0, position=1760037}... [OHazelcastPlugin]
2017-01-05 18:09:30:439 WARNI [node-2]<-[node-1] Error on installing database delta for 'mobius' (err=Requested delta sync with LSN=LSN{segment=0, position=1760037} but found the following error: Cannot execute delta-sync because a new file has been added. Filename: search.owner.irs(id=319368562630197421)
DB name="mobius") [OHazelcastPlugin]
2017-01-05 18:09:30:439 WARNI [node-2]<-[node-1] Requesting full database 'mobius' sync... [OHazelcastPlugin]
2017-01-05 18:09:30:440 SEVER [node-2]->[node-1] Error on asking delta backup of database 'mobius' (err=Requested delta sync with LSN=LSN{segment=0, position=1760037} but found the following error: Cannot execute delta-sync because a new file has been added. Filename: search.owner.irs(id=319368562630197421)
DB name="mobius") [OHazelcastPlugin]
2017-01-05 18:09:30:441 WARNI [node-2]->[[node-1]] Requesting deploy of database 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:09:30:442 INFO [node-2] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-05 18:09:30:446 INFO [node-2] Setting new distributed configuration for database: mobius (version=1)
...
2017-01-05 18:09:32:582 WARNI [node-2] Moving existent database 'mobius' in 'C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius' to 'C:/Users/node-2/asg/mobius/data/orientdb/databases//../backup/databases/mobius' and get a fresh copy from a remote node... [OHazelcastPlugin]
2017-01-05 18:09:32:583 SEVER [node-2] Error on moving existent database 'mobius' located in 'C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius' to 'C:\Users\node-2\asg\mobius\data\orientdb\databases\..\backup\databases\mobius'. Deleting old database... [OHazelcastPlugin]
2017-01-05 18:09:32:850 INFO [node-2]<-[node-1] Copying remote database 'mobius' to: C:/Tomcats/tomcat_7443/temp/orientdb/install_mobius.zip [OHazelcastPlugin]
2017-01-05 18:09:32:859 INFO [node-2]<-[node-1] Installing database 'mobius' to: C:/Users/node-2/asg/mobius/data/orientdb/databases/mobius... [OHazelcastPlugin]
2017-01-05 18:09:32:864 INFO [node-2] - writing chunk #1 offset=0 size=383.51KB [OHazelcastPlugin]
2017-01-05 18:09:32:866 INFO [node-2] Database copied correctly, size=383.51KB [ODistributedAbstractPlugin$4]
2017-01-05 18:09:35:202 WARNI {db=mobius} Storage 'mobius' was not closed properly. Will try to recover from write ahead log [OLocalPaginatedStorage]
2017-01-05 18:09:35:202 SEVER {db=mobius} Restore is not possible because write ahead log is empty. [OLocalPaginatedStorage]
2017-01-05 18:09:35:202 INFO {db=mobius} Storage data recover was completed [OLocalPaginatedStorage]
2017-01-05 18:09:36:073 INFO {db=mobius} [node-2] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-05 18:09:36:076 INFO {db=mobius} [node-2] Setting new distributed configuration for database: mobius (version=1)
Do you have the full log? What happened next? Was it able to download the database?
One more thing: with version 2.2.15-SNAPSHOT, I sometimes get the following error when I try to work in the cluster:
Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
DB name="mobius"
DB name="mobius"
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_60]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_60]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_60]
at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_60]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:442) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:393) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:275) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2170) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote$27.execute(OStorageRemote.java:1162) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:196) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:237) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:193) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:204) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1138) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:78) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery.run(OSQLAsynchQuery.java:74) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:85) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:754) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.object.db.ODatabasePojoAbstract.query(ODatabasePojoAbstract.java:244) ~[orientdb-object-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$load$2(OrientDbConfigurationStorage.java:96) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at com.asg.mobius.orientdb.OrientDbObjectStorageBase.executeAndReturn(OrientDbObjectStorageBase.java:88) ~[orientdb-support-7.0.0-SNAPSHOT.jar:?]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.load(OrientDbConfigurationStorage.java:95) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$loadAll$3(OrientDbConfigurationStorage.java:111) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_60]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.loadAll(OrientDbConfigurationStorage.java:110) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at com.hazelcast.map.impl.MapStoreWrapper.loadAll(MapStoreWrapper.java:143) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.mapstore.AbstractMapDataStore.loadAll(AbstractMapDataStore.java:55) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadAndGet(BasicRecordStoreLoader.java:161) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.doBatchLoad(BasicRecordStoreLoader.java:134) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadValuesInternal(BasicRecordStoreLoader.java:120) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.access$100(BasicRecordStoreLoader.java:54) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader$GivenKeysLoaderTask.call(BasicRecordStoreLoader.java:107) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:67) ~[hazelcast-3.7.3.jar:3.7.3]
... 6 more
2017-01-05 18:32:52.024 [ERROR] [hz._hzInstance_2_dev.partition-operation.thread-2] com.hazelcast.map.impl.operation.GetOperation:log - [10.129.235.254]:5701 [dev] [3.7.3] Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
DB name="mobius"
DB name="mobius"
com.orientechnologies.orient.core.exception.ODatabaseException: Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
DB name="mobius"
DB name="mobius"
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_60]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_60]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_60]
at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_60]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:442) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:393) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:275) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2170) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote$27.execute(OStorageRemote.java:1162) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:196) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:237) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:193) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:204) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1138) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:78) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery.run(OSQLAsynchQuery.java:74) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:85) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:754) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.orientechnologies.orient.object.db.ODatabasePojoAbstract.query(ODatabasePojoAbstract.java:244) ~[orientdb-object-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$load$2(OrientDbConfigurationStorage.java:96) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at com.asg.mobius.orientdb.OrientDbObjectStorageBase.executeAndReturn(OrientDbObjectStorageBase.java:88) ~[orientdb-support-7.0.0-SNAPSHOT.jar:?]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.load(OrientDbConfigurationStorage.java:95) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$loadAll$3(OrientDbConfigurationStorage.java:111) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_60]
at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.loadAll(OrientDbConfigurationStorage.java:110) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
at com.hazelcast.map.impl.MapStoreWrapper.loadAll(MapStoreWrapper.java:143) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.mapstore.AbstractMapDataStore.loadAll(AbstractMapDataStore.java:55) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadAndGet(BasicRecordStoreLoader.java:161) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.doBatchLoad(BasicRecordStoreLoader.java:134) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadValuesInternal(BasicRecordStoreLoader.java:120) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.access$100(BasicRecordStoreLoader.java:54) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader$GivenKeysLoaderTask.call(BasicRecordStoreLoader.java:107) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:67) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:212) ~[hazelcast-3.7.3.jar:3.7.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_60]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_60]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_60]
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[hazelcast-3.7.3.jar:3.7.3]
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92) ~[hazelcast-3.7.3.jar:3.7.3]
Sorry, I do not have the full log right now; I was trying to repeat the test and ran into the error above.
The reason for that exception is the following: after node-2 joined the cluster, this happened:
2017-01-05 18:28:58:293 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] Accepting socket connection from /10.129.235.254:61063 [SocketAcceptorThread]
2017-01-05 18:28:58:306 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] Established socket connection between /10.129.235.254:2434 and /10.129.235.254:61063 [TcpIpConnectionManager]
2017-01-05 18:29:05:285 INFO [10.129.235.254]:2434 [orientdb] [3.7.3]
Members [2] {
Member [10.129.235.254]:2434 - be3473c4-1407-48c7-a772-5ec20ff8946a this
Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e
}
[ClusterService]
2017-01-05 18:29:05:285 WARNI [node-1] Added new node id=Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e name=ext:9f08c22b-351d-4884-a616-ed61a69f0b8e [OHazelcastPlugin]
2017-01-05 18:29:05:603 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] Re-partitioning cluster data... Migration queue size: 271 [MigrationManager]
2017-01-05 18:29:07:311 INFO [10.129.235.254]:2434 [orientdb] [3.7.3] All migration tasks have been completed, queues are empty. [MigrationThread]
2017-01-05 18:29:07:458 INFO [node-1]<-[ext:9f08c22b-351d-4884-a616-ed61a69f0b8e] Received updated about registered nodes [OHazelcastPlugin]
2017-01-05 18:29:07:845 INFO [node-1]<-[node-2] Added node configuration id=Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e name=node-2, now 2 nodes are configured [OHazelcastPlugin]
2017-01-05 18:29:07:846 INFO [node-1] Distributed servers status:
+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
|Name |Status |Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE |mobius=ONLINE (MASTER)|4 |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|768.49MB/3.53GB (21.26%)|
|node-2 |STARTING| |0 |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|333.52MB/3.53GB (9.23%) |
+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
[OHazelcastPlugin]
2017-01-05 18:29:08:181 INFO [node-1] Setting new distributed configuration for database: OSystem (version=1)
CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+
|CLUSTER |writeQuorum|readQuorum|
+--------+-----------+----------+
|* | 1 | 1 |
|internal| 1 | 1 |
+--------+-----------+----------+
[ODistributedStorage]
2017-01-05 18:29:08:229 SEVER [node-1]--[node-2] Skip deploying delta database 'mobius' because the requesting server has a most recent database (requester LastOperationOn=2017-01-05 18:09:37.209 current LastOperationOn=2017-01-05 18:09:32.573) [OSyncDatabaseDeltaTask]
2017-01-05 18:29:08:238 INFO [node-1]<-[node-2] Received new status node-2.mobius=ONLINE [OHazelcastPlugin]
2017-01-05 18:29:08:240 INFO [node-1]<-[node-2] Received updated status node-1.mobius=NOT_AVAILABLE [OHazelcastPlugin]
2017-01-05 18:29:08:241 INFO [node-1] Distributed servers status:
+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name |Status |Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE |mobius=NOT_AVAILABLE (MASTER)|4 |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|768.49MB/3.53GB (21.26%)|
|node-2 |STARTING| |0 |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|340.27MB/3.53GB (9.41%) |
+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
[OHazelcastPlugin]
2017-01-05 18:29:13:609 INFO [node-1] Distributed servers status:
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name |Status|Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=NOT_AVAILABLE (MASTER)|6 |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER) |2 |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
[OHazelcastPlugin]
2017-01-05 18:29:13:886 INFO [node-1] Trying to recover current server for database 'mobius'... [OClusterHealthChecker]
2017-01-05 18:29:13:889 INFO [node-1] Current node is a MASTER for database 'mobius' [ODistributedAbstractPlugin$3]
2017-01-05 18:29:13:890 WARNI [node-1]->[[node-2]] requesting delta database sync for 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:29:13:891 INFO [node-1]->[node-2] Requesting database delta sync for 'mobius' LSN=LSN{segment=0, position=168631}... [OHazelcastPlugin]
2017-01-05 18:29:14:302 INFO [node-1]<-[node-2] Received updated status node-1.mobius=SYNCHRONIZING [OHazelcastPlugin]
2017-01-05 18:29:14:303 INFO [node-1] Distributed servers status:
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name |Status|Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=SYNCHRONIZING (MASTER)|6 |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER) |2 |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
[OHazelcastPlugin]
2017-01-05 18:29:14:314 WARNI [node-1]<-[node-2] Error on installing database delta for 'mobius' (err=Requested database delta sync with LSN=LSN{segment=0, position=168631} but not found in database) [OHazelcastPlugin]
2017-01-05 18:29:14:314 WARNI [node-1]<-[node-2] Requesting full database 'mobius' sync... [OHazelcastPlugin]
2017-01-05 18:29:14:315 SEVER [node-1]->[node-2] Error on asking delta backup of database 'mobius' (err=Requested database delta sync with LSN=LSN{segment=0, position=168631} but not found in database) [OHazelcastPlugin]
2017-01-05 18:29:14:509 WARNI [node-1]->[[node-2]] Requesting deploy of database 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:29:14:517 INFO [node-1] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-05 18:29:14:518 INFO [node-1] Setting new distributed configuration for database: mobius (version=1)
CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+-------------+-------+
| | | | MASTER |MASTER |
| | | |SYNCHRONIZING|ONLINE |
| | | | dynamic |dynamic|
+--------+-----------+----------+-------------+-------+
|CLUSTER |writeQuorum|readQuorum| node-1 |node-2 |
+--------+-----------+----------+-------------+-------+
|* | 1 | 1 | X | o |
|internal| 1 | 1 | | |
+--------+-----------+----------+-------------+-------+
[ODistributedStorage]
2017-01-05 18:29:14:537 INFO [node-1]<-[node-2] Received updated status node-2.mobius=BACKUP [OHazelcastPlugin]
2017-01-05 18:29:14:539 INFO [node-1] Distributed servers status:
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name |Status|Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=SYNCHRONIZING (MASTER)|6 |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=BACKUP (MASTER) |2 |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
[OHazelcastPlugin]
$ANSI{green {db=mobius}} Error during storage close
com.orientechnologies.orient.core.exception.OStorageException: Cannot open the storage 'mobius' because it does not exist in path: C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius
DB name="mobius"
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:285)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:261)
at com.orientechnologies.orient.server.OServer.openDatabaseBypassingSecurity(OServer.java:980)
at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:959)
at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:932)
at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:923)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.openDatabase(ONetworkProtocolBinary.java:847)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.handshakeRequest(ONetworkProtocolBinary.java:231)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:196)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)
$ANSI{green {db=mobius}} Error during storage close
Any comments on that? And could you please tell me an approximate date for the 2.2.15 release?
Looks like this issue is fixed, but I found one more minor issue related to it: #7073
I have tested the released 2.2.15 version and the fix appears to have been removed. Was it left out because of bug #7073?
Hi @pavlop-asg,
Thanks for your comment. I'm sorry to hear about your problems. The difference between 2.2.15-SNAPSHOT and 2.2.15 seems strange to me; as far as I can see, the fix has been included in 2.2.15.
Are you sure you ran the very same test on 2.2.15-SNAPSHOT and 2.2.15? I am reopening the issue for now.
Thanks,
I have repeated the test with 2.2.15 and it looks like the issue is only partly fixed. Use cases:
1. I start node-1 and then node-2, and write value "1" to a document. I shut down node-1 and write value "2", then I shut down node-2 as well. When I start node-1 and then node-2, nothing gets synchronized and I still see the old value "1". This means the issue remains for this case.
2. I start node-1 and then node-2, and write value "1" to a document. I shut down node-2 and write value "2", then I shut down node-1 as well. When I start node-2 and then node-1, the data gets synchronized, but we run into issue #7073.
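For reference, below is a minimal sketch of how the value can be read back from both nodes to check whether it was synchronized in these cases. It is only an illustration: the database name "mobius" and the binary ports 2424/2425 come from the logs in this thread, while the KeyValue class, the key/value field names and the admin credentials are hypothetical placeholders.

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

import java.util.List;

public class ReadBackCheck {
  public static void main(String[] args) {
    // Binary ports 2424/2425 are the ones shown in the logs above; credentials are assumed defaults.
    String[] nodes = { "remote:localhost:2424/mobius", "remote:localhost:2425/mobius" };
    for (String url : nodes) {
      ODatabaseDocumentTx db = new ODatabaseDocumentTx(url).open("admin", "admin");
      try {
        // "KeyValue" class and "key"/"value" fields are hypothetical names for the test document.
        List<ODocument> result = db.query(
            new OSQLSynchQuery<ODocument>("SELECT value FROM KeyValue WHERE key = 'test'"));
        Object value = result.isEmpty() ? null : result.get(0).field("value");
        System.out.println(url + " -> " + value);
      } finally {
        db.close();
      }
    }
  }
}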
I've just tried the 2nd case and everything works. When node1 was restarted, it recognized that its database was newer. This is the log:
2017-01-25 22:38:24:458 INFO [node1]->[[node2]] Current copy of database 'demo' is newer then the copy present in the cluster. Use the local copy and force other nodes to download this [ODistributedAbstractPlugin$3]
2017-01-25 22:38:24:468 WARNI [node1] Updated node status to 'ONLINE' [OHazelcastPlugin]
2017-01-25 22:38:24:470 INFO [node1] Distributed servers status:
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|Name |Status |Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE |demo=NOT_AVAILABLE (MASTER)|1 |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|STARTING| |3 |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|35.36MB/8.00GB (0.43%)|
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
[OHazelcastPlugin]
2017-01-25 22:38:24:472 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2017-01-25 22:38:24:472 INFO [OEnterpriseAgent]
2017-01-25 22:38:24:472 INFO ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:472 INFO * ORIENTDB - ENTERPRISE EDITION * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO * If you are in Production or Test, you must purchase a commercial license. * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO * For more information look at: http://orientdb.com/orientdb-enterprise/ * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO [OEnterpriseAgent]
2017-01-25 22:38:24:487 INFO Profiler is recording metrics with configuration: 60,24 [OEnterpriseProfiler]
2017-01-25 22:38:24:677 INFO Scheduled [FULL_BACKUP] task : fa461098-a6ea-410a-8909-6c974f12e58e. Next execution will be Sun Jan 15 08:10:00 EST 2017 [OBackupTask]
2017-01-25 22:38:24:719 INFO OrientDB Studio available at http://10.1.1.100:2481/studio/index.html [OServer]
2017-01-25 22:38:24:719 INFO OrientDB Server is active v2.2.16-SNAPSHOT. [OServer]
2017-01-25 22:38:28:285 INFO [node1] Received updated status node2.demo=SYNCHRONIZING [OHazelcastPlugin]
2017-01-25 22:38:28:285 INFO [node1]->[node2] Deploying database 'demo' with delta of changes... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:285 INFO [node1]->[node2] Creating delta backup of database 'demo' (startLSN=LSN{segment=0, position=484849}) in directory: /var/folders/zc/y34429014c3bt_x1587qblth0000gn/T/orientdb/backup_node2_demo.zip... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:286 INFO [node1] Distributed servers status:
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|Name |Status |Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE |demo=SYNCHRONIZING (MASTER)|1 |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|STARTING| |3 |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|35.36MB/8.00GB (0.43%)|
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
[OHazelcastPlugin]
2017-01-25 22:38:28:290 INFO Exporting records after LSN=LSN{segment=0, position=484849}. Found 1 records [OEnterpriseLocalPaginatedStorage]
2017-01-25 22:38:28:291 INFO [node1]->[node2] Delta backup of database 'demo' completed. range=LSN{segment=0, position=484849}-LSN{segment=0, position=485604} [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:291 INFO [node1]->[node2] Deploy delta database task completed [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:293 INFO [node1]->[node2] - transferring chunk #1 offset=0 size=44... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:349 INFO [node1]<-[node2] Received updated status node2.demo=ONLINE [OHazelcastPlugin]
2017-01-25 22:38:29:480 INFO [node1] Distributed servers status:
+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
|Name |Status|Databases |Conns|StartedOn|Binary |HTTP |UsedMemory |
+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE|demo=ONLINE (MASTER)|1 |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|ONLINE|demo=ONLINE (MASTER)|3 |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|67.84MB/8.00GB (0.83%)|
+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
[OHazelcastPlugin]
Actually this worked after a fix that I just pushed to 2.2.x. Could you please test both cases? They both work for me.
Thank you, I will try.
It is working fine for me after your fix. Thank you very much!
Could you tell me how soon 2.2.16 will be released?
Hi @pavlop-asg,
Glad to hear the issue has been fixed. Thanks for your patience and testing.
I'll discuss internally and try to work out when 2.2.16 will be released.
Thanks,
OrientDB Version, operating system, or hardware.
Operating System
Expected behavior and actual behavior
Here is an example. We have nodes A and B, and we write value "1" to a document while both A and B are up. We shut down node B and write value "2", then we shut down node A as well. When we start node B and then node A, the cluster has the old value "1" instead of "2"; it looks like there is a bug in how OrientDB merges the records. NOTE: when we start node A and then node B, everything works as expected.
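As an illustration only, the writes in this scenario could be driven by a small client such as the sketch below, pointed at whichever node is still up at each step (the node shutdowns and restarts are done on the servers themselves). The KeyValue class, the key/value field names, the remote URL and the admin credentials are hypothetical placeholders, not the actual application code.

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

import java.util.List;

public class WriteValue {
  // Usage: WriteValue <nodeUrl> <value>, e.g. WriteValue remote:localhost:2424/mobius 2
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx(args[0]).open("admin", "admin");
    try {
      // "KeyValue" class and "key"/"value" fields are hypothetical and assumed to exist already.
      List<ODocument> existing = db.query(
          new OSQLSynchQuery<ODocument>("SELECT FROM KeyValue WHERE key = 'test'"));
      if (existing.isEmpty()) {
        System.err.println("Test document not found on " + args[0] + "; create it first.");
      } else {
        ODocument doc = existing.get(0);
        doc.field("value", args[1]); // "1" or "2" depending on the step
        doc.save();
      }
    } finally {
      db.close();
    }
  }
}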
Steps to reproduce the problem
Start two nodes with the following distributed configuration:
This issue is applicable for other configurations as well, for example: