SergeyOvsienko opened this issue 10 years ago
Hi. First, you need to execute the "join-cluster" command on the manager console. After that, the managers of each cluster start communicating with each other.
Actually, the multi-datacenter documentation is still not sufficient - #224 - I'll write and publish it soon.
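For reference, the basic flow looks like the sketch below, run with leofs-adm as elsewhere in this thread; the remote manager node names and host are placeholders, not values from this issue:

## on cluster A, register cluster B by pointing at B's manager master and slave
$ leofs-adm join-cluster manager_0@<remote-host> manager_1@<remote-host>
OK
## confirm the remote cluster is now listed
$ leofs-adm cluster-status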
Hi, I executed the "join-cluster" command on the manager console.
leofs-adm cluster-status
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
eu | dc_eu | running | 2 | 2014-09-01 07:53:49 +0000
leofs-adm cluster-status
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
us | dc_us | running | 2 | 2014-09-01 15:48:49 +0000
Replication began, but it stopped after 49 of 110 objects were replicated.
Thanks for the quick reply; I'll wait for the documentation.
Hi, I have a new problem.
In a working cluster, [Multi DC replication settings] shows max # of joinable DCs : 2.
I changed in /usr/local/leofs/1.1.1/leo_manager_0/etc/leo_manager.conf
mdc_replication.max_targets = 3
mdc_replication.num_of_replicas_a_dc = 1
I restarted both leo_manager nodes,
but "max # of joinable DCs : 2" did not change.
How can I change this value?
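For reference, the settings in question and a quick way to see what the running manager reports (the grep line is just an illustration; the keys are the ones quoted above). Note that later in this thread the changed value only shows up on clusters built after the edit, which suggests it is fixed when the cluster is first created rather than re-read on a manager restart - an observation from this thread, not a documented rule.

## /usr/local/leofs/1.1.1/leo_manager_0/etc/leo_manager.conf
mdc_replication.max_targets = 3
mdc_replication.num_of_replicas_a_dc = 1

## what the running manager actually reports
$ leofs-adm status | grep -A 2 "Multi DC replication"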
I want to remove a cluster.
leofs-adm cluster-status
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
la3_1 | dc_la3 | running | 2 | 2014-09-03 14:22:26 +0000
leofs-adm remove-cluster manager_0@10.30.10.151 manager_1@10.30.10.176 [ERROR] Could not connect
How can I resolve this problem?
From the error log of leo_manager_0:
tail -f error
[E] manager_0@10.60.0.40 2014-09-03 14:29:00.483614 +0000 1409754540 null:null 0 CRASH REPORT: Process <0.1987.0> with 1 neighbours exited with reason: {connection_error,{connection_error,econnrefused}} in gen_server:init_it/6 line 320
[E] manager_0@10.60.0.40 2014-09-03 14:29:00.484477 +0000 1409754540 null:null 0 Supervisor leo_rpc_client_manager_1_at_13075_sup had child leo_rpc_client_manager_1_at_13075 started with leo_pod_manager:start_link(leo_rpc_client_manager_1_at_13075, 16, 16, leo_rpc_client_conn, [manager_1,"10.30.10.176",13075,0], #Fun
Sorry for my English; it is very bad.
Maybe this information will be useful.
I created two new clusters for testing.
First
leofs-adm status
[System config]
System version : 1.1.1
Cluster Id : gv_1
DC Id : dc_gv
Total replicas : 1
# of successes of W : 1
# of successes of D : 1
ring size : 2^128
Current ring hash : 84518678
Prev ring hash : 84518678
[Multi DC replication settings]
max # of joinable DCs : 3
[Node(s) state]
-------+---------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+---------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@10.60.0.40 | running | 84518678 | 84518678 | 2014-09-03 14:36:06 +0000
S | storage_1@10.60.0.45 | running | 84518678 | 84518678 | 2014-09-03 14:36:06 +0000
G | gateway_0@10.60.0.40 | running | 84518678 | 84518678 | 2014-09-03 14:36:08 +0000
G | gateway_1@10.60.0.45 | running | 84518678 | 84518678 | 2014-09-03 14:36:07 +0000
Second:
leofs-adm status
[System config]
System version : 1.1.1
Cluster Id : la3_1
DC Id : dc_la3
Total replicas : 1
# of successes of W : 1
# of successes of D : 1
ring size : 2^128
Current ring hash : 2623744e
Prev ring hash : 2623744e
[Multi DC replication settings]
max # of joinable DCs : 3
[Node(s) state]
-------+-----------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+-----------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@10.30.10.151 | running | 2623744e | 2623744e | 2014-09-03 14:22:21 +0000
S | storage_1@10.30.10.176 | running | 2623744e | 2623744e | 2014-09-03 14:22:21 +0000
G | gateway_0@10.30.10.151 | running | 2623744e | 2623744e | 2014-09-03 14:22:24 +0000
G | gateway_1@10.30.10.176 | running | 2623744e | 2623744e | 2014-09-03 14:22:23 +0000
And now I run join-cluster
root@s-140115-3:/usr/local/leofs/1.1.1# leofs-adm join-cluster manager_0@10.30.10.151 manager_1@10.30.10.176 OK
After the join, I run remove-cluster:
root@s-140115-3:/usr/local/leofs/1.1.1# leofs-adm remove-cluster manager_0@10.30.10.151 manager_1@10.30.10.176 [ERROR] Could not connect
After the error I run cluster-status (twice, with empty output):
root@s-140115-3:/usr/local/leofs/1.1.1# leofs-adm cluster-status
root@s-140115-3:/usr/local/leofs/1.1.1# leofs-adm cluster-status
Then I try join-cluster again:
root@s-140115-3:/usr/local/leofs/1.1.1# leofs-adm join-cluster manager_0@10.30.10.151 manager_1@10.30.10.176
[ERROR] Over max number of clusters
root@s-140115-3:/usr/local/leofs/1.1.1# leofs-adm join-cluster manager_0@10.30.10.151 manager_1@10.30.10.176
[ERROR] Over max number of clusters
ping -c 2 10.30.10.151
PING 10.30.10.151 (10.30.10.151) 56(84) bytes of data.
64 bytes from 10.30.10.151: icmp_seq=1 ttl=62 time=161 ms
64 bytes from 10.30.10.151: icmp_seq=2 ttl=62 time=161 ms
--- 10.30.10.151 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 161.212/161.285/161.358/0.073 ms
root@s-140115-3:/usr/local/leofs/1.1.1# ping -c 2 10.30.10.176
PING 10.30.10.176 (10.30.10.176) 56(84) bytes of data.
64 bytes from 10.30.10.176: icmp_seq=1 ttl=62 time=161 ms
64 bytes from 10.30.10.176: icmp_seq=2 ttl=62 time=161 ms
--- 10.30.10.176 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 161.160/161.177/161.194/0.017 ms
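(Side note: ICMP reachability alone does not prove the managers can talk to each other; join-cluster and remove-cluster go over TCP to the remote managers. The port below, 13075, is taken from the RPC error log above; using nc for the check is just an assumption about available tooling.)

## quick TCP reachability check against the remote managers' RPC port
$ nc -vz 10.30.10.151 13075
$ nc -vz 10.30.10.176 13075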
Hi, you can check it as follows:
$ sh ./build_mdcr_env.sh
$ ./leofs-adm start
Generating RING...
Generated RING
OK 33% - storage_1@127.0.0.1
OK 67% - storage_0@127.0.0.1
OK 100% - storage_2@127.0.0.1
OK
$ ./leofs-adm status
[System config]
System version : 1.1.2
Cluster Id : leofs_1
DC Id : dc_1
Total replicas : 2
# of successes of R : 1
# of successes of W : 1
# of successes of D : 1
# of DC-awareness replicas : 0
ring size : 2^128
Current ring hash : 92667d97
Prev ring hash : 92667d97
[Multi DC replication settings]
max # of joinable DCs : 2
# of replicas a DC : 1
[Node(s) state]
-------+--------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@127.0.0.1 | running | 92667d97 | 92667d97 | 2014-09-04 16:34:57 +0900
S | storage_1@127.0.0.1 | running | 92667d97 | 92667d97 | 2014-09-04 16:34:57 +0900
S | storage_2@127.0.0.1 | running | 92667d97 | 92667d97 | 2014-09-04 16:34:57 +0900
G | gateway_0@127.0.0.1 | running | 92667d97 | 92667d97 | 2014-09-04 16:34:58 +0900
$ ./leofs-adm -p 10110 start
Generating RING...
Generated RING
OK 33% - storage_11@127.0.0.1
OK 67% - storage_10@127.0.0.1
OK 100% - storage_12@127.0.0.1
OK
$ ./leofs-adm -p 10110 status
[System config]
System version : 1.1.2
Cluster Id : leofs_2
DC Id : dc_2
Total replicas : 2
# of successes of R : 1
# of successes of W : 1
# of successes of D : 1
# of DC-awareness replicas : 0
ring size : 2^128
Current ring hash : d3f08306
Prev ring hash : d3f08306
[Multi DC replication settings]
max # of joinable DCs : 2
# of replicas a DC : 1
[Node(s) state]
-------+---------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+---------------------------+--------------+----------------+----------------+----------------------------
S | storage_10@127.0.0.1 | running | d3f08306 | d3f08306 | 2014-09-04 16:36:03 +0900
S | storage_11@127.0.0.1 | running | d3f08306 | d3f08306 | 2014-09-04 16:36:03 +0900
S | storage_12@127.0.0.1 | running | d3f08306 | d3f08306 | 2014-09-04 16:36:03 +0900
G | gateway_10@127.0.0.1 | running | d3f08306 | d3f08306 | 2014-09-04 16:36:04 +0900
$ ./leofs-adm join-cluster manager_0@127.0.0.1:13095 manager_0@127.0.0.1:13096
OK
$ ./leofs-adm cluster-status
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
leofs_2 | dc_2 | running | 3 | 2014-09-04 16:36:19 +0900
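For completeness, a sketch of undoing the join in the same local setup, assuming remove-cluster takes the remote manager nodes in the same way join-cluster does (as in the failing attempt earlier in this thread); success output is not shown here because only the error case appears above:

$ ./leofs-adm remove-cluster manager_0@127.0.0.1:13095 manager_0@127.0.0.1:13096
$ ./leofs-adm cluster-status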
Thank you very much!
Hi, thank you for the new documentation. I edited leo_storage.conf according to http://leo-project.net/leofs/docs/configuration_5.html, and geo-replication started working.
leofs-adm cluster-status
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
gv_1 | dc_gv | running | 2 | 2014-09-04 14:06:10 +0000
In the US DC:
leofs-adm whereis data/4669
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_1@x.x.x.173 | 5318f613d6dfa3bb79b847af609bd74d | 1818950K | da59c1dd3a | 119 | 5023e04a61d53 | 2014-09-04 14:17:52 +0000
In the EU DC:
leofs-adm whereis data/4669
-------+---------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+---------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_gv_0@x.x.x.16 | 5318f613d6dfa3bb79b847af609bd74d | 1818950K | da59c1dd3a | 119 | 5023e04a61d53 | 2014-09-04 14:17:52 +0000
Thank you for your work!
We're planning to implement replication for geographical optimization as one of our future plans, so we do not provide that function yet.
OK, thanks. I'll wait for the new version.
I also wanted to clarify: can I change "Total replicas" and "max # of joinable DCs" without rebuilding the cluster? I did not find this in the documentation.
Hi, I have stumbled upon a new problem.
I ran compact-start on all storage nodes in both DCs:
leofs-adm compact-start storage_gv_0@x.x.x.16 all
leofs-adm compact-start storage_gv_1@x.x.x.21 all
leofs-adm compact-start storage_la3_0@x.x.x.172 all
leofs-adm compact-start storage_la3_1@x.x.x.173 all
After compaction finished, I found that some of the files were no longer available in one of the data centers.
In the la3 DC:
leofs-adm whereis data/111668
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_1@x.x.x.173 | | | | | |
In the GV DC:
leofs-adm whereis data/111668
-------+---------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+---------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_gv_1@x.x.x.21 | d5e949029ae6c54b5d6e3d9afa9c2fb2 | 406970K | ae31780332 | 27 | 5023e71c053a3 | 2014-09-04 14:48:20 +0000
I ran "leofs-adm recover-file data/111668" in the GV DC.
Some time later, in the LA3 DC:
leofs-adm whereis data/111668
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_1@x.x.x.173 | d5e949029ae6c54b5d6e3d9afa9c2fb2 | 0B | ae31780332 | 27 | 5023e71c053a3 | 2014-09-04 14:48:20 +0000
But the size is 0B.
Can I fix this?
Thank you for your report. We'll check this issue.
Thanks.
I've fixed the issue. You can verify it with the develop branch. Build the develop branch as follows:
$ git clone https://github.com/leo-project/leofs.git
$ cd leofs
$ git checkout develop
$ ./rebar get-deps
$ sh ./git_checkout.sh develop
$ make && make release
$ ./leofs-adm -p 10110 status
[System config]
System version : 1.1.2
Cluster Id : leofs_2
DC Id : dc_2
Total replicas : 2
# of successes of R : 1
# of successes of W : 1
# of successes of D : 1
# of DC-awareness replicas : 0
ring size : 2^128
Current ring hash : b12ed579
Prev ring hash : b12ed579
[Multi DC replication settings]
max # of joinable DCs : 2
# of replicas a DC : 1
[Node(s) state]
-------+---------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+---------------------------+--------------+----------------+----------------+----------------------------
S | storage_10@127.0.0.1 | running | b12ed579 | b12ed579 | 2014-09-11 13:47:05 +0900
S | storage_11@127.0.0.1 | running | b12ed579 | b12ed579 | 2014-09-11 13:47:05 +0900
S | storage_12@127.0.0.1 | running | b12ed579 | b12ed579 | 2014-09-11 13:47:05 +0900
G | gateway_10@127.0.0.1 | running | b12ed579 | b12ed579 | 2014-09-11 13:47:07 +0900
$ ./leofs-adm -p 10110 du storage_10@127.0.0.1
active number of objects: 436
total number of objects: 439
active size of objects: 22952403
total size of objects: 22978098
ratio of active size: 99.89%
last compaction start: ____-__-__ __:__:__
last compaction end: ____-__-__ __:__:__
$ ./leofs-adm -p 10110 du storage_11@127.0.0.1
active number of objects: 387
total number of objects: 396
active size of objects: 18852012
total size of objects: 18877812
ratio of active size: 99.86%
last compaction start: ____-__-__ __:__:__
last compaction end: ____-__-__ __:__:__
$ ./leofs-adm -p 10110 du storage_12@127.0.0.1
active number of objects: 427
total number of objects: 427
active size of objects: 31367572
total size of objects: 31367572
ratio of active size: 100.0%
last compaction start: ____-__-__ __:__:__
last compaction end: ____-__-__ __:__:__
$ ./leofs-adm -p 10110 du storage_10@127.0.0.1
active number of objects: 436
total number of objects: 436
active size of objects: 22952403
total size of objects: 22952403
ratio of active size: 100.0%
last compaction start: 2014-09-11 14:12:01 +0900
last compaction end: 2014-09-11 14:12:02 +0900
$ ./leofs-adm -p 10110 du storage_11@127.0.0.1
active number of objects: 387
total number of objects: 387
active size of objects: 18852012
total size of objects: 18852012
ratio of active size: 100.0%
last compaction start: 2014-09-11 14:13:16 +0900
last compaction end: 2014-09-11 14:13:16 +0900
$ ./leofs-adm -p 10110 du storage_12@127.0.0.1
active number of objects: 385
total number of objects: 385
active size of objects: 27149231
total size of objects: 27149231
ratio of active size: 100.0%
last compaction start: 2014-09-11 14:18:28 +0900
last compaction end: 2014-09-11 14:18:29 +0900
Thank you very much!
I will be testing it soon.
Hi, here are the first results.
I updated LeoFS according to your instructions:
$ git clone https://github.com/leo-project/leofs.git
$ cd leofs
$ git checkout develop
$ ./rebar get-deps
$ sh ./git_checkout.sh develop
$ make && make release
The clusters came up normally.
First
leofs-adm status
[System config]
System version : 1.1.2
Cluster Id : la3_1
DC Id : dc_la3
Total replicas : 1
# of successes of R : 1
# of successes of W : 1
# of successes of D : 1
# of DC-awareness replicas : 0
ring size : 2^128
Current ring hash : 3e811d75
Prev ring hash : 3e811d75
[Multi DC replication settings]
max # of joinable DCs : 3
# of replicas a DC : 1
[Node(s) state]
-------+-----------------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+-----------------------------------+--------------+----------------+----------------+----------------------------
S | storage_la3_0@x.x.x.172 | running | 3e811d75 | 3e811d75 | 2014-09-11 14:39:23 +0000
S | storage_la3_1@x.x.x.173 | running | 3e811d75 | 3e811d75 | 2014-09-11 14:39:23 +0000
G | gateway_la3_0@x.x.x.172 | running | 3e811d75 | 3e811d75 | 2014-09-11 14:39:25 +0000
G | gateway_la3_1@x.x.x.173 | running | 3e811d75 | 3e811d75 | 2014-09-11 14:39:24 +0000
Second
leofs-adm status
[System config]
System version : 1.1.2
Cluster Id : gv_1
DC Id : dc_gv
Total replicas : 1
# of successes of R : 1
# of successes of W : 1
# of successes of D : 1
# of DC-awareness replicas : 0
ring size : 2^128
Current ring hash : 453e2a3c
Prev ring hash : 453e2a3c
[Multi DC replication settings]
max # of joinable DCs : 3
# of replicas a DC : 1
[Node(s) state]
-------+---------------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+---------------------------------+--------------+----------------+----------------+----------------------------
S | storage_gv_0@x.x.x.16 | running | 453e2a3c | 453e2a3c | 2014-09-11 13:58:39 +0000
S | storage_gv_1@x.x.x.21 | running | 453e2a3c | 453e2a3c | 2014-09-11 13:58:39 +0000
G | gateway_gv_0@x.x.x.16 | running | 453e2a3c | 453e2a3c | 2014-09-11 13:58:41 +0000
G | gateway_gv_1@x.x.x.21 | running | 453e2a3c | 453e2a3c | 2014-09-11 13:58:40 +0000
Geo-replication status
leofs-adm cluster-status
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
la3_1 | dc_la3 | running | 2 | 2014-09-11 14:41:49 +0000
I uploaded 30 files in each cluster, but not all of the files were synced between the data centers.
For example, in the US DC:
leofs-adm whereis leofs/gv-20
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_0@204.107.26.172 | | | | | |
leofs-adm whereis leofs/gv-30
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_1@204.107.26.173 | | | | | |
leofs-adm whereis leofs/gv-29
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_1@204.107.26.173 | 2230273c69dc67c1f366aed3cd816aa2 | 1048576K | 3a04f3144f | 69 | 502cbdc70f0bf | 2014-09-11 15:31:25 +0000
In the EU DC I ran leofs-adm recover-file for the gv-20 file:
leofs-adm recover-file leofs/gv-20
After this, gv-20 was recovered in the US DC, and so was leofs/gv-30, even though I did not run leofs-adm recover-file for leofs/gv-30:
leofs-adm whereis leofs/gv-20
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_0@204.107.26.172 | 2b671ea094596df2f31c9857ce4dd59a | 1048576K | 3a04f3144f | 69 | 502cbbc0b6311 | 2014-09-11 15:22:20 +0000
leofs-adm whereis leofs/gv-30
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
del? | node | ring address | size | checksum | # of chunks | clock | when
-------+-----------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
| storage_la3_1@204.107.26.173 | f0d0eafa4b874eef0ee9a2b1929ee0ee | 1048576K | 3a04f3144f | 69 | 502cbe025337b | 2014-09-11 15:32:25 +0000
Such behavior isn't good.
I have not yet tested how compaction works; I think in a couple of days I will have more detailed information on it.
Thanks for your work.
Results of one test. In the US DC I ran:
s3cmd sync 20140823 s3://leofs/la3
and in the EU DC:
s3cmd sync 20140567 s3://leofs/gv
20140823 and 20140567 are directories.
After the sync I did not see the la3 and gv subdirectories in s3://leofs/:
s3cmd la s3://leofs
I deleted the s3://leofs bucket and created a new one, s3://geo:
s3cmd ls
2014-09-11 18:00 s3://geo
leofs-adm get-buckets
cluster id | bucket | owner | permissions | created at
-------------+----------+--------+------------------+---------------------------
gv_1 | geo | pix111 | Me(full_control) | 2014-09-11 18:00:21 +0000
After this, I ran compact-start in all DCs:
leofs-adm compact-start storage_la3_0@x.x.x.172 all
leofs-adm compact-start storage_la3_1@x.x.x.173 all
leofs-adm compact-start storage_gv_0@x.x.x.16 all
leofs-adm compact-start storage_gv_1@x.x.x.21 all
Results:
leofs-adm du storage_gv_0@x.x.x.16
active number of objects: 3086
total number of objects: 18263
active size of objects: 5126595488
total size of objects: 5291487675
ratio of active size: 96.88%
last compaction start: 2014-09-11 18:02:26 +0000
last compaction end: 2014-09-11 18:02:38 +0000
leofs-adm du storage_gv_1@x.x.x.21
active number of objects: 3930
total number of objects: 21252
active size of objects: 5737348387
total size of objects: 6032152176
ratio of active size: 95.11%
last compaction start: 2014-09-11 18:02:03 +0000
last compaction end: 2014-09-11 18:18:38 +0000
leofs-adm du storage_la3_0@x.x.x.172; leofs-adm du storage_la3_1@x.x.x.173
active number of objects: 5260
total number of objects: 5362
active size of objects: 5423932492
total size of objects: 5476377563
ratio of active size: 99.04%
last compaction start: 2014-09-11 18:01:28 +0000
last compaction end: 2014-09-11 18:01:35 +0000
active number of objects: 6650
total number of objects: 6854
active size of objects: 6751098313
total size of objects: 6820919251
ratio of active size: 98.98%
last compaction start: 2014-09-11 18:01:36 +0000
last compaction end: 2014-09-11 18:01:45 +0000
And after compaction, the data were not deleted:
storage_la3_0 /dev/sdb1 58T 19G 58T 1% /usr/local/leofs
storage_la3_1 /dev/sdb1 58T 8.0G 58T 1% /usr/local/leofs
storage_gv_0 /dev/sdb1 58T 51G 58T 1% /usr/local/leofs
storage_gv_1 /dev/sdb1 58T 7.9G 58T 1% /usr/local/leofs
In total, about 50GB was uploaded across both data centers.
Overall information on the DCs is shown in the previous post.
Regarding MDC replication: this function uses asynchronous replication because of network-bandwidth constraints. This means that repairing the consistency of an object can take a long time.
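Since replication is asynchronous, one practical way to check a specific key and push it across is the whereis / recover-file pair already used in this thread; the bucket and key below are placeholders:

## on each cluster, check whether the object has arrived yet
$ leofs-adm whereis <bucket>/<key>
## if one side is missing or stale, run recover-file on the cluster that holds the object;
## earlier in this thread that caused the object to be re-sent to the other cluster
$ leofs-adm recover-file <bucket>/<key>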
I've shared our test results: Benchmark LeoFS v1.1.2 with MDC-Replication
OK, thanks.
Hello, I have the same issue with joining a cluster in a multi-DC configuration:
ubuntu@leofs1:/usr/local/leofs/1.2.7$ leofs-adm join-cluster manager_0@10.77.254.44:13075 manager_0@10.77.254.44:13075
[ERROR] Could not connect
ubuntu@leofs1:/usr/local/leofs/1.2.7$ leofs-adm status
[System Confiuration]
---------------------------------+----------
Item | Value
---------------------------------+----------
Basic/Consistency level
---------------------------------+----------
system version | 1.2.7
cluster Id | clid
DC Id | n1
Total replicas | 1
number of successes of R | 1
number of successes of W | 1
number of successes of D | 1
number of DC-awareness replicas | 0
ring size | 2^128
---------------------------------+----------
Multi DC replication settings
---------------------------------+----------
max number of joinable DCs | 3
number of replicas a DC | 1
---------------------------------+----------
Manager RING hash
---------------------------------+----------
current ring-hash | 28e8d0fe
previous ring-hash | 28e8d0fe
---------------------------------+----------
[State of Node(s)]
-------+-----------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+-----------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@10.54.254.24 | running | 28e8d0fe | 28e8d0fe | 2015-03-11 15:23:57 +0600
G | gateway_0@10.54.254.24 | running | 28e8d0fe | 28e8d0fe | 2015-03-11 16:16:12 +0600
Where can I see the logs about this problem? Maybe a debug log or an Erlang dump?
It seems a communication failure happened between the clusters. I've checked this issue on my local node as follows:
##
## Building two clusters in the local node
##
$ sh build_mdcr_env.sh
##
## Cluster-1
##
$ ./leofs-adm start
Generating RING...
Generated RING
OK 33% - storage_2@127.0.0.1
OK 67% - storage_1@127.0.0.1
OK 100% - storage_0@127.0.0.1
OK
$ ./leofs-adm status
[System Confiuration]
---------------------------------+----------
Item | Value
---------------------------------+----------
Basic/Consistency level
---------------------------------+----------
system version | 1.2.7
cluster Id | leofs_1
DC Id | dc_1
Total replicas | 2
number of successes of R | 1
number of successes of W | 1
number of successes of D | 1
number of DC-awareness replicas | 0
ring size | 2^128
---------------------------------+----------
Multi DC replication settings
---------------------------------+----------
max number of joinable DCs | 2
number of replicas a DC | 1
---------------------------------+----------
Manager RING hash
---------------------------------+----------
current ring-hash | d5d667a6
previous ring-hash | d5d667a6
---------------------------------+----------
[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@127.0.0.1 | running | d5d667a6 | d5d667a6 | 2015-03-13 09:58:46 +0900
S | storage_1@127.0.0.1 | running | d5d667a6 | d5d667a6 | 2015-03-13 09:58:46 +0900
S | storage_2@127.0.0.1 | running | d5d667a6 | d5d667a6 | 2015-03-13 09:58:46 +0900
G | gateway_0@127.0.0.1 | running | d5d667a6 | d5d667a6 | 2015-03-13 09:58:47 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
##
## Cluster-2
##
$ ./leofs-adm -p 10110 start
Generating RING...
Generated RING
OK 33% - storage_11@127.0.0.1
OK 67% - storage_12@127.0.0.1
OK 100% - storage_10@127.0.0.1
OK
$ ./leofs-adm -p 10110 status
[System Confiuration]
---------------------------------+----------
Item | Value
---------------------------------+----------
Basic/Consistency level
---------------------------------+----------
system version | 1.2.7
cluster Id | leofs_2
DC Id | dc_2
Total replicas | 2
number of successes of R | 1
number of successes of W | 1
number of successes of D | 1
number of DC-awareness replicas | 0
ring size | 2^128
---------------------------------+----------
Multi DC replication settings
---------------------------------+----------
max number of joinable DCs | 2
number of replicas a DC | 1
---------------------------------+----------
Manager RING hash
---------------------------------+----------
current ring-hash | 31686cec
previous ring-hash | 31686cec
---------------------------------+----------
[State of Node(s)]
-------+---------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+---------------------------+--------------+----------------+----------------+----------------------------
S | storage_10@127.0.0.1 | running | 31686cec | 31686cec | 2015-03-13 09:59:32 +0900
S | storage_11@127.0.0.1 | running | 31686cec | 31686cec | 2015-03-13 09:59:32 +0900
S | storage_12@127.0.0.1 | running | 31686cec | 31686cec | 2015-03-13 09:59:32 +0900
G | gateway_10@127.0.0.1 | running | 31686cec | 31686cec | 2015-03-13 09:59:33 +0900
-------+---------------------------+--------------+----------------+----------------+----------------------------
### Join a Cluster (Between Cluster-1 and Cluster-2)
$ ./leofs-adm join-cluster manager_10@127.0.0.1:13095 manager_11@127.0.0.1:13096
OK
$ ./leofs-adm cluster-status
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
leofs_2 | dc_2 | running | 3 | 2015-03-13 10:02:59 +0900
So I'll check the error-log output for the communication failure between the clusters. Thank you for your report.
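As for where to look for logs: a sketch, assuming the default package layout used earlier in this thread; the exact log path is an assumption, so check the log.* settings in each node's configuration file. An Erlang crash dump (erl_crash.dump in the node's working directory) is only written if the VM itself dies, so for a connection error the application error log is the more likely place.

## manager error log (path assumed from the 1.2.7 layout above; adjust to your install)
$ tail -f /usr/local/leofs/1.2.7/leo_manager_0/log/app/error
## storage and gateway nodes keep error/info logs under an equivalent layout in their own directories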
I found the problem only with tcpdump:
16:39:33.835449 IP 10.77.254.44.13075 > 10.54.254.24.37173: Flags [P.], seq 1:80, ack 179, win 285, options [nop,nop,TS val 47608071 ecr 48785927], length 79
E.....@.@.R.
M.,
6..3..5,d..}........>.....
..q...j.*T...E
...&
M
.k."Already has a same neme of cluster
...
M
.d..error
I changed system.cluster_id and restarted all services on the second cluster.
With leofs-adm I see the new cluster_id name, but I still get the same error. How is that possible?
@UnderGreen I'll check this issue, again. Thank you for your report.
In order to solve the duplicate cluster ID, you need to update the 2nd cluster's id in leo_manager_0.conf and remove everything under "work/mnesia/127.0.0.1/*" on the 2nd cluster's manager master and slave, then restart both the master and the slave node as well as the storage node(s) and the gateway node(s).
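As a sketch, the procedure above could look as follows; the paths and the start/stop invocations are assumptions based on the 1.2.7 package layout mentioned in this thread, not verbatim instructions:

## on the 2nd cluster's manager master (repeat on the slave, leo_manager_1)
## 1) set a new system.cluster_id
$ vi /usr/local/leofs/1.2.7/leo_manager_0/etc/leo_manager.conf
## 2) stop the node and clear its mnesia work data
$ /usr/local/leofs/1.2.7/leo_manager_0/bin/leo_manager stop
$ rm -rf /usr/local/leofs/1.2.7/leo_manager_0/work/mnesia/127.0.0.1/*
## 3) start the manager again, then restart the storage node(s) and gateway node(s) as well
$ /usr/local/leofs/1.2.7/leo_manager_0/bin/leo_manager start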
I tried to set up multi-DC replication from scratch, but I get this error on the leofs_n1 cluster when I try to add a third DC:
root@leofs1.n1:/usr/local/leofs/1.2.7# leofs-adm join-cluster manager_0@10.100.254.19 manager_1@10.100.254.19
[ERROR] Over max number of clusters
leofs_n1 status:
[System Confiuration]
---------------------------------+----------
Item | Value
---------------------------------+----------
Basic/Consistency level
---------------------------------+----------
system version | 1.2.7
cluster Id | leofs_n1
DC Id | n1
Total replicas | 1
number of successes of R | 1
number of successes of W | 1
number of successes of D | 1
number of DC-awareness replicas | 0
ring size | 2^128
---------------------------------+----------
Multi DC replication settings
---------------------------------+----------
max number of joinable DCs | 5
number of replicas a DC | 1
---------------------------------+----------
Manager RING hash
---------------------------------+----------
current ring-hash | 28e8d0fe
previous ring-hash | 28e8d0fe
---------------------------------+----------
[State of Node(s)]
-------+-----------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+-----------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@10.54.254.24 | running | 28e8d0fe | 28e8d0fe | 2015-03-17 15:00:26 +0600
-------+-----------------------------+--------------+----------------+----------------+----------------------------
leofs_n1 cluster-status:
cluster id | dc id | status | # of storages | updated at
-----------+------------+--------------+----------------+-----------------------------
leofs_m1 | m1 | running | 1 | 2015-03-17 15:06:06 +0600
Actually, we do not support communication among three or more clusters yet. We're planning to implement that in LeoFS v1.4.
Hi, I have two datacenters, and in each datacenter there are two nodes.
In each datacenter, Total replicas : 1 (I need one replica per datacenter).
First DC:
leofs-adm status
[System config]
System version : 1.1.1
Cluster Id : US
DC Id : dc_us
Total replicas : 1
# of successes of R : 1
# of DC-awareness replicas : 0
[Multi DC replication settings]
max # of joinable DCs : 2
# of replicas a DC : 1
[Node(s) state]
-------+-----------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+-----------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@10.0.10.3 | running | 2568fc74 | 2568fc74 | 2014-09-01 15:39:05 +0000
S | storage_1@10.0.10.4 | running | 2568fc74 | 2568fc74 | 2014-09-01 15:47:27 +0000
G | gateway_0@10.0.10.1 | running | 2568fc74 | 2568fc74 | 2014-09-01 13:49:19 +0000
G | gateway_1@10.0.10.12 | running | 2568fc74 | 2568fc74 | 2014-09-01 13:44:09 +0000
Second DC:
leofs-adm status
[System config]
System version : 1.1.1
Cluster Id : EU
DC Id : dc_eu
Total replicas : 1
# of successes of R : 1
# of DC-awareness replicas : 0
[Multi DC replication settings]
max # of joinable DCs : 2
# of replicas a DC : 1
[Node(s) state]
-------+---------------------------+--------------+----------------+----------------+----------------------------
type | node | state | current ring | prev ring | updated at
-------+---------------------------+--------------+----------------+----------------+----------------------------
S | storage_0@10.0.0.3 | running | 1926f3fe | 1926f3fe | 2014-09-01 13:34:46 +0000
S | storage_1@10.0.0.4 | running | 1926f3fe | 1926f3fe | 2014-09-01 13:34:46 +0000
G | gateway_0@10.0.0.1 | running | 1926f3fe | 1926f3fe | 2014-09-01 13:49:15 +0000
G | gateway_1@10.0.0.2 | running | 1926f3fe | 1926f3fe | 2014-09-01 13:44:50 +0000
I am using S3 to upload/download data.
I uploaded 110 files of different sizes (from a few tens of megabytes to a few hundred) to the EU DC. Only 49 files were replicated to the US DC, and then replication stopped. How can I restart the replication process and check it?
And a second question: can packet loss between the locations affect replication or stop it? If replication has stopped, how do I start it again?