leo-project / leofs

The LeoFS Storage System
https://leo-project.net/leofs/
Apache License 2.0

Automatic duplication of files to alive servers #1188

Open w1nns opened 5 years ago

w1nns commented 5 years ago

Hello! I deployed a LeoFS cluster in Docker: a master manager, a slave manager, three storage nodes, and a gateway. Replication is configured to two replicas. I can't figure out whether there is a mechanism that automatically re-replicates a file to the third, still-alive storage node when one of the two storage nodes holding the file goes down.

I found the command leofs-adm recover-file, but it only prints an error:

[ERROR] Could not recover

Error logs are empty.
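For context, the failing invocation was along these lines (the object key here is the one that appears in the whereis output later in this thread; the syntax follows the documented form, leofs-adm recover-file <file-path>):

$ leofs-adm recover-file mybucket/adaptive2.smil
[ERROR] Could not recover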

Regards

yosukehara commented 5 years ago

To troubleshoot this, please share your LeoManager configuration file, leo_manager_0/etc/leo_manager.conf, and the result of $ leofs-adm status.

mocchira commented 5 years ago

@Kirsun25 If I understand your question correctly, there is no mechanism for automatically re-replicating a file to a third alive storage node (I think the feature you mean is called Hinted Handoff) in the current latest LeoFS, so recover-file never works without bringing the failed node back into the cluster.

As we noted at https://github.com/leo-project/leofs#version-2, Hinted Handoff (the feature you would expect) will be implemented in version 2.1.

Let me know if I misunderstand your question.
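For reference, once the failed node has been brought back online, a recovery pass would look roughly like the following sketch (<storage-node> and <file-path> are placeholders; resume, recover-node and recover-file are the relevant leofs-adm commands as far as I know):

$ leofs-adm resume <storage-node>        # re-attach the restarted node to the cluster
$ leofs-adm recover-node <storage-node>  # rebuild every replica that should live on that node
$ leofs-adm recover-file <file-path>     # or recover a single object instead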

w1nns commented 5 years ago

Thank you for the quick reply!

root@917285ece53a:/# cat /usr/local/leofs/1.4.3/leo_manager_0/etc/leo_manager.conf 
##======================================================================
## LeoFS - Manager Configuration (MASTER)
##
## See: http://leo-project.net/leofs/docs/configuration/configuration_1.html
##
## Additional configuration files from leo_manager.d/*.conf (if exist) are
## processed after this file and can be used to override these settings.
##======================================================================
## --------------------------------------------------------------------
## SASL
## --------------------------------------------------------------------
## See: http://www.erlang.org/doc/man/sasl_app.html
##
## The following configuration parameters are defined for
## the SASL application. See app(4) for more information
## about configuration parameters

## SASL error log path
## sasl.sasl_error_log = ./log/sasl/sasl-error.log

## Restricts the error logging performed by the specified sasl_error_logger
## to error reports, progress reports, or both.
## errlog_type = [error | progress | all]
## sasl.errlog_type = error

## Specifies in which directory the files are stored.
## If this parameter is undefined or false, the error_logger_mf_h is not installed.
## sasl.error_logger_mf_dir = ./log/sasl

## Specifies how large each individual file can be.
## If this parameter is undefined, the error_logger_mf_h is not installed.
## sasl.error_logger_mf_maxbytes = 10485760

## Specifies how many files are used.
## If this parameter is undefined, the error_logger_mf_h is not installed.
## sasl.error_logger_mf_maxfiles = 5

## --------------------------------------------------------------------
## MANAGER
## --------------------------------------------------------------------
## Partner of manager's alias
manager.partner = manager_1@172.21.0.8

## Manager-console acceptable IP address
console.bind_address = localhost

## Manager-console acceptable port number
console.port.cui  = 10010
console.port.json = 10020

## Manager-console's number of acceptors
console.acceptors.cui = 3
console.acceptors.json = 16

## # of histories to display at once
console.histories.num_of_display = 200

## --------------------------------------------------------------------
## MANAGER - System
##     * Only set its configurations to **Manager-master**
## --------------------------------------------------------------------
## DC Id
system.dc_id = dc_1

## Cluster Id
system.cluster_id = leofs_1

## --------------------------------------------------------------------
## MANAGER - Consistency Level
##     * Only set its configurations to **Manager-master**
##     * See: http://leo-project.net/leofs/docs/configuration/configuration_1.html
## --------------------------------------------------------------------
## A number of replicas
consistency.num_of_replicas = 2

## A number of replicas needed for a successful WRITE operation
consistency.write = 1

## A number of replicas needed for a successful READ operation
consistency.read = 1

## A number of replicas needed for a successful DELETE operation
consistency.delete = 1

## A number of rack-aware replicas
consistency.rack_aware_replicas = 0

## --------------------------------------------------------------------
## MANAGER - Multi DataCenter Settings
## --------------------------------------------------------------------
## A number of replication targets
## mdc_replication.max_targets = 2

## A number of replicas per datacenter
## [note] A local LeoFS sends a stacked object which contains the items of a replication method:
##          - [L1_N] A number of replicas
##          - [L1_W] A number of replicas needed for a successful WRITE-operation
##          - [L1_R] A number of replicas needed for a successful READ-operation
##          - [L1_D] A number of replicas needed for a successful DELETE-operation
##       A remote LeoFS cluster receives this cluster's objects and then replicates
##       them according to each object's replication method
## mdc_replication.num_of_replicas_a_dc = 1

## MDC replication / A number of replicas needed for a successful WRITE-operation
## mdc_replication.consistency.write = 1

## MDC replication / A number of replicas needed for a successful READ-operation
## mdc_replication.consistency.read = 1

## MDC replication / A number of replicas needed for a successful DELETE-operation
## mdc_replication.consistency.delete = 1

## --------------------------------------------------------------------
## MANAGER - Mnesia
##     * Store the info of the storage cluster and the gateway(s)
##     * Store the RING and the command histories
## --------------------------------------------------------------------
## Mnesia dir
mnesia.dir = ./work/mnesia/127.0.0.1

## The write threshold for transaction log dumps
## as the number of writes to the transaction log
mnesia.dump_log_write_threshold = 50000

## Controls how often disc_copies tables are dumped from memory
mnesia.dc_dump_limit = 40

## --------------------------------------------------------------------
## MANAGER - Log
## --------------------------------------------------------------------
## Log level: [0:debug, 1:info, 2:warn, 3:error]
## log.log_level = 1

## Output log file(s) - Erlang's log
## log.erlang = ./log/erlang

## Output log file(s) - app
## log.app = ./log/app

## Output log file(s) - members of storage-cluster
## log.member_dir = ./log/ring

## Output log file(s) - ring
## log.ring_dir = ./log/ring

## --------------------------------------------------------------------
## MANAGER - Other Directories
## --------------------------------------------------------------------
## Directory of queue for monitoring "RING"
## queue_dir = ./work/queue

## Directory of SNMP agent configuration
## snmp_agent = ./snmp/snmpa_manager_0/LEO-MANAGER

## --------------------------------------------------------------------
## RPC
## --------------------------------------------------------------------
## RPC-Server's acceptors
rpc.server.acceptors = 16

## RPC-Server's listening port number
rpc.server.listen_port = 13075

## RPC-Server's listening timeout
rpc.server.listen_timeout = 5000

## RPC-Client's size of connection pool
rpc.client.connection_pool_size = 16

## RPC-Client's size of connection buffer
rpc.client.connection_buffer_size = 16

## --------------------------------------------------------------------
## Other Libs
## --------------------------------------------------------------------
## Enable profiler - leo_backend_db
## leo_backend_db.profile = false

## Enable profiler - leo_logger
## leo_logger.profile = false

## Enable profiler - leo_mq
## leo_mq.profile = false

## Enable profiler - leo_redundant_manager
## leo_redundant_manager.profile = false

## Enable profiler - leo_statistics
## leo_statistics.profile = false

##======================================================================
## For vm.args
##======================================================================
## Name of the LeoFS's manager node
nodename = manager_0@172.21.0.7

## Cookie for distributed node communication.  All nodes in the same cluster
## should use the same cookie or they will not be able to communicate.
distributed_cookie = 401321b4

## Enable kernel poll
erlang.kernel_poll = true

## Number of async threads
erlang.asyc_threads = 32

## Increase number of concurrent ports/sockets
erlang.max_ports = 64000

## Set the location of crash dumps
erlang.crash_dump = ./log/erl_crash.dump

## Raise the ETS table limit
erlang.max_ets_tables = 256000

## Enable SMP
erlang.smp = enable

## Raise the default erlang process limit
process_limit = 1048576

## Path of SNMP-agent configuration
## snmp_conf = ./snmp/snmpa_manager_0/leo_manager_snmp
root@917285ece53a:/# leofs-adm status
/usr/local/bin/leofs-adm: line 71: lsb_release: command not found
/usr/local/bin/leofs-adm: line 72: lsb_release: command not found
 [System Configuration]
-----------------------------------+----------
 Item                              | Value    
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 6c4473de
                previous ring-hash | 6c4473de
-----------------------------------+----------

 [State of Node(s)]
-------+----------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |            node            |    state     | rack id |  current ring  |   prev ring    |          updated at         
-------+----------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | storage_0@172.21.0.9       | stop         |         |                |                | 2019-07-10 08:42:38 +0000
  S    | storage_1@172.21.0.10      | running      |         | 6c4473de       | 6c4473de       | 2019-07-05 14:17:48 +0000
  S    | storage_2@172.21.0.12      | running      |         | 6c4473de       | 6c4473de       | 2019-07-05 14:17:53 +0000
  G    | gateway_0@172.21.0.11      | running      |         | 6c4473de       | 6c4473de       | 2019-07-10 08:16:35 +0000
-------+----------------------------+--------------+---------+----------------+----------------+----------------------------
root@917285ece53a:/# leofs-adm whereis mybucket/adaptive2.smil
/usr/local/bin/leofs-adm: line 71: lsb_release: command not found
/usr/local/bin/leofs-adm: line 72: lsb_release: command not found
-------+----------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |            node            |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when            
-------+----------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | storage_0@172.21.0.9       |                                      |            |              |                |                |                | 
       | storage_1@172.21.0.10      | cfbd634b9e7d0fc0c65a2b80d027e435     |       656B |   e4eb5205af | false          |              0 | 58ca1291adedd  | 2019-07-01 16:31:30 +0000

[Failure Nodes]
storage_0@172.21.0.9:unavailable

Perhaps I formulated my question incorrectly. Actually, I'm interested in the recovery mechanism. As I said above, there are three storage nodes, and two of them contain a file. One of the storage nodes that contains the file went down, and suppose it never recovers. How can I restore the number of replicas of the file (back to two) across the remaining servers?

yosukehara commented 5 years ago

Sorry for the late reply. Please read the LeoFS documentation on Cluster Settings / Consistency Level; it seems to cover what you want to know.
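For the scenario described above (a node that will never come back), my understanding of the documented procedure is to detach the dead node and rebalance, which re-creates the missing replicas on the remaining nodes; a sketch using the node and object names from this thread:

$ leofs-adm detach storage_0@172.21.0.9       # remove the dead node from the cluster
$ leofs-adm rebalance                         # recompute the RING and relocate/re-replicate objects
$ leofs-adm whereis mybucket/adaptive2.smil   # verify the object is back to two replicas

Please double-check the exact steps against the administrator documentation before running them on a live cluster.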