vernemq / docker-vernemq

VerneMQ Docker image - Starts the VerneMQ MQTT broker and listens on 1883 and 8080 (for websockets).
https://vernemq.com
Apache License 2.0

Internal database corrupted after first start of Docker cluster #161

Open drasko opened 5 years ago

drasko commented 5 years ago

Environment

Used the following directives in docker-compose.yaml for clustering:

  mqtt-adapter-1:
    image: mainflux/mqtt-verne:latest
    container_name: mainflux-mqtt-1
    depends_on:
      - things
      - nats
      - es-redis
    restart: on-failure
    environment:
      MF_MQTT_ADAPTER_LOG_LEVEL: ${MF_MQTT_ADAPTER_LOG_LEVEL}
      MF_MQTT_INSTANCE_ID: mqtt-adapter-1
      MF_MQTT_ADAPTER_WS_PORT: ${MF_MQTT_ADAPTER_WS_PORT}
      MF_MQTT_ADAPTER_ES_URL: tcp://es-redis:${MF_REDIS_TCP_PORT}
      MF_NATS_URL: ${MF_NATS_URL}
      MF_THINGS_AUTH_GRPC_URL: http://things:${MF_THINGS_AUTH_GRPC_PORT}
      DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD: "off"
      DOCKER_VERNEMQ_PLUGINS__VMQ_ACL: "off"
      DOCKER_VERNEMQ_PLUGINS__MFX_AUTH: "on"
      DOCKER_VERNEMQ_PLUGINS__MFX_AUTH__PATH: /mainflux/_build/default
      DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL: debug
      MF_MQTT_VERNEMQ_GRPC_POOL_SIZE: 1000
    ports:
      - 18831:1883
      - 8881:8080
      - 7777:8888 # VerneMQ dashboard
    networks:
      - mainflux-base-net

  mqtt-adapter-2:
    image: mainflux/mqtt-verne:latest
    container_name: mainflux-mqtt-2
    depends_on:
      - things
      - nats
      - es-redis
      - mqtt-adapter-1
    restart: on-failure
    environment:
      MF_MQTT_ADAPTER_LOG_LEVEL: ${MF_MQTT_ADAPTER_LOG_LEVEL}
      MF_MQTT_INSTANCE_ID: mqtt-adapter-2
      MF_MQTT_ADAPTER_WS_PORT: 8080
      MF_MQTT_ADAPTER_ES_URL: tcp://es-redis:${MF_REDIS_TCP_PORT}
      MF_NATS_URL: ${MF_NATS_URL}
      MF_THINGS_AUTH_GRPC_URL: http://things:${MF_THINGS_AUTH_GRPC_PORT}
      DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD: "off"
      DOCKER_VERNEMQ_PLUGINS__VMQ_ACL: "off"
      DOCKER_VERNEMQ_PLUGINS__MFX_AUTH: "on"
      DOCKER_VERNEMQ_PLUGINS__MFX_AUTH__PATH: /mainflux/_build/default
      DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL: debug
      MF_MQTT_VERNEMQ_GRPC_POOL_SIZE: 1000
      DOCKER_VERNEMQ_COMPOSE: 1
      DOCKER_VERNEMQ_DISCOVERY_NODE: mqtt-adapter-1
    ports:
      - 18832:1883
      - 8882:8080
      - 7778:8888 # VerneMQ dashboard
    networks:
      - mainflux-base-net

  mqtt-adapter-3:
    image: mainflux/mqtt-verne:latest
    container_name: mainflux-mqtt-3
    depends_on:
      - things
      - nats
      - es-redis
      - mqtt-adapter-1
    restart: on-failure
    environment:
      MF_MQTT_ADAPTER_LOG_LEVEL: ${MF_MQTT_ADAPTER_LOG_LEVEL}
      MF_MQTT_INSTANCE_ID: mqtt-adapter-3
      MF_MQTT_ADAPTER_PORT: 18833
      MF_MQTT_ADAPTER_WS_PORT: 8882
      MF_MQTT_ADAPTER_ES_URL: tcp://es-redis:${MF_REDIS_TCP_PORT}
      MF_NATS_URL: ${MF_NATS_URL}
      MF_THINGS_AUTH_GRPC_URL: http://things:${MF_THINGS_AUTH_GRPC_PORT}
      DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD: "off"
      DOCKER_VERNEMQ_PLUGINS__VMQ_ACL: "off"
      DOCKER_VERNEMQ_PLUGINS__MFX_AUTH: "on"
      DOCKER_VERNEMQ_PLUGINS__MFX_AUTH__PATH: /mainflux/_build/default
      DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL: debug
      MF_MQTT_VERNEMQ_GRPC_POOL_SIZE: 1000
      DOCKER_VERNEMQ_COMPOSE: 1
      DOCKER_VERNEMQ_DISCOVERY_NODE: mqtt-adapter-1
    ports:
      - 18833:1883
      - 8883:8080
      - 7779:8888 # VerneMQ dashboard
    networks:
      - mainflux-base-net
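
The crash below shows the node booting as 'VerneMQ@172.23.0.12' while the stored cluster state only knows peers at 172.23.0.15-17, which suggests the Erlang node names are derived from container IPs that change across restarts. One possible mitigation is to pin each node's name to its stable Compose service hostname. This is only a sketch: it assumes the mainflux/mqtt-verne image inherits the upstream docker-vernemq entrypoint, which honors a DOCKER_VERNEMQ_NODENAME variable.

```yaml
# Hypothetical per-service addition (shown for mqtt-adapter-1),
# not verified against the mainflux image:
  mqtt-adapter-1:
    environment:
      # Pin the Erlang node name to the stable service name instead of the
      # container IP, so a restart does not invalidate stored cluster state.
      DOCKER_VERNEMQ_NODENAME: mqtt-adapter-1
```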

Expected behavior

Cluster to start normally

Actual behavior

On restarting the composition with docker-compose, the master node in the cluster fails:

mainflux-mqtt-1   | 18:47:50.568 [info] Datadir ./data/meta/meta/10 options for LevelDB: [{open,[{block_cache_threshold,33554432},{block_restart_interval,16},{block_size_steps,16},{compression,true},{create_if_missing,true},{delete_threshold,1000},{eleveldb_threads,71},{fadvise_willneed,false},{limited_developer_mem,false},{sst_block_size,4096},{tiered_slow_level,0},{total_leveldb_mem_percent,6},{use_bloomfilter,true},{write_buffer_size,47182363}]},{read,[{verify_checksums,true}]},{write,[{sync,false}]},{fold,[{verify_checksums,true},{fill_cache,false}]}]
mainflux-mqtt-1   | 18:47:50.592 [info] Datadir ./data/meta/meta/11 options for LevelDB: [{open,[{block_cache_threshold,33554432},{block_restart_interval,16},{block_size_steps,16},{compression,true},{create_if_missing,true},{delete_threshold,1000},{eleveldb_threads,71},{fadvise_willneed,false},{limited_developer_mem,false},{sst_block_size,4096},{tiered_slow_level,0},{total_leveldb_mem_percent,6},{use_bloomfilter,true},{write_buffer_size,48522751}]},{read,[{verify_checksums,true}]},{write,[{sync,false}]},{fold,[{verify_checksums,true},{fill_cache,false}]}]
mainflux-mqtt-1   | 18:47:50.619 [error] Supervisor plumtree_sup had child plumtree_broadcast started with plumtree_broadcast:start_link() at undefined exit with reason {'EXIT',{function_clause,[{orddict,fetch,['VerneMQ@172.23.0.12',[{'VerneMQ@172.23.0.15',['VerneMQ@172.23.0.16']},{'VerneMQ@172.23.0.16',['VerneMQ@172.23.0.17']},{'VerneMQ@172.23.0.17',['VerneMQ@172.23.0.15']}]],[{file,"orddict.erl"},{line,80}]},{plumtree_broadcast,init_peers,1,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,754}]},{plumtree_broadcast,start_link,0,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,150}]},...]}} in context start_error
mainflux-mqtt-1   | 18:47:50.621 [error] CRASH REPORT Process <0.193.0> with 0 neighbours exited with reason: {{error,{shutdown,{failed_to_start_child,plumtree_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['VerneMQ@172.23.0.12',[{'VerneMQ@172.23.0.15',['VerneMQ@172.23.0.16']},{'VerneMQ@172.23.0.16',['VerneMQ@172.23.0.17']},{'VerneMQ@172.23.0.17',['VerneMQ@172.23.0.15']}]],[{file,"orddict.erl"},{line,80}]},{plumtree_broadcast,init_peers,1,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,754}]},{plumtree_broadcast,start_link,0,[{file,"/vernemq-build/_build/..."},...]},...]}}}}},...} in application_master:init/4 line 138
mainflux-mqtt-1   | 18:47:50.621 [info] Application plumtree exited with reason: {{error,{shutdown,{failed_to_start_child,plumtree_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['VerneMQ@172.23.0.12',[{'VerneMQ@172.23.0.15',['VerneMQ@172.23.0.16']},{'VerneMQ@172.23.0.16',['VerneMQ@172.23.0.17']},{'VerneMQ@172.23.0.17',['VerneMQ@172.23.0.15']}]],[{file,"orddict.erl"},{line,80}]},{plumtree_broadcast,init_peers,1,[{file,"/vernemq-build/_build/default/lib/plumtree/src/plumtree_broadcast.erl"},{line,754}]},{plumtree_broadcast,start_link,0,[{file,"/vernemq-build/_build/..."},...]},...]}}}}},...}
mainflux-mqtt-1   | 18:47:50.621 [debug] loading modules: [vmq_plumtree,vmq_plumtree_app,vmq_plumtree_sup]
mainflux-mqtt-1   | 18:47:50.621 [info] Application sext exited with reason: stopped
mainflux-mqtt-1   | 18:47:50.621 [info] Application riak_dt exited with reason: stopped
mainflux-mqtt-1   | 18:47:50.627 [debug] Lager installed handler lager_backend_throttle into lager_event
mainflux-mqtt-1   | 18:47:50.633 [info] Try to start vmq_plumtree: ok
mainflux-mqtt-1   | [os_mon] memory supervisor port (memsup): Erlang has closed
mainflux-mqtt-1   | 18:47:50.637 [error] CRASH REPORT Process <0.215.0> with 0 neighbours crashed with reason: bad argument in call to ets:lookup(cluster_state, cluster_state) in plumtree_peer_service_manager:get_local_state/0 line 43
mainflux-mqtt-1   | [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
mainflux-mqtt-1   | 18:47:50.637 [error] CRASH REPORT Process <0.188.0> with 0 neighbours exited with reason: bad argument in call to ets:lookup(cluster_state, cluster_state) in plumtree_peer_service_manager:get_local_state/0 line 43 in application_master:init/4 line 138
mainflux-mqtt-1   | 18:47:50.637 [info] Application vmq_server exited with reason: bad argument in call to ets:lookup(cluster_state, cluster_state) in plumtree_peer_service_manager:get_local_state/0 line 43
mainflux-mqtt-1   | 18:47:50.640 [info] alarm_handler: {clear,system_memory_high_watermark}

The master node's container must be deleted for the composition to work again:

docker rm mainflux-mqtt-1
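
If the image declares a Docker VOLUME for VerneMQ's data directory, the stale ./data/meta LevelDB state could survive a plain `docker rm`; passing `-v` also removes the container's anonymous volumes. This is a guess about the image layout, not a confirmed fix:

```shell
# Remove the failed master together with its anonymous volumes (which may
# hold the stale ./data/meta metadata), then recreate just that service.
docker rm -v mainflux-mqtt-1
docker-compose up -d mqtt-adapter-1
```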
larshesel commented 5 years ago

Transferred this to the vernemq-docker repo. Looking at the stack trace, my guess is that the node comes up with a new node name but with the old metadata - so perhaps this is an issue with docker-compose 'statefulness'? Note that I have never worked with docker-compose, so I have no idea how it works.

Though likely not a solution to the root cause of this, perhaps you should try the swc metadata backend - the plumtree one will be deprecated and removed for VerneMQ 2.0.
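
For reference, VerneMQ selects the metadata backend via the metadata_plugin setting in vernemq.conf, and the Docker image maps DOCKER_VERNEMQ_* environment variables onto config keys. Assuming that mapping also applies in the mainflux image, switching to swc would look roughly like:

```yaml
    environment:
      # Maps to 'metadata_plugin = vmq_swc' in vernemq.conf; the default,
      # vmq_plumtree, is slated for removal in VerneMQ 2.0.
      DOCKER_VERNEMQ_METADATA_PLUGIN: vmq_swc
```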