open-webrtc-toolkit / owt-server

General server (streaming/conference/transcoding/analytics) for OWT. (A.k.a. MediaServer)
https://01.org/open-webrtc-toolkit
Apache License 2.0
1.13k stars 377 forks

owt-cluster queue deleted if !!Double master!! occurs on cluster_manager service #530

Open dboyzhong opened 4 years ago

dboyzhong commented 4 years ago

summary:

cluster_manager stops working and the whole cluster stops serving.

environment:

CentOS 7.6, MCU server: 4.3.x

steps:

  1. Two cluster_managers: A as master, B as slave.
  2. A brief network interruption occurs and B becomes master.
  3. The network recovers, so A and B are both master. B's life_time is shorter, so the B process exits.
  4. The RabbitMQ queue "owt-cluster" is deleted, so A can no longer receive any message from it, and all other services lose their connection to cluster_manager.

    reason:

    amqp_client.js:157

    handler.close = function() {
        request_q && request_q.destroy();
        request_q = undefined;
        exc && exc.destroy(true);
        exc = undefined;
    };

    handler.close is called when the cluster_manager process exits via "process.exit(1);", and request_q.destroy() deletes the queue even if the other cluster_manager is still connected to it.
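
The effect of that unconditional delete can be illustrated with a small in-memory model. This is only a sketch: `FakeBroker` and its methods are hypothetical stand-ins, not OWT code or a real AMQP client, and the model delivers each message to every consumer, whereas a real broker would pick one.

```javascript
// Hypothetical in-memory stand-in for the broker (not OWT or AMQP client code).
class FakeBroker {
  constructor() {
    this.queues = new Map(); // queue name -> Set of consumer callbacks
  }

  // Declare the queue if needed and attach a consumer to it.
  subscribe(name, onMessage) {
    if (!this.queues.has(name)) this.queues.set(name, new Set());
    this.queues.get(name).add(onMessage);
  }

  // Unconditional delete: the queue goes away even if consumers remain.
  destroy(name) {
    this.queues.delete(name);
  }

  // Deliver to every attached consumer; returns how many were reached.
  publish(name, message) {
    const consumers = this.queues.get(name);
    if (!consumers) return 0;
    consumers.forEach((deliver) => deliver(message));
    return consumers.size;
  }
}

const broker = new FakeBroker();
const receivedByA = [];
broker.subscribe('owt-cluster', (msg) => receivedByA.push(msg)); // manager A
broker.subscribe('owt-cluster', () => {});                       // manager B

broker.publish('owt-cluster', 'before');
broker.destroy('owt-cluster'); // B exits and its close handler deletes the queue
const reached = broker.publish('owt-cluster', 'after');

console.log(receivedByA, reached);
```

In this model A only ever sees the message published before B exited; afterwards nothing published to "owt-cluster" reaches it.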

    To reproduce manually:

  5. Start the master cluster_manager A.
  6. On the slave cluster_manager B, modify the last lines of clusterManager.js to simulate the failure:

    exports.run = function (topicChannel, clusterName, id, spec) {
        var manager = new ClusterManager(clusterName, id, spec);

        // For the simulation, call runAsMaster(topicChannel, manager); here instead.
        runAsCandidate(topicChannel, manager);
    };
  7. Start cluster_manager B.
  8. cluster_manager B detects "Double master" and exits.
  9. cluster_manager A stops serving.
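
The exit in step 8 follows the rule described in step 3: the master with the shorter life_time steps down. A rough sketch of that rule, assuming each manager can compare its own life_time with the other master's; `resolveDoubleMaster` and the field names are hypothetical and not taken from clusterManager.js:

```javascript
// Hypothetical rule: when two masters see each other, the one that has been
// master for less time exits. Field names are illustrative only.
function resolveDoubleMaster(self, other) {
  if (self.id === other.id) return 'stay'; // our own announcement echoed back
  return self.lifeTime < other.lifeTime ? 'exit' : 'stay';
}

const managerA = { id: 'A', lifeTime: 3600 }; // long-running master
const managerB = { id: 'B', lifeTime: 5 };    // just promoted itself

const decisionA = resolveDoubleMaster(managerA, managerB);
const decisionB = resolveDoubleMaster(managerB, managerA);
console.log(decisionA, decisionB);
```

Under this rule A stays and B exits, which is the moment B's close handler runs.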
dboyzhong commented 4 years ago

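suggestion: instead of request_q.destroy(); call it with the ifUnused option, request_q.destroy({ifUnused : true});, so the queue is only deleted when no consumer is still attached to it.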
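
A minimal sketch of the suggested change, assuming the client honours `ifUnused` the same way the AMQP 0-9-1 queue.delete `if-unused` flag does. `makeStubQueue` and `makeCloseHandler` are hypothetical helpers written for this illustration; only the body of `close` mirrors the handler quoted from amqp_client.js.

```javascript
// Stub queue that imitates an if-unused delete (hypothetical helper).
function makeStubQueue(consumerCount) {
  return {
    deleted: false,
    destroy(options = {}) {
      // With ifUnused set, the delete is skipped while any consumer is attached.
      if (options.ifUnused && consumerCount > 0) return;
      this.deleted = true;
    },
  };
}

// Close handler shaped like the one in amqp_client.js, with the proposed option.
function makeCloseHandler(queue) {
  let request_q = queue;
  return function close() {
    request_q && request_q.destroy({ ifUnused: true });
    request_q = undefined;
  };
}

const shared = makeStubQueue(1); // another cluster_manager is still consuming
makeCloseHandler(shared)();

const idle = makeStubQueue(0);   // no consumer left on the queue
makeCloseHandler(idle)();

console.log(shared.deleted, idle.deleted);
```

With the option set, the shared queue survives the exit of one manager, while a queue that nobody uses is still cleaned up.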

starwarfan commented 4 years ago

Hi, thanks for your suggestion. Would you create a pull request for this issue?

dboyzhong commented 4 years ago

@starwarfan Yes, I created a pull request for this issue. Thanks.