nuagenetworks / nuage-metroae

Nuage Networks Metro Automation Engine
http://devops.nuagenetworks.net
Apache License 2.0

vsd-cluster-failover assumes primary ElasticSearch cluster is active. #1219

Closed steve-butler closed 4 years ago

steve-butler commented 5 years ago

src/roles/vsd-cluster-failover/tasks/main.yml hard-codes the active ElasticSearch cluster as hostvars[groups['vstats'][0]]['hostname'],hostvars[groups['vstats'][1]]['hostname'],hostvars[groups['vstats'][2]]['hostname']. If vsd-cluster-failover runs while the standby ES cluster is active, stats could be lost during the ES backup/restore.

Manual workaround: Change the order of the servers in the vstats deployment file.

Partial Code Solution: Get the list of active ElasticSearch hosts from the primary VSD config file and pass that output to the vsd-switch-replication-cluster-role command. Something like this:

# Read from a single primary VSD. Looping with with_items would store the
# output under activeEsNodes.results instead of activeEsNodes.stdout.
- name: Get ElasticSearch node list from VSD stats.conf
  shell: grep ^statscollector.elasticsearch.host= /opt/vsd/stats_collector/conf/stats.conf | cut -c 35-
  register: activeEsNodes
  changed_when: false
  delegate_to: "{{ groups['primary_vsds'][0] }}"
  run_once: true

- name: Switch replication role of standby VSD cluster when using clustered VSTATs
  command: "/opt/vsd/bin/vsd-switch-replication-cluster-role --role active -e {{activeEsNodes.stdout}}"
  when:
    - groups['vstats'] is defined
  delegate_to: "{{ item }}"
  with_items: "{{ groups['standby_vsds'] }}"

This solution only works if the primary cluster is reachable.
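
The fixed column offset in cut -c 35- depends on the exact length of the key, so stripping the key by name may be a little more tolerant. A minimal sketch, assuming stats.conf contains one statscollector.elasticsearch.host= line whose value is a comma-separated host list; the function name and the sample host names are made up for illustration:

```shell
# Hypothetical helper: print the value of statscollector.elasticsearch.host
# from the stats.conf-style file passed as $1 (first match only).
get_active_es_nodes() {
  sed -n 's/^statscollector\.elasticsearch\.host=//p' "$1" | head -n 1
}

# Demo against a throwaway file; the host names are placeholders, not real nodes.
sample_conf=$(mktemp)
printf '%s\n' \
  '# sample only' \
  'statscollector.elasticsearch.host=es1.example.com,es2.example.com,es3.example.com' \
  > "$sample_conf"

active_es_nodes=$(get_active_es_nodes "$sample_conf")
echo "$active_es_nodes"

rm -f "$sample_conf"
```

On a VSD the function would be pointed at /opt/vsd/stats_collector/conf/stats.conf instead of the sample file.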

ghost commented 5 years ago

Thank you, Steve! The manual work-around is what we had in mind. We were not aware of the stats_collector info on the VSD. We will create a JIRA ticket and implement an enhancement as you have suggested.

ghost commented 5 years ago

METROAE-1061

ghost commented 4 years ago

Resolved in v4.0