zabbix / zabbix-docker

Official Zabbix Dockerfiles
https://www.zabbix.com
GNU Affero General Public License v3.0

Issue with Upgrading Zabbix from 6.0.2 to 6.4: Database Upgrade Failing in HA Mode #1376

Closed alemsas closed 3 months ago

alemsas commented 3 months ago

I am currently running Zabbix 6.0.2 in a Docker environment and attempting to upgrade to Zabbix 6.4. My setup includes Zabbix server and agent containers, with the Zabbix server using MySQL (Percona XtraDB Cluster). I have followed the upgrade instructions, but the upgrade process fails because the Zabbix server still attempts to operate in HA mode during the database upgrade.

Current Setup:

Database: Percona XtraDB Cluster

My configuration:

cat docker-compose.yml
version: "3.9"

services:
  zabbix-server-1:
    image: 'zabbix/zabbix-server-mysql:ubuntu-6.4-latest'
#    ports:
#      - "10051:10051"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/lib/snmp/mibs:/var/lib/zabbix/mibs:ro
      - './usr/lib/zabbix/alertscripts:/usr/lib/zabbix/alertscripts:ro'
      - './usr/lib/zabbix/externalscripts:/usr/lib/zabbix/externalscripts:ro'
    environment:
      - DB_SERVER_HOST=172.31.63.1
      - DB_SERVER_PORT=3306
      - ZBX_HANODENAME=zabbix-server-1
      - ZBX_NODEADDRESS=172.31.63.1
    env_file:
      - ./envs/.env_mysql
      - ./envs/.env_server
    container_name: zabbix-server-1
    hostname: zbxSRV-1
    restart: unless-stopped
    network_mode: host
    ulimits:
      nproc: 65535
      nofile:
        soft: 20000
        hard: 40000
    stop_grace_period: 30s
  zabbix-agent-1:
    image: 'zabbix/zabbix-agent:ubuntu-latest'
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - './zabbix-server/zabbix_agentd.d:/etc/zabbix/zabbix_agentd.d:ro'
      - './zabbix-server/var/lib/zabbix:/var/lib/zabbix:ro'
    env_file:
     - ./envs/.env_agent
    environment:
      - ZBX_SERVER_HOST=127.0.0.1,172.31.63.1
      - ZBX_ACTIVE_ALLOW=false
    privileged: true
    container_name: zabbix-agent-1
    hostname: zbxAgent-1
    restart: unless-stopped
    pid: "host"
    network_mode: host
    stop_grace_period: 5s

Here are my Docker logs for zabbix-server:

docker logs -f --tail=30 zabbix-server-1
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSCipherPSK13": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSKeyFile": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSPSKIdentity": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSPSKFile": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "ServiceManagerSyncFrequency": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "HANodeName": 'zabbix-server-1'...updated
** Updating '/etc/zabbix/zabbix_server.conf' parameter "NodeAddress": '172.31.63.1'...updated
** Updating '/etc/zabbix/zabbix_server.conf' parameter "User": 'zabbix'...updated
Starting Zabbix Server. Zabbix 6.4.14 (revision 0a50e61).
Press Ctrl+C to exit.

     7:20240516:232828.012 Starting Zabbix Server. Zabbix 6.4.14 (revision 0a50e61).
     7:20240516:232828.012 ****** Enabled features ******
     7:20240516:232828.012 SNMP monitoring:           YES
     7:20240516:232828.012 IPMI monitoring:           YES
     7:20240516:232828.012 Web monitoring:            YES
     7:20240516:232828.012 VMware monitoring:         YES
     7:20240516:232828.012 SMTP authentication:       YES
     7:20240516:232828.012 ODBC:                      YES
     7:20240516:232828.012 SSH support:               YES
     7:20240516:232828.012 IPv6 support:              YES
     7:20240516:232828.012 TLS support:               YES
     7:20240516:232828.012 ******************************
     7:20240516:232828.012 using configuration file: /etc/zabbix/zabbix_server.conf
     7:20240516:232828.044 Zabbix supports only "utf8_bin,utf8mb3_bin,utf8mb4_bin" collation(s). Database "zabbix" has default collation "utf8mb4_0900_ai_ci"
     7:20240516:232828.080 current database version (mandatory/optional): 06000000/06000020
     7:20240516:232828.080 required mandatory version: 06040000
     7:20240516:232828.080 mandatory patches were found
     7:20240516:232828.082 cannot perform database upgrade in HA mode: all nodes need to be stopped and Zabbix server started in standalone mode for the time of upgrade.
     7:20240516:232828.082 Zabbix Server stopped. Zabbix 6.4.14 (revision 0a50e61).

Steps Taken:

  1. Set dynamically in MySQL: SET GLOBAL log_bin_trust_function_creators = 1;
  2. Updated the Docker image in docker-compose.yml.
  3. Cleared the ha_node table.
  4. Disabled HA mode by removing the lines below from my compose file:
    - ZBX_HANODENAME=zabbix-server-1
    - ZBX_NODEADDRESS=172.31.63.1

Request for Help: What additional steps or configurations are necessary to ensure the Zabbix server starts in standalone mode to complete the database upgrade? Are there any known issues or additional configurations required when upgrading from 6.0.2 to 6.4 that I might have overlooked?

alemsas commented 3 months ago

I also increased the debug level and still get this:

     6:20240517:093156.843 End of zbx_db_connect():0
     6:20240517:093156.844 In zbx_dbms_version_info_extract()
     6:20240517:093156.844 End of zbx_dbms_version_info_extract() version:80036
     6:20240517:093156.845 In DBcheck_version()
     6:20240517:093156.845 In zbx_db_connect() flag:0
     6:20240517:093156.875 End of zbx_db_connect():0
     6:20240517:093156.876 query [txnlev:0] [show tables like 'dbversion']
     6:20240517:093156.877 query [txnlev:0] [select mandatory,optional from dbversion]
     6:20240517:093156.878 current database version (mandatory/optional): 06000000/06000020
     6:20240517:093156.878 required mandatory version: 06040000
     6:20240517:093156.879 mandatory patches were found
     6:20240517:093156.879 query [txnlev:0] [show tables like 'ha_node']
     6:20240517:093156.880 query [txnlev:1] [begin;]
     6:20240517:093156.881 query [txnlev:1] [select unix_timestamp(),ha_failover_delay from config]
     6:20240517:093156.882 query [txnlev:1] [select lastaccess,name from ha_node where status not in (1,2) order by ha_nodeid for update]
     6:20240517:093156.882 query [txnlev:1] [commit;]
     6:20240517:093156.883 cannot perform database upgrade in HA mode: all nodes need to be stopped and Zabbix server started in standalone mode for the time of upgrade.
     6:20240517:093156.883 End of DBcheck_version():FAIL
     6:20240517:093156.884 Zabbix Server stopped. Zabbix 6.4.14 (revision 0a50e61).

alemsas commented 3 months ago

No one can help?!

dotneft commented 3 months ago

Hi! Please provide the full container log without debug logging and without the ZBX_HANODENAME and ZBX_NODEADDRESS variables defined.

alemsas commented 3 months ago

@dotneft I have made some changes: I now run the database with Galera. Here is my Galera configuration:

version: "3.9"

services:
  mariadb-galera-1:
    image: 'bitnami/mariadb-galera:latest'
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - '/home/zabbix-cluster/mariadb-galera/Data/zabbix:/bitnami/mariadb'
      - '/home/zabbix-cluster/mariadb-galera/conf/my_custom.cnf:/opt/bitnami/mariadb/conf/my_custom.cnf:ro'
    env_file:
      - /home/zabbix-cluster/mariadb-galera/env_vars/.env_db
    environment:
      - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes
      - MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP=yes
      - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://172.31.63.1:4567,172.31.63.2:4567,172.31.63.44:4567
      - MARIADB_GALERA_NODE_ADDRESS=172.31.63.1
      - MARIADB_EXTRA_FLAGS=--max-connect-errors=1000 --max_connections=1000
    ulimits:
      nproc: 65535
      nofile:
        soft: 20000
        hard: 40000
    network_mode: host
    container_name: mariadb-galera-1
    hostname: ba1-DC-SRV1
    restart: unless-stopped

Without debug and without ZBX_HANODENAME and ZBX_NODEADDRESS, I got:

docker logs -f --tail=30 zabbix-server-1
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSCipherPSK13": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSKeyFile": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSPSKIdentity": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSPSKFile": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "ServiceManagerSyncFrequency": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "HANodeName": 'zabbix-server-1'...updated
** Updating '/etc/zabbix/zabbix_server.conf' parameter "NodeAddress": '172.31.63.1'...updated
** Updating '/etc/zabbix/zabbix_server.conf' parameter "User": 'zabbix'...updated
Starting Zabbix Server. Zabbix 6.4.14 (revision 0a50e61).
Press Ctrl+C to exit.

     7:20240516:232828.012 Starting Zabbix Server. Zabbix 6.4.14 (revision 0a50e61).
     7:20240516:232828.012 ****** Enabled features ******
     7:20240516:232828.012 SNMP monitoring:           YES
     7:20240516:232828.012 IPMI monitoring:           YES
     7:20240516:232828.012 Web monitoring:            YES
     7:20240516:232828.012 VMware monitoring:         YES
     7:20240516:232828.012 SMTP authentication:       YES
     7:20240516:232828.012 ODBC:                      YES
     7:20240516:232828.012 SSH support:               YES
     7:20240516:232828.012 IPv6 support:              YES
     7:20240516:232828.012 TLS support:               YES
     7:20240516:232828.012 ******************************
     7:20240516:232828.012 using configuration file: /etc/zabbix/zabbix_server.conf
     7:20240516:232828.044 Zabbix supports only "utf8_bin,utf8mb3_bin,utf8mb4_bin" collation(s). Database "zabbix" has default collation "utf8mb4_0900_ai_ci"
     7:20240516:232828.080 current database version (mandatory/optional): 06000000/06000020
     7:20240516:232828.080 required mandatory version: 06040000
     7:20240516:232828.080 mandatory patches were found
     7:20240516:232828.082 cannot perform database upgrade in HA mode: all nodes need to be stopped and Zabbix server started in standalone mode for the time of upgrade.
     7:20240516:232828.082 Zabbix Server stopped. Zabbix 6.4.14 (revision 0a50e61).

I also tried these steps:

  1. Ran Galera in standalone mode, with the bootstrap and gcomm settings removed.
  2. Loaded the database from the backup of my old database machine.
  3. Ran Zabbix server without the HA environment variables.
  4. Stopped Zabbix server and Galera, then started Galera in cluster mode and added HA back to zabbix-server.

I still get the same error.

dotneft commented 3 months ago

The server is still starting WITH ZBX_HANODENAME and ZBX_NODEADDRESS set:

** Updating '/etc/zabbix/zabbix_server.conf' parameter "HANodeName": 'zabbix-server-1'...updated
** Updating '/etc/zabbix/zabbix_server.conf' parameter "NodeAddress": '172.31.63.1'...updated
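This check can be scripted. The following is a minimal sketch, assuming the container entrypoint keeps printing parameter updates in the format shown above. The function name `ha_params_in_log` is hypothetical, and it only filters text passed on standard input.

```shell
# Hypothetical helper: print entrypoint log lines showing that an HA-related
# parameter was written to zabbix_server.conf. Reads log text on stdin.
# Assumption: the entrypoint reports written values with a trailing "...updated".
ha_params_in_log() {
    grep -E 'parameter "(HANodeName|NodeAddress)": .*\.\.\.updated'
}
```

A possible usage is `docker logs zabbix-server-1 2>&1 | ha_params_in_log`. Any output suggests the server is still being configured as an HA node; no output (and a non-zero exit status from `grep`) suggests it is not.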

alemsas commented 3 months ago

@dotneft Thanks, but I removed or commented out

      - ZBX_HANODENAME=zabbix-server-1
      - ZBX_NODEADDRESS=172.31.63.1

in my Zabbix server compose file as well as in zabbix.conf, but I still get the error! I import the Zabbix server configuration from a .env file:

**.env_server**

# ZBX_LISTENIP=
# ZBX_LISTENBACKLOG=
# ZBX_HISTORYSTORAGEURL=http://elasticsearch:9200/ # Available since 3.4.5
# ZBX_HISTORYSTORAGETYPES=uint,dbl,str,log,text # Available since 3.4.5
ZBX_ALLOWUNSUPPORTEDDBVERSIONS=1 # Available since 6.0.0
# ZBX_DBTLSCONNECT=required # Available since 5.0.0
# ZBX_DBTLSCAFILE=/run/secrets/root-ca.pem # Available since 5.0.0
# ZBX_DBTLSCERTFILE=/run/secrets/client-cert.pem # Available since 5.0.0
# ZBX_DBTLSKEYFILE=/run/secrets/client-key.pem # Available since 5.0.0
# ZBX_DBTLSCIPHER= # Available since 5.0.0
# ZBX_DBTLSCIPHER13= # Available since 5.0.0
ZBX_AUTOHANODENAME=hostname # Allowed values: fqdn, hostname. Available since 6.0.0
# ZBX_HANODENAME= # Available since 6.0.0
# ZBX_AUTONODEADDRESS=fqdn # Allowed values: fqdn, hostname. Available since 6.0.0
# ZBX_NODEADDRESSPORT=10051 # Allowed to use with ZBX_AUTONODEADDRESS variable only. Available since 6.0.0
# ZBX_NODEADDRESS=localhost:10051 # Available since 6.0.0
# ZBX_DEBUGLEVEL=3
ZBX_STARTPOLLERS=600
# ZBX_IPMIPOLLERS=4
ZBX_STARTPREPROCESSORS=32 # Available since 3.4.0
ZBX_STARTPOLLERSUNREACHABLE=200
ZBX_STARTTRAPPERS=24
ZBX_STARTPINGERS=150
# ZBX_STARTDISCOVERERS=1
# ZBX_STARTHTTPPOLLERS=1
ZBX_STARTHISTORYPOLLERS=32 # Available since 5.4.0
# ZBX_STARTTIMERS=1
# ZBX_STARTESCALATORS=1
ZBX_STARTALERTERS=6 # Available since 3.4.0
ZBX_STARTLLDPROCESSORS=32
# ZBX_JAVAGATEWAY_ENABLE=true
# ZBX_JAVAGATEWAY=zabbix-java-gateway
# ZBX_JAVAGATEWAYPORT=10052
# ZBX_STARTJAVAPOLLERS=5
# ZBX_STARTVMWARECOLLECTORS=0
# ZBX_VMWAREFREQUENCY=60
# ZBX_VMWAREPERFFREQUENCY=60
# ZBX_VMWARECACHESIZE=8M
# ZBX_VMWARETIMEOUT=10
ZBX_ENABLE_SNMP_TRAPS=true
# ZBX_SOURCEIP=
# ZBX_HOUSEKEEPINGFREQUENCY=1
# ZBX_MAXHOUSEKEEPERDELETE=5000
# ZBX_PROBLEMHOUSEKEEPINGFREQUENCY=60
# ZBX_SENDERFREQUENCY=30
ZBX_CACHESIZE=8G
# ZBX_CACHEUPDATEFREQUENCY=60
ZBX_STARTDBSYNCERS=24
ZBX_HISTORYCACHESIZE=2G
ZBX_HISTORYINDEXCACHESIZE=2G
# ZBX_HISTORYSTORAGEDATEINDEX=0
ZBX_TRENDCACHESIZE=2G
ZBX_TRENDFUNCTIONCACHESIZE=2G
ZBX_VALUECACHESIZE=2G
ZBX_TIMEOUT=6
# ZBX_TRAPPERTIMEOUT=300
# ZBX_UNREACHABLEPERIOD=45
# ZBX_UNAVAILABLEDELAY=60
# ZBX_UNREACHABLEDELAY=15
ZBX_LOGSLOWQUERIES=7000
# ZBX_EXPORTFILESIZE=
# ZBX_STARTPROXYPOLLERS=1
# ZBX_PROXYCONFIGFREQUENCY=3600
# ZBX_PROXYDATAFREQUENCY=1
# ZBX_LOADMODULE="dummy1.so,dummy2.so,dummy10.so"
# ZBX_TLSCAFILE=
# ZBX_TLSCRLFILE=
# ZBX_TLSCERTFILE=
# ZBX_TLSKEYFILE=
# ZBX_VAULTDBPATH=
# ZBX_VAULTURL=https://127.0.0.1:8200
# VAULT_TOKEN=
# ZBX_STARTREPORTWRITERS=0
# ZBX_WEBSERVICEURL=http://zabbix-web-service:10053/report
# ZBX_SERVICEMANAGERSYNCFREQUENCY=60#

Where are they coming from?! I am really confused!

dotneft commented 3 months ago

When you change ENV vars, you must recreate the container.
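One way to do that and confirm the result is sketched below. It assumes the Compose service name is the same as the container name, as it is in this thread, and the function name `recreate_and_show_env` is hypothetical.

```shell
# Hypothetical helper: recreate one Compose service so edited environment
# values take effect, then print the environment the new container received.
# Assumption: the Compose service name equals the container name.
recreate_and_show_env() {
    service="$1"
    docker compose up -d --force-recreate "$service" &&
        docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$service"
}
```

For example, `recreate_and_show_env zabbix-server-1` would list every variable the recreated container actually sees, including the ones that come from `env_file` entries.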

alemsas commented 3 months ago

@dotneft I did that! I know about this!

docker compose down or docker rm -f zabbix-server-1

and then

docker compose up

alemsas commented 3 months ago

Also, I don't have any volumes!

dotneft commented 3 months ago

You have the same env vars in the compose file as well:

    environment:
      - ZBX_HANODENAME=zabbix-server-1
      - ZBX_NODEADDRESS=172.31.63.1

alemsas commented 3 months ago

@dotneft As I said, I removed or commented them out in the compose file and also commented them out in my .env, but the result is still the same!

dotneft commented 3 months ago

Show the output of: docker compose -f docker-compose.yml config
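The rendered configuration is long, so it can help to narrow it to the HA-related variables named in the .env_server file above. This is a minimal sketch; the function name `ha_vars` is hypothetical and it only filters standard input.

```shell
# Hypothetical helper: keep only lines mentioning HA-related variables.
# The pattern covers ZBX_HANODENAME, ZBX_AUTOHANODENAME, ZBX_NODEADDRESS,
# ZBX_AUTONODEADDRESS and ZBX_NODEADDRESSPORT.
ha_vars() {
    grep -E 'ZBX_(AUTO)?(HANODENAME|NODEADDRESS)'
}
```

A possible usage is `docker compose -f docker-compose.yml config | ha_vars`. Any line it prints is a setting that Compose will pass to the container, whether it comes from the `environment` section or from an `env_file`.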

dotneft commented 3 months ago

Also, why do you have this one: ZBX_AUTOHANODENAME=hostname # Allowed values: fqdn, hostname. Available since 6.0.0

alemsas commented 3 months ago

docker compose -f docker-compose.yml config
name: zabbix-server-mysql
services:
  zabbix-agent-1:
    container_name: zabbix-agent-1
    environment:
      ZBX_ACTIVE_ALLOW: "false"
      ZBX_SERVER_HOST: 127.0.0.1,172.31.60.117
    hostname: zbxSRV-1
    image: zabbix/zabbix-agent:ubuntu-6.0.17
    network_mode: host
    pid: host
    privileged: true
    restart: unless-stopped
    stop_grace_period: 5s
    volumes:
    - type: bind
      source: /etc/localtime
      target: /etc/localtime
      read_only: true
      bind:
        create_host_path: true
    - type: bind
      source: /home/master/zabbix-server-mysql/zabbix_agentd.d
      target: /etc/zabbix/zabbix_agentd.d
      read_only: true
      bind:
        create_host_path: true
    - type: bind
      source: /home/master/zabbix-server-mysql/var/lib/zabbix
      target: /var/lib/zabbix
      read_only: true
      bind:
        create_host_path: true
  zabbix-server-1:
    container_name: zabbix-server-1
    environment:
      DB_SERVER_HOST: 172.31.60.117
      DB_SERVER_PORT: "3306"
      MYSQL_DATABASE: zabbix
      MYSQL_PASSWORD: my_sql_pass
      MYSQL_ROOT_PASSWORD: root_pass
      MYSQL_USER: root
      ZBX_ALLOWUNSUPPORTEDDBVERSIONS: "1"
      ZBX_AUTOHANODENAME: hostname
      ZBX_CACHESIZE: 8G
      ZBX_DEBUGLEVEL: "4"
      ZBX_ENABLE_SNMP_TRAPS: "true"
      ZBX_HISTORYCACHESIZE: 2G
      ZBX_HISTORYINDEXCACHESIZE: 2G
      ZBX_LOGSLOWQUERIES: "7000"
      ZBX_STARTALERTERS: "6"
      ZBX_STARTDBSYNCERS: "24"
      ZBX_STARTHISTORYPOLLERS: "32"
      ZBX_STARTLLDPROCESSORS: "32"
      ZBX_STARTPINGERS: "150"
      ZBX_STARTPOLLERS: "600"
      ZBX_STARTPOLLERSUNREACHABLE: "200"
      ZBX_STARTPREPROCESSORS: "32"
      ZBX_STARTTRAPPERS: "24"
      ZBX_TIMEOUT: "6"
      ZBX_TRENDCACHESIZE: 2G
      ZBX_TRENDFUNCTIONCACHESIZE: 2G
      ZBX_VALUECACHESIZE: 2G
    hostname: zbxSRV-1
    image: arvan/zabbix-server-mysql:ubuntu-6.0.17
    network_mode: host
    restart: unless-stopped
    stop_grace_period: 30s
    ulimits:
      nofile:
        soft: 20000
        hard: 40000
      nproc: 65535
    volumes:
    - type: bind
      source: /etc/localtime
      target: /etc/localtime
      read_only: true
      bind:
        create_host_path: true
    - type: bind
      source: /var/lib/snmp/mibs
      target: /var/lib/zabbix/mibs
      read_only: true
      bind:
        create_host_path: true
    - type: bind
      source: /home/master/zabbix-server-mysql/usr/lib/zabbix/alertscripts
      target: /usr/lib/zabbix/alertscripts
      read_only: true
      bind:
        create_host_path: true
    - type: bind
      source: /home/master/zabbix-server-mysql/usr/lib/zabbix/externalscripts
      target: /usr/lib/zabbix/externalscripts
      read_only: true
      bind:
        create_host_path: true

@dotneft

dotneft commented 3 months ago

This is completely different from what you showed before. This one is 6.0, while in your first message you had 6.4.

dotneft commented 3 months ago

Finally, remove or comment out this one: "ZBX_AUTOHANODENAME=hostname # Allowed values: fqdn, hostname. Available since 6.0.0" and try again.
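Commenting the line out can be sketched as follows. This assumes the variable sits at the start of a line in the env file, as in the .env_server listing above; the function name `disable_auto_ha_name` is hypothetical.

```shell
# Hypothetical helper: comment out an active ZBX_AUTOHANODENAME line in an
# env file, keeping a backup copy with a .bak suffix next to it.
# Lines that are already commented out are left unchanged.
disable_auto_ha_name() {
    env_file="$1"
    sed -i.bak -E 's/^(ZBX_AUTOHANODENAME=)/# \1/' "$env_file"
}
```

For example, run `disable_auto_ha_name ./envs/.env_server` and then recreate the container so the change takes effect. The server's own error message asks for standalone mode only "for the time of upgrade", so the line can presumably be restored from the backup once the database upgrade has completed.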

alemsas commented 3 months ago

> This is completely different from what you showed before. This one is 6.0, while in your first message you had 6.4.

Yes, you are right: the Docker image shown here is different, but the configuration is the same; I only updated the images to 6.4 (or a newer version). There is no difference in configuration from the new version that I am trying to upgrade to.

alemsas commented 3 months ago

> Finally, remove or comment out this one: "ZBX_AUTOHANODENAME=hostname # Allowed values: fqdn, hostname. Available since 6.0.0" and try again.

I will test this and share the results with you. Thank you for your time and consideration.

alemsas commented 3 months ago

> Finally, remove or comment out this one: "ZBX_AUTOHANODENAME=hostname # Allowed values: fqdn, hostname. Available since 6.0.0" and try again.

Yes, it worked! Thanks!