librenms / docker

LibreNMS Docker image
MIT License

Podman-compose multiple errors #323

Closed bukovszg closed 1 year ago

bukovszg commented 1 year ago

Behaviour

I have encountered several issues when trying to run the container with podman-compose.

  1. DNS resolution from the PHP framework does not work (it does work from within the container itself)
  2. The dispatcher can reach the database but cannot find the tables (the tables do exist)
  3. The main container sometimes cannot find tables in the database (the error is intermittent)

Steps to reproduce this issue

  1. Use the docker-compose example with Dispatcher and Redis
  2. Use RHEL8 with podman-compose to start the containers

Expected behaviour

The container starts, can read and write all the database tables, and the dispatcher registers as a poller.

What has been checked

DNS confirmed working:

# podman exec librenms_main host librenms_redis.dns.podman
librenms_redis.dns.podman has address 192.168.2.125

DB is reachable and populated:

# podman exec -ti librenms_dispatcher artisan db
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 590
Server version: 10.9.4-MariaDB-1:10.9.4+maria~ubu2204 mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [lnms]> show tables;
+--------------------------------+
| Tables_in_lnms                 |
+--------------------------------+
| access_points                  |
| alert_device_map               |
| alert_group_map                |
| alert_location_map             |
| alert_log                      |
| alert_rules                    |
| alert_schedulables             |
| alert_schedule                 |
| alert_template_map             |
| alert_templates                |
| alert_transport_groups         |
| alert_transport_map            |
| alert_transports               |
| alerts                         |
| api_tokens                     |
| application_metrics            |
| applications                   |
| authlog                        |
| availability                   |
| bgpPeers                       |
| bgpPeers_cbgp                  |
| bill_data                      |
| bill_history                   |
| bill_perms                     |
| bill_port_counters             |
| bill_ports                     |
| bills                          |
| cache                          |
| cache_locks                    |
| callback                       |
| cef_switching                  |
| ciscoASA                       |
| component                      |
| component_prefs                |
| component_statuslog            |
| config                         |
| customers                      |
| customoids                     |
| dashboards                     |
| dbSchema                       |
| device_graphs                  |
| device_group_device            |
| device_groups                  |
| device_outages                 |
| device_perf                    |
| device_relationships           |
| devices                        |
| devices_attribs                |
| devices_group_perms            |
| devices_perms                  |
| entPhysical                    |
| entPhysical_state              |
| entityState                    |
| eventlog                       |
| graph_types                    |
| hrDevice                       |
| hrSystem                       |
| ipsec_tunnels                  |
| ipv4_addresses                 |
| ipv4_mac                       |
| ipv4_networks                  |
| ipv6_addresses                 |
| ipv6_networks                  |
| isis_adjacencies               |
| juniAtmVp                      |
| links                          |
| loadbalancer_rservers          |
| loadbalancer_vservers          |
| locations                      |
| mac_accounting                 |
| mefinfo                        |
| mempools                       |
| migrations                     |
| mpls_lsp_paths                 |
| mpls_lsps                      |
| mpls_saps                      |
| mpls_sdp_binds                 |
| mpls_sdps                      |
| mpls_services                  |
| mpls_tunnel_ar_hops            |
| mpls_tunnel_c_hops             |
| munin_plugins                  |
| munin_plugins_ds               |
| netscaler_vservers             |
| notifications                  |
| notifications_attribs          |
| ospf_areas                     |
| ospf_instances                 |
| ospf_nbrs                      |
| ospf_ports                     |
| packages                       |
| pdb_ix                         |
| pdb_ix_peers                   |
| plugins                        |
| poller_cluster                 |
| poller_cluster_stats           |
| poller_groups                  |
| pollers                        |
| port_group_port                |
| port_groups                    |
| ports                          |
| ports_adsl                     |
| ports_fdb                      |
| ports_nac                      |
| ports_perms                    |
| ports_stack                    |
| ports_statistics               |
| ports_stp                      |
| ports_vdsl                     |
| ports_vlans                    |
| printer_supplies               |
| processes                      |
| processors                     |
| proxmox                        |
| proxmox_ports                  |
| pseudowires                    |
| push_subscriptions             |
| route                          |
| sensors                        |
| sensors_to_state_indexes       |
| service_templates              |
| service_templates_device       |
| service_templates_device_group |
| services                       |
| session                        |
| sessions                       |
| slas                           |
| state_indexes                  |
| state_translations             |
| storage                        |
| stp                            |
| syslog                         |
| tnmsneinfo                     |
| transport_group_transport      |
| ucd_diskio                     |
| users                          |
| users_prefs                    |
| users_widgets                  |
| vlans                          |
| vminfo                         |
| vrf_lite_cisco                 |
| vrfs                           |
| wireless_sensors               |
+--------------------------------+
143 rows in set (0.001 sec)

Configuration

Docker info

host:
  arch: amd64
  buildahVersion: 1.26.2
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.2-2.module+el8.6.0+15917+093ca6f8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.2, commit: 8c4f33ac0dcf558874b453d5027028b18d1502db'
  cpuUtilization:
    idlePercent: 99.28
    systemPercent: 0.2
    userPercent: 0.53
  cpus: 4
  distribution:
    distribution: '"rhel"'
    version: "8.6"
  eventLogger: file
  hostname: MUC9LIBREV1P.account.intern
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-372.26.1.el8_6.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1892970496
  memTotal: 8140443648
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.3-2.module+el8.6.0+15917+093ca6f8.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.3
      spec: 1.0.2-dev
      go: go1.17.7
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.module+el8.6.0+15917+093ca6f8.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 984h 14m 47.35s (Approximately 41.00 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 4284481536
  graphRootUsed: 3074039808
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.1
  Built: 1657551413
  BuiltTime: Mon Jul 11 16:56:53 2022
  GitCommit: ""
  GoVersion: go1.17.7
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.1

Logs

# podman-compose up
['podman', '--version', '']
using podman version: 4.1.1
** excluding:  set()
['podman', 'network', 'exists', 'librenms_backend']
podman create --name=librenms_db --label io.podman.compose.config-hash=123 --label io.podman.compose.project=librenms --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=librenms --label com.docker.compose.project.working_dir=/data/librenms --label com.docker.compose.project.config_files=docker-compose.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=db -e TZ=Europe/Paris -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -e MYSQL_DATABASE=lnms -e MYSQL_USER=librenms -e MYSQL_PASSWORD=newpassword -v /data/librenms/db:/var/lib/mysql --net librenms_backend --network-alias db --restart always registry.accounts.intern/gsit/mariadb mysqld --innodb-file-per-table=1 --lower-case-table-names=0 --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
45a62fd37a658b36d25619d2fde6e4a5f7061a7ccebbb51b40004b3baaf3e923
exit code: 0
['podman', 'network', 'exists', 'librenms_backend']
podman create --name=librenms_redis --label io.podman.compose.config-hash=123 --label io.podman.compose.project=librenms --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=librenms --label com.docker.compose.project.working_dir=/data/librenms --label com.docker.compose.project.config_files=docker-compose.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=redis -e TZ=Europe/Paris --net librenms_backend --network-alias redis --restart always registry.accounts.intern/gsit/redis:5
b2aa071623f5cd1b4309f8deb9cefc82cf2c74d58aff11152a4fbeef128fd4cf
exit code: 0
['podman', 'network', 'exists', 'librenms_backend']
podman create --name=librenms_main --label io.podman.compose.config-hash=123 --label io.podman.compose.project=librenms --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=librenms --label com.docker.compose.project.working_dir=/data/librenms --label com.docker.compose.project.config_files=docker-compose.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=librenms --network bridge --cap-add NET_ADMIN --cap-add NET_RAW --env-file /data/librenms/librenms.env -e TZ=Europe/Paris -e PUID=10000 -e PGID=10000 -e DB_HOST=librenms_db -e DB_NAME=lnms -e DB_DATABASE=lnms -e DB_USERNAME=librenms -e DB_USER=librenms -e DB_PASSWORD=newpassword -e DB_TIMEOUT=60 -v /data/librenms/librenms:/data --net librenms_backend --network-alias librenms -p 8000:8000 --hostname librenms.mgmt.intern --restart always registry.accounts.intern/gsit/librenms
e11d4e73968c744c1407b1802ba844abbf7304a438b6730c90662740cc8fef9f
exit code: 0
['podman', 'network', 'exists', 'librenms_backend']
podman create --name=librenms_dispatcher --label io.podman.compose.config-hash=123 --label io.podman.compose.project=librenms --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=librenms --label com.docker.compose.project.working_dir=/data/librenms --label com.docker.compose.project.config_files=docker-compose.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=dispatcher --cap-add NET_ADMIN --cap-add NET_RAW --env-file /data/librenms/librenms.env -e TZ=Europe/Paris -e PUID=10000 -e PGID=10000 -e DB_HOST=librenms_db -e DB_NAME=librenms -e DB_USER=librenms -e DB_DATABASE=lnms -e DB_USERNAME=librenms -e DB_PASSWORD=newpassword -e DB_TIMEOUT=60 -e DISPATCHER_NODE_ID=dispatcher1 -e SIDECAR_DISPATCHER=1 -v /data/librenms/librenms:/data --net librenms_backend --network-alias dispatcher --hostname librenms-dispatcher --restart always registry.accounts.intern/gsit/librenms
02f6e97ba91b4c6ed2539aa4b514cf3c70f68f1a736e48bdf312adb473f6974c
exit code: 0
podman start -a librenms_db
2022-11-17 15:58:22+01:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.9.4+maria~ubu2204 started.
2022-11-17 15:58:22+01:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2022-11-17 15:58:22+01:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.9.4+maria~ubu2204 started.
2022-11-17 15:58:22+01:00 [Note] [Entrypoint]: MariaDB upgrade not required
2022-11-17 15:58:22 0 [Note] mysqld (server 10.9.4-MariaDB-1:10.9.4+maria~ubu2204) starting as process 1 ...
2022-11-17 15:58:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-11-17 15:58:22 0 [Note] InnoDB: Number of transaction pools: 1
2022-11-17 15:58:22 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-11-17 15:58:22 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2022-11-17 15:58:22 0 [Warning] mysqld: io_uring_queue_init() failed with ENOSYS: check seccomp filters, and the kernel version (newer than 5.1 required)
2022-11-17 15:58:22 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
2022-11-17 15:58:22 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB
2022-11-17 15:58:22 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-17 15:58:22 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2022-11-17 15:58:22 0 [Note] InnoDB: 128 rollback segments are active.
2022-11-17 15:58:22 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2022-11-17 15:58:22 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
2022-11-17 15:58:22 0 [Note] InnoDB: log sequence number 1371837; transaction id 3040
2022-11-17 15:58:22 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-11-17 15:58:22 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-17 15:58:22 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.
2022-11-17 15:58:22 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-11-17 15:58:22 0 [Note] Server socket created on IP: '::'.
2022-11-17 15:58:22 0 [Note] mysqld: ready for connections.
Version: '10.9.4-MariaDB-1:10.9.4+maria~ubu2204'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2022-11-17 15:58:23 0 [Note] InnoDB: Buffer pool(s) load completed at 221117 15:58:23
podman start -a librenms_redis
1:C 17 Nov 2022 15:58:23.602 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 17 Nov 2022 15:58:23.602 # Redis version=5.0.14, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 17 Nov 2022 15:58:23.602 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 17 Nov 2022 15:58:23.602 * Running mode=standalone, port=6379.
1:M 17 Nov 2022 15:58:23.602 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 17 Nov 2022 15:58:23.602 # Server initialized
1:M 17 Nov 2022 15:58:23.602 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 17 Nov 2022 15:58:23.602 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 17 Nov 2022 15:58:23.603 * Ready to accept connections
podman start -a librenms_main
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 00-fix-logs.sh: executing...
[cont-init.d] 00-fix-logs.sh: exited 0.
[cont-init.d] 01-fix-uidgid.sh: executing...
Switching to PGID 10000...
Switching to PUID 10000...
[cont-init.d] 01-fix-uidgid.sh: exited 0.
[cont-init.d] 02-fix-perms.sh: executing...
Fixing perms...
[cont-init.d] 02-fix-perms.sh: exited 0.
[cont-init.d] 03-config.sh: executing...
Setting timezone to Europe/Paris...
Setting PHP-FPM configuration...
Setting PHP INI configuration...
Setting OpCache configuration...
Setting Nginx configuration...
Updating SNMP community...
Initializing LibreNMS files / folders...
Setting LibreNMS configuration...
Checking LibreNMS plugins...
Fixing perms...
Checking additional Monitoring plugins...
Checking alert templates...
[cont-init.d] 03-config.sh: exited 0.
[cont-init.d] 04-svc-main.sh: executing...
Waiting 60s for database to be ready...
Database ready!
Updating database schema...
podman start -a librenms_dispatcher
Nothing to migrate.
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...Seeding: Database\Seeders\DefaultAlertTemplateSeeder
Seeded:  Database\Seeders\DefaultAlertTemplateSeeder (1.03ms)
Seeding: Database\Seeders\DefaultLegacySchemaSeeder
Seeded:  Database\Seeders\DefaultLegacySchemaSeeder (0.60ms)
Seeding: Database\Seeders\ConfigSeeder
Seeded:  Database\Seeders\ConfigSeeder (0.96ms)
Database seeding completed successfully.
Clear cache
exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 00-fix-logs.sh: executing...
[cont-init.d] 00-fix-logs.sh: exited 0.
[cont-init.d] 01-fix-uidgid.sh: executing...
Switching to PGID 10000...
Switching to PUID 10000...
[cont-init.d] 01-fix-uidgid.sh: exited 0.
[cont-init.d] 02-fix-perms.sh: executing...
Fixing perms...
[cont-init.d] 02-fix-perms.sh: exited 0.
[cont-init.d] 03-config.sh: executing...
Setting timezone to Europe/Paris...
Setting PHP-FPM configuration...
Setting PHP INI configuration...
Setting OpCache configuration...
Setting Nginx configuration...
Updating SNMP community...
Initializing LibreNMS files / folders...
Setting LibreNMS configuration...
Checking LibreNMS plugins...
Fixing perms...
Checking additional Monitoring plugins...
Checking alert templates...
[cont-init.d] 03-config.sh: exited 0.
[cont-init.d] 04-svc-main.sh: executing...
[cont-init.d] 04-svc-main.sh: exited 0.
[cont-init.d] 05-svc-dispatcher.sh: executing...
>>
>> Sidecar dispatcher container detected
>>
Waiting 60s for database to be ready...
Database ready!
Application cache cleared!
Configuration cache cleared!
Configuration cached successfully!
[cont-init.d] 04-svc-main.sh: exited 0.
[cont-init.d] 05-svc-dispatcher.sh: executing...
[cont-init.d] 05-svc-dispatcher.sh: exited 0.
[cont-init.d] 06-svc-syslogng.sh: executing...
[cont-init.d] 06-svc-syslogng.sh: exited 0.
[cont-init.d] 07-svc-cron.sh: executing...
Creating LibreNMS daily.sh cron task with the following period fields: 15 0 * * *
Fixing crontabs permissions...
[cont-init.d] 07-svc-cron.sh: exited 0.
[cont-init.d] 08-svc-snmptrapd.sh: executing...
[cont-init.d] 08-svc-snmptrapd.sh: exited 0.
[cont-init.d] ~-socklog: executing...
[cont-init.d] ~-socklog: exited 0.
[cont-init.d] done.
[services.d] starting services
crond: crond (busybox 1.35.0) started, log level 8
2022/11/17 15:58:26 [notice] 643#643: using the "epoll" event method
2022/11/17 15:58:26 [notice] 643#643: nginx/1.22.1
2022/11/17 15:58:26 [notice] 643#643: OS: Linux 4.18.0-372.26.1.el8_6.x86_64
2022/11/17 15:58:26 [notice] 643#643: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2022/11/17 15:58:26 [notice] 643#643: start worker processes
2022/11/17 15:58:26 [notice] 643#643: start worker process 657
2022/11/17 15:58:26 [notice] 643#643: start worker process 658
2022/11/17 15:58:26 [notice] 643#643: start worker process 659
2022/11/17 15:58:26 [notice] 643#643: start worker process 660
[services.d] done.
[17-Nov-2022 15:58:26] NOTICE: fpm is running, pid 644
[17-Nov-2022 15:58:26] NOTICE: ready to handle connections
ERROR: Table librenms.poller_cluster does not exist on librenms_db
[cont-init.d] 05-svc-dispatcher.sh: exited 1.
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.
exit code: 1

compose.txt logsnippets.txt librenms_dev.txt
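
A side observation on the log above, not confirmed anywhere in this thread: the dispatcher container is created with DB_NAME=librenms, while the database container is created with MYSQL_DATABASE=lnms and the main container with DB_NAME=lnms. The error message names librenms.poller_cluster, which would be consistent with the dispatcher looking in a database called librenms. Whether the image reads DB_NAME or DB_DATABASE is an assumption here. The sketch below is a hypothetical helper (its function names are made up for illustration) that pulls the -e pairs out of a podman create command line and flags database-name variables that disagree with the expected name.

```python
import shlex


def env_from_podman_create(command: str) -> dict[str, str]:
    """Collect the -e KEY=VALUE pairs from a `podman create` command line."""
    tokens = shlex.split(command)
    env: dict[str, str] = {}
    # Look at each token together with the one that follows it.
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-e" and "=" in value:
            key, _, val = value.partition("=")
            env[key] = val
    return env


def db_name_mismatches(env: dict[str, str], expected: str) -> list[str]:
    """Return the database-name variables whose value differs from `expected`."""
    candidates = ("DB_NAME", "DB_DATABASE")
    return [k for k in candidates if k in env and env[k] != expected]


# Shortened copy of the dispatcher command from the log above.
dispatcher_cmd = (
    "podman create --name=librenms_dispatcher -e DB_HOST=librenms_db "
    "-e DB_NAME=librenms -e DB_USER=librenms -e DB_DATABASE=lnms "
    "-e SIDECAR_DISPATCHER=1 registry.accounts.intern/gsit/librenms"
)

env = env_from_podman_create(dispatcher_cmd)
print(db_name_mismatches(env, expected="lnms"))
```

Running the same check against the main container's command line would return an empty list, since both variables are set to lnms there.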

bukovszg commented 1 year ago

I figured out at least one part of the problem. The DNS issues were caused by a setting in the host's own /etc/resolv.conf:

# Any changes made to this file will be lost

# The default domain name if none is supplied
search internal.domain other.domain

# The list of name servers

nameserver  X.Y.Z.W
nameserver  A.B.C.D

# The list of options
options timeout:2
options attempts:3
options rotate

Podman copies this file into the container and then extends it with settings for its internal name resolution:

search dns.podman  internal.domain other.domain
nameserver 192.168.2.1
nameserver X.Y.Z.W
nameserver A.B.C.D
options attempts:3
options rotate

Because of the rotate option, however, DNS queries are rotated across all listed name servers, sometimes hitting the podman-internal DNS and sometimes an external DNS server. The external servers know nothing about the podman-internal names, which causes random DNS resolution errors.
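
As a rough illustration of that diagnosis, the following sketch parses resolv.conf text and flags the combination described above: the rotate option together with more than one name server. The function names are hypothetical, and the check is only a heuristic; it does not know which server is the podman-internal one.

```python
def parse_resolv_conf(text: str) -> tuple[list[str], set[str]]:
    """Return (nameservers, options) found in resolv.conf-style text."""
    nameservers: list[str] = []
    options: set[str] = set()
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            nameservers.append(args[0])
        elif keyword == "options":
            options.update(args)
    return nameservers, options


def rotate_is_risky(text: str) -> bool:
    """True when queries may be spread over servers that know different names."""
    nameservers, options = parse_resolv_conf(text)
    return "rotate" in options and len(nameservers) > 1


# The container-side file quoted above, with the placeholder addresses kept.
container_resolv = """\
search dns.podman  internal.domain other.domain
nameserver 192.168.2.1
nameserver X.Y.Z.W
nameserver A.B.C.D
options attempts:3
options rotate
"""

print(rotate_is_risky(container_resolv))
```

With the rotate line removed from the input, the same check reports no risk, because the resolver then tries the first listed server (the podman-internal one in this file) before the others.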

crazy-max commented 1 year ago

Don't think this is an issue with the image itself. Consider opening an issue at https://github.com/containers/podman-compose