julien6387 / supvisors

Supvisors: A Control System for Distributed Applications
http://supvisors.readthedocs.io
Apache License 2.0

Use case: multiple Docker images with supervisord + Supvisors servers in them, without sync between instances #107

Closed mfederowicz closed 1 year ago

mfederowicz commented 1 year ago

Hello,

First of all, thanks for this project.

I have another case similar to https://github.com/julien6387/supvisors/issues/99, but I want to run supervisor + supvisors in separate Docker containers (I need the update_numprocs feature), with the workers managed by supvisors of course.

I wanted to run 3 or 4 containers based on one universal image with entrypoint-app.sh (a simple dispatcher that picks a supervisord.conf depending on an env variable). I already have a similar solution, but only with supervisord and a static config (with static numprocs=x values).

Is it possible to run the supervisord + supvisors RPC API without all the stuff connected with statistics sync etc.? Because my example log looks like this:

INFO Set uid to user 0 succeeded
 INFO RPC interface 'supervisor' initialized
WARN;SupervisorData.read_disabilities: no persistence for program disabilities
INFO;SupvisorsMapper.configure: identifiers=['server_1']
INFO;SupvisorsMapper.find_local_identifier: local_identifier=server_1
INFO;SupvisorsMapper.configure: core_identifiers=['server_1']
WARN;Supvisors: cannot parse rules files: None - 'NoneType' object is not iterable
INFO;RPCInterface: using Supvisors=0.15 Supervisor=4.2.4
INFO;RPC interface 'supvisors' initialized
CRIT;Server 'inet_http_server' running without any HTTP authentication checking
INFO;RPC interface 'supervisor' initialized
WARN;SupervisorData.read_disabilities: no persistence for program disabilities
INFO;SupvisorsMapper.configure: identifiers=['server_1']
INFO;SupvisorsMapper.find_local_identifier: local_identifier=server_1
INFO;SupvisorsMapper.configure: core_identifiers=['server_1']
WARN;Supvisors: cannot parse rules files: None - 'NoneType' object is not iterable
INFO;RPCInterface: using Supvisors=0.15 Supervisor=4.2.4
INFO;RPC interface 'supvisors' initialized
CRIT;Server 'unix_http_server' running without any HTTP authentication checking
INFO;supervisord started with pid 1
INFO;SupervisorListener.on_running: local supervisord is RUNNING
INFO;FiniteStateMachine.set_state: Supvisors in INITIALIZATION
INFO;Context.master_identifier: 
INFO;SupervisorListener.on_running: local supervisord is RUNNING
CRIT;PublisherServer._bind: failed to bind the Supvisors publisher on port 61001
INFO;FiniteStateMachine.set_state: Supvisors in INITIALIZATION
INFO;Context.master_identifier: 
INFO;SupvisorsInstanceStatus.state: Supvisors=server_1 is CHECKING
INFO;SupvisorsInstanceStatus.state: Supvisors=server_1 is CHECKING
WARN;Context.on_authorization: failed to get auth status from Supvisors=server_1
CRIT;Context.invalid: local Supvisors instance is either SILENT or inconsistent
INFO;SupvisorsInstanceStatus.state: Supvisors=server_1 is SILENT

I don't want to see entries like these in my logs:

CRIT;PublisherServer._bind: failed to bind the Supvisors publisher on port xxxxx
Context.invalid: local Supvisors instance is either SILENT or inconsistent

Is this possible, or does it need some future development? Thanks for any help you can provide!

PS. Of course I can ignore the content of the logs (and run the containers without looking at them), but maybe it is possible to do this the right way :)

julien6387 commented 1 year ago

Hello,

There are 2 supervisor daemons running here and logging into the same file. I assume that you have taken into account in your files that the port value of the inet_http_server sections must be different in your 2 daemons. However, I suspect that you did not do the same for the internal_port value of the supvisors section.

Supvisors uses TCP connections to share its internal events. I think the first daemon here has bound port 61001 and the second daemon has failed trying to bind the same port on the same host.

If confirmed, just know that supvisors.internal_port is set to the value of inet_http_server.port + 1 when not set. If not, could you please send me your supervisord.conf files for further investigation?
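
If you really do run two daemons on the same host, something along these lines would avoid the clash (just a sketch, keeping only the relevant sections; adapt the ports to your setup):

; first daemon, e.g. supervisord_host.conf
[inet_http_server]
port=:61000

[rpcinterface:supvisors]
supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface
internal_port = 61001

; second daemon, e.g. supervisord_host2.conf
[inet_http_server]
port=:62000

[rpcinterface:supvisors]
supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface
internal_port = 62001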

mfederowicz commented 1 year ago

@julien6387 sure no problem

For the test's sake, I have a Docker image based on python:3.9-slim-bullseye:

FROM python:3.9-slim-bullseye

ENV IMAGE_VERSION="1" \
    IMAGE_REGISTRY="registry.dev/supvisor-host" \
    IMAGE_DESCRIPTION="Basic supervisord container"

ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Install dependencies:
COPY files/ /
RUN /opt/venv/bin/python3 -m pip install --upgrade pip && \
    pip install -r /requirements.txt && \
    mkdir "/var/log/supervisor" && \
    apt update && apt install netcat -y

EXPOSE 61000 61001

ENTRYPOINT [ "/entrypoint-app.sh" ]

requirements.txt:

pip
supervisor
supvisors
psutil

supervisord_host.conf:

[inet_http_server]
port=:61000

[unix_http_server]
file=/var/run/supervisor/supervisor.sock
[supervisord]
nodaemon=true
logfile=/var/log/supervisor/supervisord_server.log
logfile_maxbytes=50MB       ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10          ; (num of main logfile rotation backups;default 10)
loglevel=info               ; (log level;default info; others: debug,warn,trace)
pidfile=/var/run/supervisord_server.pid
minfds=1024                 ; (min. avail startup file descriptors;default 1024)
minprocs=200                ; (min. avail process descriptors;default 200)
user=root

[supervisorctl]
serverurl=http://localhost:61000

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[rpcinterface:supvisors]
supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface
;supvisors_list = <server_1>host1:61000:
;rules_files = etc/supvisors_rules.xml
;core_identifiers = server_1
;stats_periods = 10,100

[ctlplugin:supvisors]
supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin

;[include]
;files = /srv/www/app/docker/image-app/files/etc/supervisord.d/config_workers_host.conf

Above, of course, I commented out almost all of the unused lines (for the test).

entrypoint-app.sh:

#!/bin/bash

set -e

if [ "${RUN_APP}" == "host2" ]; then
  exec supervisord -c /etc/supervisord_host2.conf
elif [ "${RUN_APP}" == "host3" ]; then
  exec supervisord -c /etc/supervisord_host3.conf
else
  exec supervisord -c /etc/supervisord_host.conf
fi

For the basic test I run only one container with supervisord -c /etc/supervisord_host.conf, so there should be only one supervisord on port 61000 and no 2 daemons running in the same container :) Or maybe we should count the XML-RPC interface as a daemon?

julien6387 commented 1 year ago

All right. The Supvisors plugin relies on the XML-RPC plugin extension, and here it is created twice because both the inet_http_server and unix_http_server sections are set. Supvisors can only work with the inet_http_server section. I have to make a fix so that the plugin is inhibited in the context of a unix_http_server. I thought I had dealt with this case before but apparently it came back :-/ I'm afraid you have to comment out the unix_http_server section for the time being.

mfederowicz commented 1 year ago

OK @julien6387, some new details. When I comment out the supvisors XML-RPC interface:

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

;[rpcinterface:supvisors]
;supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface

I get almost clean log entries:

2022-12-20 23:19:34,741 WARN received SIGTERM indicating exit request                                                                                                        
2022-12-20 23:19:37,784 INFO Set uid to user 0 succeeded                                                                                                                     
2022-12-20 23:19:37,787 INFO RPC interface 'supervisor' initialized                                                                                                          
2022-12-20 23:19:37,787 CRIT Server 'inet_http_server' running without any HTTP authentication checking                                                                      
2022-12-20 23:19:37,787 INFO RPC interface 'supervisor' initialized                                                                                                          
2022-12-20 23:19:37,788 CRIT Server 'unix_http_server' running without any HTTP authentication checking                                                                      
2022-12-20 23:19:37,788 INFO supervisord started with pid 1

So only one daemon is running on port=:61000, and there is no problem with binding port 61001.

When I switch back to:

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[rpcinterface:supvisors]
supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface

then the binding problem comes back:

2022-12-20 23:20:37,159;INFO;SupervisorListener.on_running: local supervisord is RUNNING
2022-12-20 23:20:37,159;CRIT;PublisherServer._bind: failed to bind the Supvisors publisher on port 61001
2022-12-20 23:20:37,159;INFO;FiniteStateMachine.set_state: Supvisors in INITIALIZATION 
2022-12-20 23:20:37,159;INFO;Context.master_identifier:

When I comment out the original supervisor XML-RPC interface and leave only [rpcinterface:supvisors], the binding problem still exists :(

Unfortunately I need both of them, the supervisord and supvisors APIs (for the update_numprocs feature).

julien6387 commented 1 year ago

Please comment out these lines instead:

;[unix_http_server]
;file=/var/run/supervisor/supervisor.sock

mfederowicz commented 1 year ago

please comment these lines instead:

;[unix_http_server]
;file=/var/run/supervisor/supervisor.sock

Yes yes, good point. After that, the logs look good:

2022-12-20 23:40:24,360 INFO Set uid to user 0 succeeded
2022-12-20 23:40:24,361 INFO RPC interface 'supervisor' initialized
2022-12-20 23:40:24,362;WARN;SupervisorData.read_disabilities: no persistence for program disabilities
2022-12-20 23:40:24,362;INFO;SupvisorsMapper.configure: identifiers=['host1']
2022-12-20 23:40:24,362;INFO;SupvisorsMapper.find_local_identifier: local_identifier=host1
2022-12-20 23:40:24,362;INFO;SupvisorsMapper.configure: core_identifiers=[]
2022-12-20 23:40:24,362;WARN;Supvisors: cannot parse rules files: None - 'NoneType' object is not iterable
2022-12-20 23:40:24,368;INFO;RPCInterface: using Supvisors=0.15 Supervisor=4.2.4
2022-12-20 23:40:24,368;INFO;RPC interface 'supvisors' initialized
2022-12-20 23:40:24,368;CRIT;Server 'inet_http_server' running without any HTTP authentication checking
2022-12-20 23:40:24,368;INFO;supervisord started with pid 1
2022-12-20 23:40:24,368;INFO;SupervisorListener.on_running: local supervisord is RUNNING
2022-12-20 23:40:24,369;INFO;FiniteStateMachine.set_state: Supvisors in INITIALIZATION
2022-12-20 23:40:24,369;INFO;Context.master_identifier: 
2022-12-20 23:40:30,377;INFO;SupvisorsInstanceStatus.state: Supvisors=host1 is CHECKING
2022-12-20 23:40:30,386;INFO;Context.on_authorization: local Supvisors instance is authorized to work with Supvisors=host1
2022-12-20 23:40:30,386;INFO;SupvisorsInstanceStatus.state: Supvisors=host1 is RUNNING
2022-12-20 23:40:35,386;INFO;InitializationState.next: all Supvisors instances are in a known state
2022-12-20 23:40:35,386;INFO;InitializationState.exit: working with Supvisors instances ['host1']
2022-12-20 23:40:35,386;INFO;InitializationState.exit: core_identifiers=[]
2022-12-20 23:40:35,386;INFO;Context.master_identifier: host1
2022-12-20 23:40:35,386;INFO;FiniteStateMachine.set_state: Supvisors in DEPLOYMENT
2022-12-20 23:40:35,386;INFO;FiniteStateMachine.set_state: Supvisors in OPERATION

I must have forgotten about this section (I did read about it :)).

But for now I have an interesting error:

supvisorsctl sstatus
error: <class 'OSError'>, [Errno 99] Cannot assign requested address: file: /usr/local/lib/python3.9/socket.py line: 832

julien6387 commented 1 year ago

Interesting one indeed, never seen before. Have you tried the legacy supervisorctl command? With / without the Supvisors plugin? I would rather suspect a network configuration issue.

julien6387 commented 1 year ago

This seems to be linked to the container itself. I've found this on the net:

This error will also appear if you try to connect to an exposed port from within a Docker container, when nothing is actively serving the port.

mfederowicz commented 1 year ago

@julien6387 Of course the legacy supervisorctl help works, but when I try supervisorctl status, the same error appears :(

julien6387 commented 1 year ago

OK, so I don't believe that it is related to Supvisors.

You're not the first one having this issue, apparently: https://stackoverflow.com/questions/71650364/supervisorctl-status-report-error-class-oserror-errno-99-cannot-assign-r

My knowledge about containers is still a bit basic, but when you run the container, do you use this option?

docker run -p 61000:61000 <image-name>

mfederowicz commented 1 year ago

OK, voilà:

root@host1:/# supvisorsctl -s http://localhost:61000 sstate
State           Starting  Stopping  
OPERATION       []        []

I think the problem is that supvisorsctl and supervisorctl don't read the serverurl option from the configuration; when I declared it as a parameter, both commands work :P

mfederowicz commented 1 year ago

OK, so I don't believe that it is related to Supvisors.

You're not the first one having this issue, apparently: https://stackoverflow.com/questions/71650364/supervisorctl-status-report-error-class-oserror-errno-99-cannot-assign-r

My knowledge about containers is still a bit basic, but when you run the container, do you use this option?

docker run -p 61000:61000 <image-name>

I use:

docker-compose -p supvisors -f ${COMPOSE_FILE} up -d

and a docker-compose.yml file with the services defined, but the -p publish port option should work: https://docs.docker.com/engine/reference/commandline/run/#publish-or-expose-port--p---expose
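
Roughly, a service in the compose file maps the ports like this (a simplified sketch; the image tag and RUN_APP value are just examples based on the Dockerfile above):

version: "3"
services:
  server_1:
    image: registry.dev/supvisor-host:1
    environment:
      - RUN_APP=host1
    ports:
      - "61000:61000"   # supervisor/supvisors XML-RPC and web UI
      - "61001:61001"   # supvisors internal publisher port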

Using the CLI is more universal, but docker-compose is more convenient :P

mfederowicz commented 1 year ago

OK @julien6387, I think I found the source of the problem.

In supervisor, the default serverurl for supervisorctl is localhost:9001.

When you set a different port, the problem appears (in our case localhost:61000).

Of course, when you use the parameter -s http://localhost:61000 there is no problem, because you set the value explicitly.

The interesting part is that when you set:

[inet_http_server]
port=:9001

[supervisorctl]
serverurl=http://localhost:9001

then the error

error: <class 'OSError'>, [Errno 99] Cannot assign requested address: file: /usr/local/lib/python3.9/socket.py line: 832

does not appear. The question is how to change the default value of the -s parameter for the ctl tools, but I think that is a case for a different issue/PR.

Thank you for your support :+1: :1st_place_medal:

mfederowicz commented 1 year ago

@julien6387 Hmm, interesting: in the supervisor repo they had https://github.com/Supervisor/supervisor/pull/1558/files and they closed it :(

julien6387 commented 1 year ago

When using supervisorctl, the default locations for the configuration file are searched: http://supervisord.org/configuration.html#configuration-file. If none is found, the default value is applied, i.e. port 9001.

Have you tried this?

supervisorctl -c /path/to/the/supervisord.conf sstatus

mfederowicz commented 1 year ago

@julien6387 OK, for now I have a stable basic configuration (for Docker). In entrypoint-app.sh I added:

ln -sf /etc/supervisord_host.conf /etc/supervisord.conf
exec supervisord -c /etc/supervisord_host.conf

So if you use the ctl script inside the container, it uses the "default" path to search for the configuration and everything works well:

root@host1:/# supervisorctl instance_status
Supervisor  Node   Port   State      Load  Time      Counter  FSM             Starting  Stopping  
server_1    host1  60000  RUNNING    0%    11:39:55  92       OPERATION       False     False

root@host2:/# supervisorctl instance_status
Supervisor  Node   Port   State      Load  Time      Counter  FSM             Starting  Stopping  
server_2    host2  60000  RUNNING    0%    11:38:10  71       OPERATION       False     False
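
For completeness, the full entrypoint-app.sh now looks roughly like this (a sketch, using the same host2/host3 conf names as before):

#!/bin/bash

set -e

# pick the per-host configuration based on the RUN_APP env variable
if [ "${RUN_APP}" == "host2" ]; then
  CONF=/etc/supervisord_host2.conf
elif [ "${RUN_APP}" == "host3" ]; then
  CONF=/etc/supervisord_host3.conf
else
  CONF=/etc/supervisord_host.conf
fi

# symlink to the default location so that supervisorctl / supvisorsctl
# pick up the right serverurl without needing the -s parameter
ln -sf "${CONF}" /etc/supervisord.conf

exec supervisord -c "${CONF}"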

For now I can go further and try to test the worker (program) configurations for the other hosts that I have in my production setup, and then try to manage the number of instances of a selected program (worker) with the update_numprocs feature, as sketched below :)
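
For example, scaling a worker should then be something like this (assuming the update_numprocs command provided by the Supvisors ctl plugin; my_worker is a placeholder program name):

# scale the hypothetical my_worker program to 5 processes
supervisorctl update_numprocs my_worker 5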

One more time, thank you for your support :+1: :1st_place_medal: