Closed: lukeamcmurray closed this issue 4 years ago
Hi - thanks for reporting, I’ll take a look at this as soon as I can :)
Hi - I've got to admit, I've never used Gunicorn before so while I can reproduce the behaviour you're seeing I have no idea why it's happening :-(
I'll keep looking.
I suspect it's something to do with the way Gunicorn overrides logging handlers for child processes but it may take me a while to figure out exactly how.
Ah, it looks like Gunicorn uses fork(). I have a feeling that doesn't play nicely with threads.
Any luck on this one? For the time being I'm using a GELF sidecar, but it does make the logging setup a bit more complicated than when using SeqLog.
Hey - sorry I forgot to post a response! There is a callback that Gunicorn invokes for each worker after the call to fork(). You can reinitialise logging from there (for some reason the logger thread state doesn't survive the fork).
The other option is to use one of the other worker modes that doesn't involve the use of fork().
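Something along these lines, as a rough, untested sketch (the config dict here is just a placeholder; substitute your real logging config):

# gunicorn.conf.py - rough sketch only, not a drop-in solution
import seqlog

# Placeholder logging config; the handler class path and server_url are the
# same ones used in the Seq handler config later in this thread.
LOG_CONFIG = {
    'version': 1,
    'root': {'level': 'INFO', 'handlers': ['seq']},
    'handlers': {
        'seq': {
            'class': 'seqlog.structured_logging.SeqLogHandler',
            'server_url': 'http://localhost:5341',
        },
    },
}


def post_worker_init(worker):
    # Runs inside the worker process, after fork(), so seqlog's background
    # consumer thread gets started in the process that actually logs.
    seqlog.configure_from_dict(LOG_CONFIG)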
Thanks, and sorry for the delayed response. I've managed to get the logging configured on the worker by implementing the post_worker_init callback method in the Gunicorn config file, which re-initialises the Python logging and the Flask app logging. It was a little tricky getting access to the Flask app in the worker in order to wire up the logging (I kept getting out-of-context errors), so I thought I'd include a cut-down version of the final working code.
Basic directory structure:
|-- api
    |-- __init__.py
    |-- api.py
    |-- gunicorn.py
    |-- log_config_provider.py
    |-- log_config.yml
api/__init__.py:
from flask import Flask
app = Flask(__name__)
api/api.py:
import logging

from flask import request, make_response

from api import app as application


@application.route("/api/health", methods=['GET'])
def health():
    logging.info("Health check from worker")
    return make_response('alive', 200)
api/gunicorn.py:
import multiprocessing

from api.log_config_provider import (
    configure_logging,
    get_log_config_dict,
    set_global_log_properties)

bind = "0.0.0.0:8000"
workers = (2 * multiprocessing.cpu_count()) + 1

# Configure logging
logconfig_dict = get_log_config_dict()
set_global_log_properties()


def post_worker_init(worker):
    configure_logging(logconfig_dict)
api/log_config_provider.py:
import logging
import logging.config  # needed for dictConfig below
import os
import seqlog
import socket
import yaml

from api import app
from api.config import AppConfig, LoggingConfig
from api.constants import APP_CONFIG_FILE, LOG_CONFIG_FILE


def configure_logging(log_config_dict):
    logging.config.dictConfig(log_config_dict)
    with app.app_context():
        logger = logging.getLogger('gunicorn.error')
        app.logger.handlers = logger.handlers
        app.logger.propagate = False
        app.logger.setLevel(logger.level)
    set_global_log_properties()


def get_log_config_dict():
    with open(LOG_CONFIG_FILE, 'r') as log_config_file:
        log_config = yaml.safe_load(log_config_file.read())
    logging_config = LoggingConfig(APP_CONFIG_FILE)
    transform_log_config(log_config, logging_config)
    return log_config


def set_global_log_properties():
    app_config = AppConfig(APP_CONFIG_FILE)
    seqlog.set_global_log_properties(
        ApplicationName=app_config.application_name,
        Environment=app_config.environment,
        MachineName=socket.gethostname(),
        ProcessId=os.getpid())


def transform_log_config(log_config, logging_config):
    log_config['root']['level'] = logging_config.level
    log_config['loggers']['gunicorn.access']['level'] = logging_config.level
    log_config['loggers']['gunicorn.error']['level'] = logging_config.level
    log_config['handlers']['seq']['server_url'] = logging_config.server_url
    log_config['handlers']['seq']['api_key'] = logging_config.api_key
api/log_config.yml:
version: 1
disable_existing_loggers: True
root:
  level: DEBUG
  handlers:
    - console
    - seq
loggers:
  gunicorn.access:
    qualname: gunicorn.access
    propagate: False
    level: DEBUG
    handlers:
      - console
      - seq
  gunicorn.error:
    qualname: gunicorn.error
    propagate: False
    level: DEBUG
    handlers:
      - console
      - seq
handlers:
  console:
    class: seqlog.structured_logging.ConsoleStructuredLogHandler
    formatter: standard
  seq:
    class: seqlog.structured_logging.SeqLogHandler
    formatter: seq
    server_url: 'http://localhost:5341'
    api_key: ''
    batch_size: 1
formatters:
  seq:
    style: '{'
  standard:
    format: '[%(asctime)s] [%(process)d] [%(levelname)s] %(message)s'
    datefmt: '%Y-%m-%d %H:%M:%S'
Thank you for taking the time to write this up! I’ll add it to the docs :)
After solving the above I realised after a while that I was still missing some messages in Seq; only messages where I explicitly called the logger in a worker process would log to Seq.
If I called the Python logging methods directly, e.g. logging.info("..."), then the logs would be written to Seq. Logs that were written by the Gunicorn framework, e.g. access logs handled by gunicorn.glogging, were not being written to Seq.
It took me a little while to debug this, but in the end I found that there appear to be two sets of the same loggers in the worker process: one set exists on the Gunicorn worker object and the other exists on the Python logging module. On the Gunicorn worker object, the SeqLogHandler.consumer in the gunicorn.access and gunicorn.error loggers was stopped for some unknown reason, whilst on the Python logging module the consumers were started. I explicitly started the consumers on the Gunicorn worker object and then the messages started appearing in Seq. Note I had to stop the consumer first, as I think the is_running flag was out of sync with consumer.is_stopped?
I have no idea why there are two different sets of loggers on the Gunicorn worker and Python logging objects?!
Also, currently the Gunicorn server and worker loggers are not using the StructuredLogger at all, even though I call seqlog.configure_from_dict(log_config_dict) in both the server and worker processes. Another one to troubleshoot.
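One way to confirm this (a rough sketch; it assumes StructuredLogger is importable from seqlog.structured_logging, the same module as the handler classes used above) is to check the class of the Gunicorn loggers, since Gunicorn creates them before seqlog configures logging:

# Rough check: loggers created by Gunicorn before seqlog was configured
# keep the stock logging.Logger class rather than seqlog's StructuredLogger.
import logging

from seqlog.structured_logging import StructuredLogger  # assumed import path

for name in ('gunicorn.access', 'gunicorn.error'):
    logger = logging.getLogger(name)
    print(name, type(logger).__name__, isinstance(logger, StructuredLogger))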
For now, my Gunicorn config and logging provider look like this:
gunicorn.py:
import multiprocessing
import os

from api.logging_provider import (
    configure_server_logging,
    configure_worker_logging,
    get_log_config_dict,
    set_global_log_properties)

bind = '0.0.0.0:8000'
workers = (2 * multiprocessing.cpu_count()) + 1
timeout = 10
preload_app = True

set_global_log_properties()
logconfig_dict = get_log_config_dict()


def when_ready(server):
    configure_server_logging(logconfig_dict)


def post_worker_init(worker):
    configure_worker_logging(worker, logconfig_dict)
logging_provider.py:
import logging
import os
import seqlog
import socket
import yaml

from api import app
from api.constants import APP_CONFIG_FILE, LOG_CONFIG_FILE
from api.utilities.app_config import AppConfig
from api.utilities.logging_config import LoggingConfig


def configure_server_logging(log_config_dict):
    seqlog.configure_from_dict(log_config_dict)


def configure_worker_logging(worker, log_config_dict):
    set_global_log_properties()
    seqlog.configure_from_dict(log_config_dict)
    with app.app_context():
        logger = logging.getLogger('gunicorn.error')
        app.logger.handlers = logger.handlers
        app.logger.propagate = False
        app.logger.setLevel(logger.level)
    configure_logger(worker.log.access_log)
    configure_logger(worker.log.error_log)


def configure_logger(logger):
    for handler in logger.handlers:
        if handler.get_name() == 'seq':
            try:
                # Even though the consumer is already stopped, the 'is_running'
                # flag is still true, so stop it before starting it again.
                handler.consumer.stop()
            except Exception:
                pass
            try:
                handler.consumer.start()
            except Exception:
                pass


def get_log_config_dict():
    with open(LOG_CONFIG_FILE, 'r') as log_config_file:
        log_config_dict = yaml.safe_load(log_config_file.read())
    logging_config = LoggingConfig(APP_CONFIG_FILE)
    transform_log_config_dict(log_config_dict, logging_config)
    return log_config_dict


def set_global_log_properties():
    app_config = AppConfig(APP_CONFIG_FILE)
    seqlog.set_global_log_properties(
        ApplicationName=app_config.application_name,
        Environment=app_config.environment,
        MachineName=socket.gethostname(),
        ProcessId=os.getpid())


def transform_log_config_dict(log_config_dict, logging_config):
    log_config_dict['root']['level'] = logging_config.level
    log_config_dict['loggers']['gunicorn.access']['level'] = logging_config.level
    log_config_dict['loggers']['gunicorn.error']['level'] = logging_config.level
    log_config_dict['handlers']['seq']['server_url'] = logging_config.server_url
    log_config_dict['handlers']['seq']['api_key'] = logging_config.api_key
Thanks! Yeah, I think they get out of sync because, however the process is being forked, thread state isn't preserved for some reason...
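For what it's worth, here's a tiny stdlib-only illustration (my own sketch, POSIX-only) of the underlying problem: a thread started before fork() simply doesn't exist in the child process, so whatever queue it was draining just fills up:

import os
import threading
import time


def drain():
    # Stand-in for seqlog's queue-consumer thread.
    while True:
        time.sleep(0.1)


consumer = threading.Thread(target=drain, daemon=True)
consumer.start()

pid = os.fork()
if pid == 0:
    # Child: the Thread object was copied across the fork, but the OS thread
    # itself was not, so nothing is draining the queue here.
    print("child: consumer alive?", consumer.is_alive())   # False
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print("parent: consumer alive?", consumer.is_alive())  # True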
Ok - have set aside some time today and tomorrow to update and re-publish the docs for this. Will add you as a reviewer on the PR, if you don't mind 😄
Description
I am running a Flask API behind Gunicorn and am trying to get all log messages to write to both Seq and the console.
In the main Gunicorn process the logs are successfully being written to both Seq and the console, however in the Gunicorn worker processes the logs are only being written to the console. Whilst remote debugging in VS Code I can see that the logs generated by the worker process are sitting in the QueueConsumer.queue, however the publish_log_batch callback is not firing.
Logs from the main process continue to be written to Seq even whilst the worker logs are stuck in the QueueConsumer.
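A rough way to see this from inside a worker (illustrative sketch only; the handler.consumer, is_running and queue attribute names are the ones referenced elsewhere in this issue, read defensively via getattr since they are seqlog internals):

# Illustrative sketch: dump the state of the 'seq' handler's background
# consumer from inside a worker, e.g. from the /api/health view.
import logging

for handler in logging.getLogger().handlers:
    if handler.get_name() == 'seq':
        consumer = getattr(handler, 'consumer', None)
        print('consumer is_running:', getattr(consumer, 'is_running', None))
        queue = getattr(consumer, 'queue', None)
        print('queued records:', queue.qsize() if queue is not None else None)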
What I Did
Logging config is defined in the api/log_config.yml file.
Gunicorn config is defined in the api/gunicorn.py file.
Flask app creation and logging setup is defined in the api/app.py file.
I run the web service using the following command: