logdna / python

A python package for sending logs to LogDNA
MIT License
48 stars 35 forks source link

Running IPython shell in Django hangs permanently with recent versions of logdna #105

Closed JackAtOmenApps closed 1 year ago

JackAtOmenApps commented 1 year ago

The Problem

I've spent half the day trying to identify an issue, and it looks like it may be due to logdna. I see that there are similar issues in #103 and #79.

For us, django itself is working, but running ipython via the django shell or shell_plus (with django_extensions) commands hangs forever until killing with CTRL+c.

My setup:


Example Error Traces

Running either of these:

docker compose -f docker-compose.remote.yml exec django python manage.py shell (default is set to ipython) docker compose -f docker-compose.remote.yml exec django python manage.py shell -i ipython

Results in:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 656, in get
    value = obj._trait_values[self.name]
KeyError: 'log'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/manage.py", line 31, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/commands/shell.py", line 136, in handle
    return getattr(self, shell)(options)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/commands/shell.py", line 50, in ipython
    start_ipython(argv=[])
  File "/usr/local/lib/python3.10/site-packages/IPython/__init__.py", line 129, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 1042, in launch_instance
    app.initialize(argv)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/IPython/terminal/ipapp.py", line 270, in initialize
    super(TerminalIPythonApp, self).initialize(argv)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/IPython/core/application.py", line 476, in initialize
    self.parse_command_line(argv)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 878, in parse_command_line
    loader = self._create_loader(argv, aliases, flags, classes=classes)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 789, in _create_loader
    argv, aliases, flags, classes=classes, log=self.log, subcommands=self.subcommands
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 703, in __get__
    return self.get(obj, cls)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 677, in get
    obj._notify_observers(
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 1564, in _notify_observers
    c(event)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 282, in _observe_logging_default
    self._configure_logging()
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 287, in _configure_logging
    dictConfig(config)
  File "/usr/local/lib/python3.10/logging/config.py", line 811, in dictConfig
    dictConfigClass(config).configure()
  File "/usr/local/lib/python3.10/logging/config.py", line 538, in configure
    _clearExistingHandlers()
  File "/usr/local/lib/python3.10/logging/config.py", line 275, in _clearExistingHandlers
    logging.shutdown(logging._handlerList[:])
  File "/usr/local/lib/python3.10/logging/__init__.py", line 2183, in shutdown
    h.close()
  File "/usr/local/lib/python3.10/site-packages/logdna/logdna.py", line 352, in close
    self.request_thread_pool.shutdown(wait=True)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 235, in shutdown
    t.join()
  File "/usr/local/lib/python3.10/threading.py", line 1096, in join
    self._wait_for_tstate_lock()
  File "/usr/local/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt

Running this:

docker compose -f docker-compose.remote.yml exec django python manage.py shell_plus

Results in:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 656, in get
    value = obj._trait_values[self.name]
KeyError: 'log'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/manage.py", line 31, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.10/site-packages/django_extensions/management/commands/shell_plus.py", line 125, in run_from_argv
    return super().run_from_argv(argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python3.10/site-packages/django_extensions/management/utils.py", line 62, in inner
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/django_extensions/management/commands/shell_plus.py", line 582, in handle
    runner()
  File "/usr/local/lib/python3.10/site-packages/django_extensions/management/commands/shell_plus.py", line 404, in run_ipython
    start_ipython(argv=ipython_arguments, user_ns=imported_objects)
  File "/usr/local/lib/python3.10/site-packages/IPython/__init__.py", line 129, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 1042, in launch_instance
    app.initialize(argv)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/IPython/terminal/ipapp.py", line 270, in initialize
    super(TerminalIPythonApp, self).initialize(argv)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/IPython/core/application.py", line 476, in initialize
    self.parse_command_line(argv)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 878, in parse_command_line
    loader = self._create_loader(argv, aliases, flags, classes=classes)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 789, in _create_loader
    argv, aliases, flags, classes=classes, log=self.log, subcommands=self.subcommands
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 703, in __get__
    return self.get(obj, cls)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 677, in get
    obj._notify_observers(
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 1564, in _notify_observers
    c(event)
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 282, in _observe_logging_default
    self._configure_logging()
  File "/usr/local/lib/python3.10/site-packages/traitlets/config/application.py", line 287, in _configure_logging
    dictConfig(config)
  File "/usr/local/lib/python3.10/logging/config.py", line 811, in dictConfig
    dictConfigClass(config).configure()
  File "/usr/local/lib/python3.10/logging/config.py", line 538, in configure
    _clearExistingHandlers()
  File "/usr/local/lib/python3.10/logging/config.py", line 275, in _clearExistingHandlers
    logging.shutdown(logging._handlerList[:])
  File "/usr/local/lib/python3.10/logging/__init__.py", line 2183, in shutdown
    h.close()
  File "/usr/local/lib/python3.10/site-packages/logdna/logdna.py", line 352, in close
    self.request_thread_pool.shutdown(wait=True)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 235, in shutdown
    t.join()
  File "/usr/local/lib/python3.10/threading.py", line 1096, in join
    self._wait_for_tstate_lock()
  File "/usr/local/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt

Both of the traces above are the same with the exception of lines specific to use of the shell vs shell_plus commands.

Running either of these, which use a plain python shell rather than ipython, works fine:

docker compose -f docker-compose.remote.yml exec django python manage.py shell -i python docker compose -f docker-compose.remote.yml exec django python manage.py shell_plus --plain

Also, running ipython on its own works fine:

docker compose -f docker-compose.remote.yml exec django ipython --version outputs 8.14.0 docker compose -f docker-compose.remote.yml exec django ipython --log-level=DEBUG Runs ipython via the django container (but not incorporating the django project itself) just fine


More details

I did a bunch of troubleshooting looking at recent code changes and issues in the traitlets, ipython, django packages and others. Eventually I narrowed it down to a logging issue.

Removed all logging by setting LOGGING_CONFIG = False and the shell works just fine.

Began adding logging configuration to django's LOGGING dict bit by bit until I hit upon the issue, which appears to be the moment I add logdna back to the mix. To be clear, I've been using logdna/mezmo for a long time without issues, so I'm confident the issue isn't in my logging config, but in a recent change to the logdna package.

My logging config in django settings:

from logdna import (
    LogDNAHandler,
)  # required to register `logging.handlers.LogDNAHandler`

# LOGGING
# ------------------------------------------------------------------------------
# https://docs.djangoproject.com/en/dev/ref/settings/#logging

LOGGING = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "verbose": {
            "format": "{asctime}  Logger: {name}, File: {filename}:{lineno}, Message: {message}",
            "style": "{",
        },
        "colored_verbose": {
            "()": "colorlog.ColoredFormatter",
            "format": "{asctime}  Logger:{green}{name}{reset} {red}{filename}:{lineno}{reset} {blue}{message}",
            "style": "{",
        },
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "verbose",
        },
        "logdna": {
            "level": logging.DEBUG,
            "formatter": "colored_verbose",
            "class": "logging.handlers.LogDNAHandler",
            "key": env.str("LOGDNA_KEY"),
            "options": {
                "app": "django",
                "hostname": f"{HOSTNAME}",
                "env": HOSTNAME,
                "index_meta": False,
            },
        },
    },
    "root": {
        "level": "DEBUG",
        "handlers": [
            "logdna",
            "console",
        ],
    },
    "loggers": {
        "": {
            "level": "DEBUG",
            "handlers": [
                "logdna",
                "console",
            ],
        },
    },
}

I'm curious is anyone else has experienced these symptoms. I hadn't noticed there was a problem until I tried to get into the shell this morning. None of my existing tests had highlighted this problem. I guess I should add one that runs the shell command when I make updates to dependencies 🤔

JackAtOmenApps commented 1 year ago

Update: I saw in these comments that suggest that version 1.18.8 probably wouldn't fix my issue and 1.18.7 has its own issues. So I reverted to 1.18.6, which seems to be working alright. The shell and shell_plus commands now work as expected and I'm not noticing any other issues yet. So it seems changes in 1.18.7/1.18.8/1.18.9 are at fault.

JackAtOmenApps commented 1 year ago

Well, I may be just going insane at this point.

Everything was great with 1.18.6 for the past 8 hours. And now I get essentially the same error trace.

Very sad. Going to sleep.

shawkinsmezmo commented 1 year ago

Hi @OmenApps. Thanks for raising this. We are actively working on a fix for this internally. Thank you for your patience. We will update the issue when we are ready for you to test the new package.

auirarrazaval commented 1 year ago

@OmenApps You are not alone. After maybe days of trying to figure out why all our Django deployments died all of a sudden, and isolating multiple libraries until the error was isolated, I realized that it was this library the one responsible. I'll share my configuration as well here.

@shawkinsmezmo Sorry to duplicate the issue and post in this one instead of #79 , but this is more ad-hoc to the problem we are facing.

This is how the logging section of our configuration file looks like:


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "logdna": {
            "level": "DEBUG",
            "class": "logdna.LogDNAHandler",
            "key": os.environ.get("LOGDNA_INGESTION_KEY"),
            "options": {
                "app": "...",
                "hostname": f"...",
                "env": get_secret("ENVIRONMENT"),
                "index_meta": True,
                "tags": ["..."],
            },
        },
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
        },
    },
    "loggers": {
        "logdna": {"handlers": ["logdna", "console"], "propagate": True},
    },
}

I have to say also that two days ago I had the exact same error as @OmenApps apps above (threading/lock). But today, upon trying again I couldn't get that error, but any django command that we run still hangs

shawkinsmezmo commented 1 year ago

@auirarrazaval We've identified that there is a problem with the recent changes that specifically affects Django, and possibly other third party libraries / frameworks that instantiate and close multiple instances of the same log handler class during the course of an application's lifecycle. We have a PR that we are testing / promoting internally that should address this. I apologize for the inconvenience and appreciate your patience while we sort this out. Part of this change will include beefing up our internal testing to make sure that we cover additional usages, such as Django and other common / popular middlewares.

shawkinsmezmo commented 1 year ago

Hi @OmenApps, @auirarrazaval - Can you give the latest package a try and let me know if it resolves your issues? Thanks!

auirarrazaval commented 1 year ago

@shawkinsmezmo I have just tried against v1.18.11 and manage.py runserver is still hanging in Django

shawkinsmezmo commented 1 year ago

@auirarrazaval Interesting. I was able to reproduce this with the help of your configuration. Thanks for sharing that. Currently debugging things and will report back when I have a fix.

auirarrazaval commented 1 year ago

@shawkinsmezmo Thanks! I wish I could give better insights, but right now it just seems to hang with no messages (i even tried setting a high verbosity with Django with no success) or tracebacks, so there's not much info. Maybe telling you my environment will help: OS: Ubuntu 20.04 Python: 3.9.16

And here is my pip freeze:

annotated-types==0.5.0
asgiref==3.7.2
attrs==23.1.0
boto3==1.28.12
botocore==1.31.12
certifi==2023.7.22
cfgv==3.3.1
charset-normalizer==3.2.0
distlib==0.3.7
Django==4.2.3
django-cors-headers==4.2.0
djangorestframework==3.14.0
drf-yasg==1.21.7
filelock==3.12.2
fuzzywuzzy==0.18.0
identify==2.5.26
idna==3.4
inflection==0.5.1
jmespath==1.0.1
jsonschema==4.18.4
jsonschema-specifications==2023.7.1
Levenshtein==0.21.1
logdna==1.18.11
nodeenv==1.8.0
opensearch-py==2.2.0
packaging==23.1
platformdirs==3.9.1
pre-commit==3.3.3
pydantic==2.1.1
pydantic_core==2.4.0
python-dateutil==2.8.2
python-decouple==3.8
python-Levenshtein==0.21.1
pytz==2023.3
PyYAML==6.0.1
rapidfuzz==3.1.2
referencing==0.30.0
requests==2.31.0
rpds-py==0.9.2
s3transfer==0.6.1
sentry-sdk==1.28.1
six==1.16.0
sqlparse==0.4.4
typing_extensions==4.7.1
Unidecode==1.3.6
uritemplate==4.1.1
urllib3==1.26.16
uWSGI==2.0.18
virtualenv==20.24.2

If there's anything that I could do/try on my end I will gladly help

JackAtOmenApps commented 1 year ago

Yep, same here. v1.18.11 still hangs. Rolling back again to v1.18.6 for the time being.

shawkinsmezmo commented 1 year ago

@OmenApps @auirarrazaval OK. I was able to trace the issue to a single line of code, based on the config that you provided. I'm working on shepherding another fix out. I'll update when it's out. Thanks for your patience guys.

shawkinsmezmo commented 1 year ago

@OmenApps @auirarrazaval Can you guys give version 1.18.12 a try and let me know how it goes? I think I found the root cause of all of these issues, and it's that the logdna handler implementation was overwriting a variable of the base logging.Handler class

JackAtOmenApps commented 1 year ago

@shawkinsmezmo So far so good. With v1.18.12 I am able to use the shell and so far everything seems to be working correctly.

auirarrazaval commented 1 year ago

@shawkinsmezmo Same here! I was able to run the server locally without any issues. Tomorrow I hope to be doing a staging deployment and keep you updated

auirarrazaval commented 1 year ago

@shawkinsmezmo seems to be working fine in deployments as well. Thanks for your help!

shawkinsmezmo commented 1 year ago

@auirarrazaval glad to hear it! I'll keep this issue open for another day or so in case folks run into any more issues, but otherwise I'm feeling optimistic that we finally have this licked.

shawkinsmezmo commented 1 year ago

Closing this issue as resolved.