Closed MrSaints closed 2 years ago
can you get a stack trace of where it's hanging?
Threads:
Id Target Id Frame
2 Thread 0x7ff964431700 (LWP 7) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
* 1 Thread 0x7ff96b922700 (LWP 5) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
C Stack Trace:
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007f21cfcbb3c8 in PyThread_acquire_lock (lock=0x1004290, waitflag=<optimized out>) at Python/thread_pthread.h:324
#2 0x00007f21cfcc0a24 in lock_PyThread_acquire_lock (self=0x7f21ca6adf70, args=<optimized out>) at ./Modules/threadmodule.c:52
#3 0x00007f21cfc7f2dc in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4352
#4 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#5 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21cb5ca130, globals=0x80, locals=0x0, args=0x7f21c8cd5b40, argcount=0, kws=0x6a, kwcount=0, defs=0x7f21cb5d6c68, defcount=1,
closure=0x0) at Python/ceval.c:3584
#6 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#7 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#8 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#9 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21cb5cab30, globals=0x80, locals=0x0, args=0x1, argcount=0, kws=0x6a, kwcount=0, defs=0x7f21cb5d6e28, defcount=1, closure=0x0)
at Python/ceval.c:3584
#10 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#11 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#12 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#13 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d79bb0, globals=0x80, locals=0x0, args=0x7f21c8cdd388, argcount=0, kws=0x6a, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:3584
#14 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
---Type <return> to continue, or q <return> to quit---
#15 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#16 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#17 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d79630, globals=0x80, locals=0x0, args=0x7f21c8cced68, argcount=0, kws=0x6a, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:3584
#18 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#19 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#20 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#21 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d795b0, globals=0x80, locals=locals@entry=0x0, args=0x4, argcount=0, kws=0x6a, kws@entry=0x7f21c8d398d8, kwcount=4,
defs=0x7f21c8cfca28, defcount=1, closure=0x0) at Python/ceval.c:3584
#22 0x00007f21cfbf9845 in function_call (func=0x7f21c8d76cf8, arg=0x7f21c9515410, kw=0x7f21c8cd95c8) at Objects/funcobject.c:523
#23 0x00007f21cfbc7863 in PyObject_Call (func=0x7f21c8d76cf8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#24 0x00007f21cfbd646d in instancemethod_call (func=0x1004290, arg=0x7f21d0169050, kw=0x7f21c8cd95c8) at Objects/classobject.c:2602
#25 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21ca12e050, arg=arg@entry=0x7f21d0169050, kw=kw@entry=0x7f21c8cd95c8) at Objects/abstract.c:2547
#26 0x00007f21cfc355ed in slot_tp_init (self=<optimized out>, args=0x7f21d0169050, kwds=0x7f21c8cd95c8) at Objects/typeobject.c:5806
#27 0x00007f21cfc31e1f in type_call (type=<optimized out>, args=0x7f21d0169050, kwds=0x7f21c8cd95c8) at Objects/typeobject.c:765
#28 0x00007f21cfbc7863 in PyObject_Call (func=0xfffcc0, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#29 0x00007f21cfc7ee79 in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4569
#30 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4374
---Type <return> to continue, or q <return> to quit---
#31 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#32 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9514d30, globals=0x80, locals=0x0, args=0x2, argcount=0, kws=0x6a, kwcount=1, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:3584
#33 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#34 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#35 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#36 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d790b0, globals=0x80, locals=0x0, args=0x1, argcount=0, kws=0x6a, kwcount=0, defs=0x7f21c8d73d68, defcount=1, closure=0x0)
at Python/ceval.c:3584
#37 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#38 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#39 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#40 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9516130, globals=0x80, globals@entry=0x7f21c9504d70, locals=0x0, locals@entry=0x7f21c9504d70, args=0x0,
argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#41 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c9516130, globals=globals@entry=0x7f21c9504d70, locals=locals@entry=0x7f21c9504d70) at Python/ceval.c:669
#42 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xfb3ff0 "main", co=0x7f21c9516130, pathname=<optimized out>) at Python/import.c:731
#43 0x00007f21cfc9740e in load_source_module (name=0xfb3ff0 "main", pathname=0xfb6010 "/pythia-stateless/main.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>) at Python/import.c:1121
#44 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xfb3ff0 "main", fullname=0xfb3ff0 "main") at Python/import.c:2725
#45 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xfb3ff0 "main", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
---Type <return> to continue, or q <return> to quit---
mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#46 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
at Python/import.c:2247
#47 PyImport_ImportModuleLevel (name=0x7f21d014a654 "main", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, level=<optimized out>)
at Python/import.c:2312
#48 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#49 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21c950f5d0, kw=<optimized out>) at Objects/abstract.c:2547
#50 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21c950f5d0, kw=<optimized out>) at Python/ceval.c:4221
#51 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#52 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9514c30, globals=0x80, globals@entry=0x7f21c9504b40, locals=0x0, locals@entry=0x7f21c9504b40, args=0x0,
argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#53 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c9514c30, globals=globals@entry=0x7f21c9504b40, locals=locals@entry=0x7f21c9504b40) at Python/ceval.c:669
#54 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xc9dd80 "utils", co=0x7f21c9514c30, pathname=<optimized out>) at Python/import.c:731
#55 0x00007f21cfc9740e in load_source_module (name=0xc9dd80 "utils", pathname=0xcb2e40 "/pythia-stateless/utils.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>) at Python/import.c:1121
#56 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xc9dd80 "utils", fullname=0xc9dd80 "utils") at Python/import.c:2725
#57 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xc9dd80 "utils", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#58 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
---Type <return> to continue, or q <return> to quit---
at Python/import.c:2247
#59 PyImport_ImportModuleLevel (name=0x7f21cdbffa44 "utils", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, level=<optimized out>)
at Python/import.c:2312
#60 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#61 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21c950f470, kw=<optimized out>) at Objects/abstract.c:2547
#62 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21c950f470, kw=<optimized out>) at Python/ceval.c:4221
#63 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#64 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9514a30, globals=0x80, globals@entry=0x7f21c9504a28, locals=0x0, locals@entry=0x7f21c9504a28, args=0x0,
argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#65 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c9514a30, globals=globals@entry=0x7f21c9504a28, locals=locals@entry=0x7f21c9504a28) at Python/ceval.c:669
#66 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xb19630 "models", co=0x7f21c9514a30, pathname=<optimized out>) at Python/import.c:731
#67 0x00007f21cfc9740e in load_source_module (name=0xb19630 "models", pathname=0xbadb70 "/pythia-stateless/models.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>) at Python/import.c:1121
#68 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xb19630 "models", fullname=0xb19630 "models") at Python/import.c:2725
#69 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xb19630 "models", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#70 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
at Python/import.c:2247
#71 PyImport_ImportModuleLevel (name=0x7f21c94febf4 "models", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, level=<optimized out>)
---Type <return> to continue, or q <return> to quit---
at Python/import.c:2312
#72 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#73 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21c950f050, kw=<optimized out>) at Objects/abstract.c:2547
#74 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21c950f050, kw=<optimized out>) at Python/ceval.c:4221
#75 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#76 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c950c930, globals=0x80, globals@entry=0x7f21c94fdd70, locals=0x0, locals@entry=0x7f21c94fdd70, args=0x0,
argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#77 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c950c930, globals=globals@entry=0x7f21c94fdd70, locals=locals@entry=0x7f21c94fdd70) at Python/ceval.c:669
#78 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xc95430 "document_search", co=0x7f21c950c930, pathname=<optimized out>) at Python/import.c:731
#79 0x00007f21cfc9740e in load_source_module (name=0xc95430 "document_search", pathname=0x8de600 "/pythia-stateless/document_search.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>)
at Python/import.c:1121
#80 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xc95430 "document_search", fullname=0xc95430 "document_search") at Python/import.c:2725
#81 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xc95430 "document_search", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#82 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
at Python/import.c:2247
#83 PyImport_ImportModuleLevel (name=0x7f21d00644b4 "document_search", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>,
level=<optimized out>) at Python/import.c:2312
---Type <return> to continue, or q <return> to quit---
#84 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#85 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21d0052998, kw=<optimized out>) at Objects/abstract.c:2547
#86 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21d0052998, kw=<optimized out>) at Python/ceval.c:4221
#87 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#88 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21d004c830, globals=0x80, locals=0x0, args=0x0, argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0,
defcount=0, closure=0x0) at Python/ceval.c:3584
#89 0x00007f21cfc83329 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#90 0x00007f21cfca7a4a in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>, filename=<optimized out>, mod=<optimized out>)
at Python/pythonrun.c:1376
#91 PyRun_FileExFlags (fp=0x83bbf0, filename=0x7f21d004c830 "\002", start=0, globals=0x7f21d013e168, locals=0x7f21d013e168, closeit=1, flags=0x7fff02fc6530)
at Python/pythonrun.c:1362
#92 0x00007f21cfca8fd7 in PyRun_SimpleFileExFlags (fp=0x1004290, fp@entry=0x83bbf0, filename=0x7fff02fc7ddc "main.py", closeit=1, flags=0x7f21cf964010 <sem_wait+48>,
flags@entry=0x7fff02fc6530) at Python/pythonrun.c:948
#93 0x00007f21cfca94f3 in PyRun_AnyFileExFlags (fp=fp@entry=0x83bbf0, filename=<optimized out>, closeit=<optimized out>, flags=flags@entry=0x7fff02fc6530) at Python/pythonrun.c:752
#94 0x00007f21cfcbef95 in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:640
#95 0x00007f21ceec4b45 in __libc_start_main (main=0x4006c0 <main>, argc=2, argv=0x7fff02fc66e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fff02fc66d8) at libc-start.c:287
#96 0x00000000004006ee in _start ()
I wasn't able to get the Python stack trace unfortunately.
can you share a minimum example to reproduce?
I'm encountering this same issue when trying to integrate an application running under tornado in a docker container but as a WSGIApplication (so no async allowed) . I also wasn't able to get a stack trace, but the issue appears to be in local_agent_net.py::_create_new_thread_loop() Seems to be that things lock up as soon as self._thread_loop.is_ready() is called
tornado is used by jaeger_client internally. We have it working with WSGI services just fine, but you need to make sure the tracer is initialized post-fork.
I have seen the locking issue before due to the way our of our internal framework is structured, namely that it performs initialization as part of an import
. Import take an exclusive lock on the interpreter and if during the execution another thread is created that does more imports, you can get into a deadlock.
Do you have a reproducable example?
It's completely reproducible in the context of our full production system. I tried to distill it down to a trivial example that is ordered in the same way using a tornado WSGIApplication, but in that example the deadlock doesn't happen. As far as in can tell, the structure of my toy example is essentially the same as our overall production code, so I'm not clear why it doesn't have the same issue. How would I ensure that I'm initializing the tracer post-fork? I'm going to try messing around with uwsgi's @postfork decorator.
Yes, in our internal lib that wraps WSGI we're using @postfork
decorator to init the tracer and middleware.
https://docs.python.org/2/library/threading.html#importing-in-threaded-code
Firstly, other than in the main module, an import should not have the side effect of spawning a new thread and then waiting for that thread in any way. Failing to abide by this restriction can lead to a deadlock if the spawned thread directly or indirectly attempts to import a module.
The trouble is in a lot of cases, jaeger_client
will be initialized as part of importing a module. And then jaeger_client
starts a thread and waits for it (indirectly: threadloop
is directly responsible), deadlocking as described above. For example django-opentracing suggests configuring through Django's settings.py, which is imported by whatever WSGI server is being used.
Configuring jaeger from __main__
works fine.
$ cat >foo.py
from jaeger_client import Config
config = Config(config={'logging': True}, service_name='postal-main')
tracer = config.initialize_tracer()
$ python foo.py
<jaeger_client.tracer.Tracer object at 0x7f0996b1e490>
But try to import that module, and you get a deadlock:
$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
[deadlock]
@bitglue I added a section to the README, does it help?
I think there may be two problems here.
One, starting jaeger and then forking leads to problems. I'd guess it's because the children will be running the same event loop, on the same file descriptors.
The other problem is this deadlock. As my case above illustrates, it's not necessary to fork anything to get this deadlock: all that's required is to import something which tries to initialize jaeger.
With django in particular this is problematic: opentracing-contrib/python-django configures middleware (distinct from WSGI middleware) in a settings file: I don't believe there's a way to add middleware post-fork. The settings file is a module that gets imported. The middleware takes a tracer as a parameter. Obtaining a tracer implicitly starts jager_client's event loop. Thus it's not really possible to configure the django middleware to use jager without deadlocking.
I'm in the same situation as @bitglue describes with Django / gunicorn. I guess because of django works, using gunicorn's post_fork
might not be an option. Is there a workaround for django + gunicorn, or it's just not compatible?
I was able to work around the issue by extending the opentracing-contrib/python-django to handle the case of not yet having a tracer:
class LazyOpenTracingMiddleware(OpenTracingMiddleware):
'''Opentracing middleware which evaluates the tracer lazily.
In part due to
https://github.com/uber/jaeger-client-python/issues/60, and further because
it's just a good idea to avoid issues like unintentionally shared file
descriptors between processes, we need to initialize the jaeger tracer
after the gunicorn fork. Since OpenTracingMiddleware is configured in
MIDDLEWARE_CLASSES in sites/postmates/settings/base.py, which is imported
pre-fork, this presents a problem.
This class modifies OpenTracingMiddleware such that it will do nothing
until a global tracer exists at `opentracing.tracer`.
'''
def __init__(self):
'''Do nothing.
OpenTracingMiddleware.__init__ only sets self._tracer, which we don't
want to do.
'''
@property
def _tracer(self):
'''Get a DjangoTracer, or None if opentracing not initialized.
Once opentracing is initialized (indicated by the presence of
`opentracing.tracer`), will return the same instance of DjangoTracer in
perpetuity.
'''
try:
return self._django_tracer
except AttributeError:
pass
try:
tracer = opentracing.tracer
except AttributeError:
return None
self._django_tracer = DjangoTracer(tracer)
return self._tracer
def process_view(self, *a, **kw):
if self._tracer is None:
return
OpenTracingMiddleware.process_view(self, *a, **kw)
def process_response(self, request, response):
if self._tracer is None:
return response
return OpenTracingMiddleware.process_response(self, request, response)
And then I deferred the initialization of the tracer to gunicorn's post_fork hook:
def post_fork(server, worker):
config = jaeger_client.Config(
config={
'sampler': {'type': 'const', 'param': 1},
'logging': True,
},
service_name='postal-main')
config.initialize_tracer() # sets opentracing.tracer global
Pretty hairy, but gets the job done.
@bitglue Awesome, thanks for that!
@bitglue what is your suggestion now to make it easier for the users? Should this lazy class be made a part of opentracing-contrib/python-django
? If so, do you want to add a PR?
If I could get a tracer object but not start the event loop, that would solve it elegantly. Then the event loop can be controlled with tracer.start_event_loop()
and tracer.stop_event_loop()
or such.
For backwards compatibility perhaps initialize_tracer(start_event_loop=False)
would suffice to request the new behavior?
I'm not a fan of my LazyOpenTracingMiddleware
solution since it relies on communicating through the opentracing.tracer
global.
I'm not a fan of my LazyOpenTracingMiddleware solution since it relies on communicating through the opentracing.tracer global.
that's what we ended up doing internally as well. Has anyone taken a look at lightstep python client? why didn't it have a problem with django? I assume it also uses a bg thread for reporting.
On a very cursory read, it doesn't appear to wait for the thread to become ready after starting it. That would explain why it does not deadlock.
Not sure about the consequences of forking after starting the thread. It's possible if it's using blocking IO instead of epoll(), and some care is taken with the file descriptors to not have two processes on the same TCP connection (or there are no TCP connections?), it works out fine.
👍 I'm bumping into a similar deadlock working with flask, and looking into post-forking options. Thanks for writing this issue, it's been a big help.
I'm running the latest version of the client. When
initialize_tracer()
is called, it gets stuck / hangs atInitializing Jaeger Tracer with UDP reporter
.I'm running the agent in another Docker container that is bound to
localhost
ports, so it should be able to connect (I even changed the reporting host to my internal IP, rather thanlocalhost
).EDIT: So, this is for my web app, it works when running in
gunicorn
.