jaegertracing / jaeger-client-python

🛑 This library is DEPRECATED!
https://jaegertracing.io/
Apache License 2.0
408 stars 155 forks source link

Stuck / hangs at `Initializing Jaeger Tracer with UDP reporter` #60

Closed MrSaints closed 2 years ago

MrSaints commented 7 years ago

I'm running the latest version of the client. When initialize_tracer() is called, it gets stuck / hangs at Initializing Jaeger Tracer with UDP reporter.

I'm running the agent in another Docker container that is bound to localhost ports, so it should be able to connect (I even changed the reporting host to my internal IP, rather than localhost).

EDIT: So, this is for my web app, it works when running in gunicorn.

yurishkuro commented 7 years ago

can you get a stack trace of where it's hanging?

MrSaints commented 7 years ago

Threads:

  Id   Target Id         Frame
  2    Thread 0x7ff964431700 (LWP 7) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
* 1    Thread 0x7ff96b922700 (LWP 5) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85

C Stack Trace:

#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x00007f21cfcbb3c8 in PyThread_acquire_lock (lock=0x1004290, waitflag=<optimized out>) at Python/thread_pthread.h:324
#2  0x00007f21cfcc0a24 in lock_PyThread_acquire_lock (self=0x7f21ca6adf70, args=<optimized out>) at ./Modules/threadmodule.c:52
#3  0x00007f21cfc7f2dc in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4352
#4  PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#5  0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21cb5ca130, globals=0x80, locals=0x0, args=0x7f21c8cd5b40, argcount=0, kws=0x6a, kwcount=0, defs=0x7f21cb5d6c68, defcount=1,
    closure=0x0) at Python/ceval.c:3584
#6  0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#7  call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#8  PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#9  0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21cb5cab30, globals=0x80, locals=0x0, args=0x1, argcount=0, kws=0x6a, kwcount=0, defs=0x7f21cb5d6e28, defcount=1, closure=0x0)
    at Python/ceval.c:3584
#10 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#11 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#12 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#13 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d79bb0, globals=0x80, locals=0x0, args=0x7f21c8cdd388, argcount=0, kws=0x6a, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3584
#14 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
---Type <return> to continue, or q <return> to quit---
#15 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#16 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#17 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d79630, globals=0x80, locals=0x0, args=0x7f21c8cced68, argcount=0, kws=0x6a, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3584
#18 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#19 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#20 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#21 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d795b0, globals=0x80, locals=locals@entry=0x0, args=0x4, argcount=0, kws=0x6a, kws@entry=0x7f21c8d398d8, kwcount=4,
    defs=0x7f21c8cfca28, defcount=1, closure=0x0) at Python/ceval.c:3584
#22 0x00007f21cfbf9845 in function_call (func=0x7f21c8d76cf8, arg=0x7f21c9515410, kw=0x7f21c8cd95c8) at Objects/funcobject.c:523
#23 0x00007f21cfbc7863 in PyObject_Call (func=0x7f21c8d76cf8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#24 0x00007f21cfbd646d in instancemethod_call (func=0x1004290, arg=0x7f21d0169050, kw=0x7f21c8cd95c8) at Objects/classobject.c:2602
#25 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21ca12e050, arg=arg@entry=0x7f21d0169050, kw=kw@entry=0x7f21c8cd95c8) at Objects/abstract.c:2547
#26 0x00007f21cfc355ed in slot_tp_init (self=<optimized out>, args=0x7f21d0169050, kwds=0x7f21c8cd95c8) at Objects/typeobject.c:5806
#27 0x00007f21cfc31e1f in type_call (type=<optimized out>, args=0x7f21d0169050, kwds=0x7f21c8cd95c8) at Objects/typeobject.c:765
#28 0x00007f21cfbc7863 in PyObject_Call (func=0xfffcc0, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2547
#29 0x00007f21cfc7ee79 in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4569
#30 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4374
---Type <return> to continue, or q <return> to quit---
#31 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#32 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9514d30, globals=0x80, locals=0x0, args=0x2, argcount=0, kws=0x6a, kwcount=1, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3584
#33 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#34 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#35 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#36 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c8d790b0, globals=0x80, locals=0x0, args=0x1, argcount=0, kws=0x6a, kwcount=0, defs=0x7f21c8d73d68, defcount=1, closure=0x0)
    at Python/ceval.c:3584
#37 0x00007f21cfc7f04a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4447
#38 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4372
#39 PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2989
#40 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9516130, globals=0x80, globals@entry=0x7f21c9504d70, locals=0x0, locals@entry=0x7f21c9504d70, args=0x0,
    argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#41 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c9516130, globals=globals@entry=0x7f21c9504d70, locals=locals@entry=0x7f21c9504d70) at Python/ceval.c:669
#42 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xfb3ff0 "main", co=0x7f21c9516130, pathname=<optimized out>) at Python/import.c:731
#43 0x00007f21cfc9740e in load_source_module (name=0xfb3ff0 "main", pathname=0xfb6010 "/pythia-stateless/main.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>) at Python/import.c:1121
#44 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xfb3ff0 "main", fullname=0xfb3ff0 "main") at Python/import.c:2725
#45 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xfb3ff0 "main", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
---Type <return> to continue, or q <return> to quit---
    mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#46 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
    at Python/import.c:2247
#47 PyImport_ImportModuleLevel (name=0x7f21d014a654 "main", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, level=<optimized out>)
    at Python/import.c:2312
#48 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#49 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21c950f5d0, kw=<optimized out>) at Objects/abstract.c:2547
#50 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21c950f5d0, kw=<optimized out>) at Python/ceval.c:4221
#51 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#52 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9514c30, globals=0x80, globals@entry=0x7f21c9504b40, locals=0x0, locals@entry=0x7f21c9504b40, args=0x0,
    argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#53 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c9514c30, globals=globals@entry=0x7f21c9504b40, locals=locals@entry=0x7f21c9504b40) at Python/ceval.c:669
#54 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xc9dd80 "utils", co=0x7f21c9514c30, pathname=<optimized out>) at Python/import.c:731
#55 0x00007f21cfc9740e in load_source_module (name=0xc9dd80 "utils", pathname=0xcb2e40 "/pythia-stateless/utils.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>) at Python/import.c:1121
#56 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xc9dd80 "utils", fullname=0xc9dd80 "utils") at Python/import.c:2725
#57 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xc9dd80 "utils", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
    mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#58 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
---Type <return> to continue, or q <return> to quit---
    at Python/import.c:2247
#59 PyImport_ImportModuleLevel (name=0x7f21cdbffa44 "utils", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, level=<optimized out>)
    at Python/import.c:2312
#60 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#61 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21c950f470, kw=<optimized out>) at Objects/abstract.c:2547
#62 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21c950f470, kw=<optimized out>) at Python/ceval.c:4221
#63 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#64 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c9514a30, globals=0x80, globals@entry=0x7f21c9504a28, locals=0x0, locals@entry=0x7f21c9504a28, args=0x0,
    argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#65 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c9514a30, globals=globals@entry=0x7f21c9504a28, locals=locals@entry=0x7f21c9504a28) at Python/ceval.c:669
#66 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xb19630 "models", co=0x7f21c9514a30, pathname=<optimized out>) at Python/import.c:731
#67 0x00007f21cfc9740e in load_source_module (name=0xb19630 "models", pathname=0xbadb70 "/pythia-stateless/models.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>) at Python/import.c:1121
#68 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xb19630 "models", fullname=0xb19630 "models") at Python/import.c:2725
#69 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xb19630 "models", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
    mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#70 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
    at Python/import.c:2247
#71 PyImport_ImportModuleLevel (name=0x7f21c94febf4 "models", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, level=<optimized out>)
---Type <return> to continue, or q <return> to quit---
    at Python/import.c:2312
#72 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#73 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21c950f050, kw=<optimized out>) at Objects/abstract.c:2547
#74 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21c950f050, kw=<optimized out>) at Python/ceval.c:4221
#75 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#76 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21c950c930, globals=0x80, globals@entry=0x7f21c94fdd70, locals=0x0, locals@entry=0x7f21c94fdd70, args=0x0,
    argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3584
#77 0x00007f21cfc83329 in PyEval_EvalCode (co=co@entry=0x7f21c950c930, globals=globals@entry=0x7f21c94fdd70, locals=locals@entry=0x7f21c94fdd70) at Python/ceval.c:669
#78 0x00007f21cfc970b9 in PyImport_ExecCodeModuleEx (name=0xc95430 "document_search", co=0x7f21c950c930, pathname=<optimized out>) at Python/import.c:731
#79 0x00007f21cfc9740e in load_source_module (name=0xc95430 "document_search", pathname=0x8de600 "/pythia-stateless/document_search.pyc", fp=0x7f21cff93aac <Py_VerboseFlag>)
    at Python/import.c:1121
#80 0x00007f21cfc981f9 in import_submodule (mod=0x7f21cff48b60 <_Py_NoneStruct>, subname=0xc95430 "document_search", fullname=0xc95430 "document_search") at Python/import.c:2725
#81 0x00007f21cfc98ea2 in load_next (p_buflen=<synthetic pointer>, buf=0xc95430 "document_search", p_name=<synthetic pointer>, altmod=0x7f21cff48b60 <_Py_NoneStruct>,
    mod=0x7f21cff48b60 <_Py_NoneStruct>) at Python/import.c:2539
#82 import_module_level (locals=<optimized out>, level=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>, globals=<optimized out>, name=<optimized out>)
    at Python/import.c:2247
#83 PyImport_ImportModuleLevel (name=0x7f21d00644b4 "document_search", globals=<optimized out>, locals=<optimized out>, fromlist=0x7f21cff48b60 <_Py_NoneStruct>,
    level=<optimized out>) at Python/import.c:2312
---Type <return> to continue, or q <return> to quit---
#84 0x00007f21cfc77488 in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#85 0x00007f21cfbc7863 in PyObject_Call (func=func@entry=0x7f21d019cfc8, arg=arg@entry=0x7f21d0052998, kw=<optimized out>) at Objects/abstract.c:2547
#86 0x00007f21cfc78f87 in PyEval_CallObjectWithKeywords (func=0x7f21d019cfc8, arg=0x7f21d0052998, kw=<optimized out>) at Python/ceval.c:4221
#87 0x00007f21cfc7dc2c in PyEval_EvalFrameEx (f=0x1004290, throwflag=128) at Python/ceval.c:2624
#88 0x00007f21cfc83200 in PyEval_EvalCodeEx (co=0x7f21d004c830, globals=0x80, locals=0x0, args=0x0, argcount=argcount@entry=0, kws=0x6a, kws@entry=0x0, kwcount=0, defs=0x0,
    defcount=0, closure=0x0) at Python/ceval.c:3584
#89 0x00007f21cfc83329 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#90 0x00007f21cfca7a4a in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>, filename=<optimized out>, mod=<optimized out>)
    at Python/pythonrun.c:1376
#91 PyRun_FileExFlags (fp=0x83bbf0, filename=0x7f21d004c830 "\002", start=0, globals=0x7f21d013e168, locals=0x7f21d013e168, closeit=1, flags=0x7fff02fc6530)
    at Python/pythonrun.c:1362
#92 0x00007f21cfca8fd7 in PyRun_SimpleFileExFlags (fp=0x1004290, fp@entry=0x83bbf0, filename=0x7fff02fc7ddc "main.py", closeit=1, flags=0x7f21cf964010 <sem_wait+48>,
    flags@entry=0x7fff02fc6530) at Python/pythonrun.c:948
#93 0x00007f21cfca94f3 in PyRun_AnyFileExFlags (fp=fp@entry=0x83bbf0, filename=<optimized out>, closeit=<optimized out>, flags=flags@entry=0x7fff02fc6530) at Python/pythonrun.c:752
#94 0x00007f21cfcbef95 in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:640
#95 0x00007f21ceec4b45 in __libc_start_main (main=0x4006c0 <main>, argc=2, argv=0x7fff02fc66e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fff02fc66d8) at libc-start.c:287
#96 0x00000000004006ee in _start ()

I wasn't able to get the Python stack trace unfortunately.

yurishkuro commented 7 years ago

can you share a minimum example to reproduce?

trtg commented 7 years ago

I'm encountering this same issue when trying to integrate an application running under tornado in a docker container but as a WSGIApplication (so no async allowed) . I also wasn't able to get a stack trace, but the issue appears to be in local_agent_net.py::_create_new_thread_loop() Seems to be that things lock up as soon as self._thread_loop.is_ready() is called

yurishkuro commented 7 years ago

tornado is used by jaeger_client internally. We have it working with WSGI services just fine, but you need to make sure the tracer is initialized post-fork.

I have seen the locking issue before due to the way our of our internal framework is structured, namely that it performs initialization as part of an import. Import take an exclusive lock on the interpreter and if during the execution another thread is created that does more imports, you can get into a deadlock.

Do you have a reproducable example?

trtg commented 7 years ago

It's completely reproducible in the context of our full production system. I tried to distill it down to a trivial example that is ordered in the same way using a tornado WSGIApplication, but in that example the deadlock doesn't happen. As far as in can tell, the structure of my toy example is essentially the same as our overall production code, so I'm not clear why it doesn't have the same issue. How would I ensure that I'm initializing the tracer post-fork? I'm going to try messing around with uwsgi's @postfork decorator.

yurishkuro commented 7 years ago

Yes, in our internal lib that wraps WSGI we're using @postfork decorator to init the tracer and middleware.

bitglue commented 7 years ago

https://docs.python.org/2/library/threading.html#importing-in-threaded-code

Firstly, other than in the main module, an import should not have the side effect of spawning a new thread and then waiting for that thread in any way. Failing to abide by this restriction can lead to a deadlock if the spawned thread directly or indirectly attempts to import a module.

The trouble is in a lot of cases, jaeger_client will be initialized as part of importing a module. And then jaeger_client starts a thread and waits for it (indirectly: threadloop is directly responsible), deadlocking as described above. For example django-opentracing suggests configuring through Django's settings.py, which is imported by whatever WSGI server is being used.

Configuring jaeger from __main__ works fine.

$ cat >foo.py
from jaeger_client import Config

config = Config(config={'logging': True}, service_name='postal-main')
tracer = config.initialize_tracer()
$ python foo.py
<jaeger_client.tracer.Tracer object at 0x7f0996b1e490>

But try to import that module, and you get a deadlock:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
[deadlock]
yurishkuro commented 7 years ago

@bitglue I added a section to the README, does it help?

bitglue commented 7 years ago

I think there may be two problems here.

One, starting jaeger and then forking leads to problems. I'd guess it's because the children will be running the same event loop, on the same file descriptors.

The other problem is this deadlock. As my case above illustrates, it's not necessary to fork anything to get this deadlock: all that's required is to import something which tries to initialize jaeger.

With django in particular this is problematic: opentracing-contrib/python-django configures middleware (distinct from WSGI middleware) in a settings file: I don't believe there's a way to add middleware post-fork. The settings file is a module that gets imported. The middleware takes a tracer as a parameter. Obtaining a tracer implicitly starts jager_client's event loop. Thus it's not really possible to configure the django middleware to use jager without deadlocking.

lapointexavier commented 7 years ago

I'm in the same situation as @bitglue describes with Django / gunicorn. I guess because of django works, using gunicorn's post_fork might not be an option. Is there a workaround for django + gunicorn, or it's just not compatible?

bitglue commented 7 years ago

I was able to work around the issue by extending the opentracing-contrib/python-django to handle the case of not yet having a tracer:

class LazyOpenTracingMiddleware(OpenTracingMiddleware):
    '''Opentracing middleware which evaluates the tracer lazily.

    In part due to
    https://github.com/uber/jaeger-client-python/issues/60, and further because
    it's just a good idea to avoid issues like unintentionally shared file
    descriptors between processes, we need to initialize the jaeger tracer
    after the gunicorn fork. Since OpenTracingMiddleware is configured in
    MIDDLEWARE_CLASSES in sites/postmates/settings/base.py, which is imported
    pre-fork, this presents a problem.

    This class modifies OpenTracingMiddleware such that it will do nothing
    until a global tracer exists at `opentracing.tracer`.
    '''

    def __init__(self):
        '''Do nothing.

        OpenTracingMiddleware.__init__ only sets self._tracer, which we don't
        want to do.
        '''

    @property
    def _tracer(self):
        '''Get a DjangoTracer, or None if opentracing not initialized.

        Once opentracing is initialized (indicated by the presence of
        `opentracing.tracer`), will return the same instance of DjangoTracer in
        perpetuity.
        '''
        try:
            return self._django_tracer
        except AttributeError:
            pass

        try:
            tracer = opentracing.tracer
        except AttributeError:
            return None

        self._django_tracer = DjangoTracer(tracer)
        return self._tracer

    def process_view(self, *a, **kw):
        if self._tracer is None:
            return
        OpenTracingMiddleware.process_view(self, *a, **kw)

    def process_response(self, request, response):
        if self._tracer is None:
            return response
        return OpenTracingMiddleware.process_response(self, request, response)

And then I deferred the initialization of the tracer to gunicorn's post_fork hook:

def post_fork(server, worker):
    config = jaeger_client.Config(
        config={
            'sampler': {'type': 'const', 'param': 1},
            'logging': True,
        },
        service_name='postal-main')
    config.initialize_tracer() # sets opentracing.tracer global

Pretty hairy, but gets the job done.

lapointexavier commented 7 years ago

@bitglue Awesome, thanks for that!

yurishkuro commented 7 years ago

@bitglue what is your suggestion now to make it easier for the users? Should this lazy class be made a part of opentracing-contrib/python-django? If so, do you want to add a PR?

bitglue commented 7 years ago

If I could get a tracer object but not start the event loop, that would solve it elegantly. Then the event loop can be controlled with tracer.start_event_loop() and tracer.stop_event_loop() or such.

For backwards compatibility perhaps initialize_tracer(start_event_loop=False) would suffice to request the new behavior?

I'm not a fan of my LazyOpenTracingMiddleware solution since it relies on communicating through the opentracing.tracer global.

yurishkuro commented 7 years ago

I'm not a fan of my LazyOpenTracingMiddleware solution since it relies on communicating through the opentracing.tracer global.

that's what we ended up doing internally as well. Has anyone taken a look at lightstep python client? why didn't it have a problem with django? I assume it also uses a bg thread for reporting.

bitglue commented 7 years ago

On a very cursory read, it doesn't appear to wait for the thread to become ready after starting it. That would explain why it does not deadlock.

Not sure about the consequences of forking after starting the thread. It's possible if it's using blocking IO instead of epoll(), and some care is taken with the file descriptors to not have two processes on the same TCP connection (or there are no TCP connections?), it works out fine.

zachgoldstein commented 7 years ago

👍 I'm bumping into a similar deadlock working with flask, and looking into post-forking options. Thanks for writing this issue, it's been a big help.

yurishkuro commented 7 years ago

31 should address the deadlocking issue. Meanwhile I'll update the README