mixpanel / mixpanel-utils

Other
87 stars 63 forks source link

Request retries don't properly return response which causes api call to hang #17

Closed rjobanp closed 3 years ago

rjobanp commented 5 years ago

Hi there,

We use this package to import events into mixpanel and noticed one of our processes was hung for several days on a call into this library. Upon further inspection it looks like a bug with the handling of retries.

We were calling the import_events api which uses a ThreadPool to batch requests to mixpanel https://github.com/mixpanel/mixpanel_api/blob/e6ff0b21229518b820f2ef11fc18f3b9d0dd7fb2/__init__.py#L1652 One of those requests hit a 503 Service Unavailable response which seemed to trigger a couple bugs that then caused our process to hang:

The retry calls in request() do not return the response from the retried self.request() call so the original request() call just returns a None up the callstack: e.g. https://github.com/mixpanel/mixpanel_api/blob/e6ff0b21229518b820f2ef11fc18f3b9d0dd7fb2/__init__.py#L224 & lines 232, 240, 257 They should probably be return self.request( ...

The calls to add requests to the threadpool use a callback given to pool.apply_async https://github.com/mixpanel/mixpanel_api/blob/e6ff0b21229518b820f2ef11fc18f3b9d0dd7fb2/__init__.py#L1683

which expects the main argument to be a json string https://github.com/mixpanel/mixpanel_api/blob/e6ff0b21229518b820f2ef11fc18f3b9d0dd7fb2/__init__.py#L1262

However since it's a None we get a TypeError exception in that thread.

Now what makes this all worse is an apparent bug in ThreadPool where any error in a callback function will cause the whole process to hang on join. I tested this on python 2.7.12

In [12]: from multiprocessing.pool import ThreadPool

In [13]: p = ThreadPool(5)

In [14]: p
Out[14]: <multiprocessing.pool.ThreadPool at 0x7f9d874719d0>

In [15]: def callback(response):
    ...:     raise TypeError()
    ...: 

In [16]: import random

In [17]: def runner(val):
    ...:     return random.randint(1, val)
    ...: 

In [18]: [p.apply_async(runner, [i], callback=callback) for i in range(3, 12)]
Out[18]: 
[<multiprocessing.pool.ApplyResult at 0x7f9d87418850>,
 <multiprocessing.pool.ApplyResult at 0x7f9d873ba510>,
 <multiprocessing.pool.ApplyResult at 0x7f9d87418990>,
 <multiprocessing.pool.ApplyResult at 0x7f9d87418a90>,
 <multiprocessing.pool.ApplyResult at 0x7f9d87418b10>,
 <multiprocessing.pool.ApplyResult at 0x7f9d874224d0>,
 <multiprocessing.pool.ApplyResult at 0x7f9d87418a10>,
 <multiprocessing.pool.ApplyResult at 0x7f9d87429350>,
 <multiprocessing.pool.ApplyResult at 0x7f9d874293d0>]

In [19]: Exception in thread Thread-807:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 405, in _handle_results
    cache[job]._set(i, obj)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 572, in _set
    self._callback(self._value)
  File "<ipython-input-15-6f8b63075492>", line 2, in callback
    raise TypeError()
TypeError

In [19]: p.close()

In [20]: p.join()

^C
^C

^C

# refuses to die
# killed here from outside

Making sure the _response_handler_callback function catches all exceptions and adding returns to all the self.request calls in request should fix the issue. Happy to make a PR myself if that helps.

Thanks

jaredmixpanel commented 3 years ago

Fixed: https://github.com/mixpanel/mixpanel_api/pull/18