ratt-ru / tricolour

Holds an offline, MS direct version of the SDP online flagger.

runs out of threads after a long run #16

Closed bennahugo closed 5 years ago

bennahugo commented 5 years ago

@sjperkins looks like resources aren't being released. Managed to crash all jobs on the machine by running out of pthreads after 31 hours:

[########################################] | 100% Completed | 41min 36.1s
tricolour - 2019-04-20 03:23:59,182 INFO - Data flagged successfully in 31:10:31 hours
tricolour - 2019-04-20 03:23:59,204 INFO - Adding field 'G331.59-0.36' to compute graph for processing
Unexpected error. Dropping you into pdb for a post-mortem.
Traceback (most recent call last):
  File "/home/hugo/venv/bin/tricolour", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/hugo/tricolour/tricolour/scripts/tricolour", line 5, in <module>
    tricolour.main()
  File "/home/hugo/tricolour/tricolour/__init__.py", line 434, in main
    pool = ThreadPool(args.nworkers)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 727, in __init__
    Pool.__init__(self, processes, initializer, initargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 223, in _repopulate_pool
    w.start()
  File "/usr/lib/python2.7/multiprocessing/dummy/__init__.py", line 75, in start
    threading.Thread.start(self)
  File "/usr/lib/python2.7/threading.py", line 736, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread

> /usr/lib/python2.7/threading.py(736)start()
-> _start_new_thread(self.__bootstrap, ())
bennahugo commented 5 years ago

Do you know how to properly release resources after each compute graph has completed?
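
For reference, a minimal sketch of what releasing looks like for the `ThreadPool` in the traceback, assuming the standard `multiprocessing.pool.ThreadPool` API (`close()` followed by `join()`). The helper `thread_growth` and its arguments are hypothetical and only exist to measure the effect; this is not tricolour's code. It counts how many extra threads are alive after creating several pools, with and without releasing them.

```python
import threading
from multiprocessing.pool import ThreadPool


def thread_growth(nworkers, iterations, release):
    """Return how many extra threads are alive after creating `iterations` pools.

    Hypothetical helper for illustration only.
    """
    baseline = threading.active_count()
    pools = []  # hold references so no pool is garbage collected while measuring
    for _ in range(iterations):
        pool = ThreadPool(nworkers)
        pool.map(abs, [-1, -2, -3])  # stand-in for one compute graph
        if release:
            pool.close()  # no further tasks will be submitted
            pool.join()   # wait for worker and handler threads to exit
        pools.append(pool)
    growth = threading.active_count() - baseline

    # Tidy up the pools that were deliberately left running.
    if not release:
        for pool in pools:
            pool.close()
            pool.join()
    return growth


if __name__ == "__main__":
    print("released each time:", thread_growth(4, 5, release=True))
    print("never released:   ", thread_growth(4, 5, release=False))
```

If the second number keeps growing with the iteration count, the pools are the leak; a long enough run then ends in `can't start new thread`.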

bennahugo commented 5 years ago

I've moved

pool = ThreadPool(args.nworkers)

out of the loop. I'm presuming this is where things go horribly wrong...
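
Roughly, the change amounts to the pattern below. This is a sketch under assumptions, not the actual tricolour code: `flag_field` and `flag_all_fields` are hypothetical stand-ins for the per-field graph construction and the loop in `main`, and the second field name is made up. The pool is built once before the loop, shared by every field, and shut down in a `finally` block.

```python
import threading
from multiprocessing.pool import ThreadPool


def flag_field(pool, field_name):
    # Hypothetical stand-in for building and computing one field's graph.
    lengths = pool.map(len, [field_name, field_name.upper()])
    return field_name, sum(lengths)


def flag_all_fields(field_names, nworkers):
    pool = ThreadPool(nworkers)  # created once, before the loop
    try:
        return [flag_field(pool, name) for name in field_names]
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the pool's threads to exit


if __name__ == "__main__":
    before = threading.active_count()
    # "calibrator" is an invented field name for the example.
    print(flag_all_fields(["G331.59-0.36", "calibrator"], nworkers=4))
    print("extra threads left over:", threading.active_count() - before)
```

With this shape the thread count stays bounded by `nworkers` plus the pool's own handler threads, however many fields are processed.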

sjperkins commented 5 years ago

Hmmmm, should've caught this in the #12 review. Before that, the ThreadPool was initialised outside the loop.

sjperkins commented 5 years ago

You probably want to do the same with the profilers.