rs-station / laue-dials

A package for analyzing Laue x-ray crystallography data using the DIALS framework.
https://rs-station.github.io/laue-dials/
MIT License

issue with multiprocessing during spot finding #31

Status: Closed (closed by DHekstra 9 months ago)

DHekstra commented 10 months ago

In /n/hekstra_lab/projects/laue-dials-tests/chess_hewl/tutorial_CHESS.ipynb, the command below succeeds without spotfinder.mp.nproc=4 but fails when that option (or the shorter nproc) is added:

laue.find_spots imported.expt \
    spotfinder.threshold.dispersion.gain=0.7
#    spotfinder.mp.nproc=4   # commented out here; adding this option triggers the crash

with the following error message (before the traceback):

easy_mp crash detected; subprocess trace: ----
Stacktrace:
exit code = -9

and traceback:

Traceback (most recent call last):
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/bin/laue.find_spots", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/laue_dials/command_line/find_spots.py", line 133, in run
    strong_refls = find_spots(params, imported_expts)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/laue_dials/algorithms/monochromatic.py", line 22, in find_spots
    refls = do_spotfinding(expts, params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/command_line/find_spots.py", line 127, in do_spotfinding
    reflections = flex.reflection_table.from_observations(experiments, params)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/array_family/flex_ext.py", line 183, in from_observations
    return spotfinder.find_spots(experiments)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/algorithms/spot_finding/finder.py", line 612, in find_spots
    table, hot_mask = self._find_spots_in_imageset(imageset)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/algorithms/spot_finding/finder.py", line 724, in _find_spots_in_imageset
    r, h = extract_spots(imageset[j0:j1])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/algorithms/spot_finding/finder.py", line 414, in __call__
    return self._find_spots(imageset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/algorithms/spot_finding/finder.py", line 503, in _find_spots
    batch_multi_node_parallel_map(
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/util/mp.py", line 194, in batch_multi_node_parallel_map
    return multi_node_parallel_map(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/util/mp.py", line 165, in multi_node_parallel_map
    result = libtbx.easy_mp.parallel_map(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/easy_mp.py", line 637, in parallel_map
    result = res()
             ^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/scheduling/result.py", line 119, in __call__
    self.traceback( exception = self.exception() )
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/scheduling/stacktrace.py", line 117, in __call__
    self.raise_handler( exception = exception )
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/scheduling/stacktrace.py", line 134, in raise_with_traceback
    raise_(type(exception), exception, self.traceback)
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/six.py", line 719, in reraise
    raise value
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/scheduling/mainthread.py", line 100, in poll
    value = target( *args, **kwargs )
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/dials/util/mp.py", line 88, in __call__
    return libtbx.easy_mp.parallel_map(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/easy_mp.py", line 637, in parallel_map
    result = res()
             ^^^^^
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/scheduling/result.py", line 119, in __call__
    self.traceback( exception = self.exception() )
  File "/n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/libtbx/scheduling/stacktrace.py", line 88, in __call__
    raise exception
RuntimeError: Please report this error to dials-support@lists.sourceforge.net: exit code = -9

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
File <timed eval>:1

File /n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2493, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2491 with self.builtin_trap:
   2492     args = (magic_arg_s, cell)
-> 2493     result = fn(*args, **kwargs)
   2495 # The code below prevents the output from being displayed
   2496 # when using magics with decorator @output_can_be_silenced
   2497 # when the last Python token in the expression is a ';'.
   2498 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File /n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/IPython/core/magics/script.py:154, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    152 else:
    153     line = script
--> 154 return self.shebang(line, cell)

File /n/hekstra_lab/people/dhekstra/conda_envs/laue-dials/lib/python3.11/site-packages/IPython/core/magics/script.py:314, in ScriptMagics.shebang(self, line, cell)
    309 if args.raise_error and p.returncode != 0:
    310     # If we get here and p.returncode is still None, we must have
    311     # killed it but not yet seen its return code. We don't wait for it,
    312     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    313     rc = p.returncode or -9
--> 314     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'\nlaue.find_spots imported.expt \\\n   spotfinder.threshold.dispersion.gain=0.7 \\\n   spotfinder.mp.nproc=4 \n\n# spotfinder.filter.max_separation=10 \n# spotfinder.threshold.dispersion.sigma_strong=3 \\\n'' returned non-zero exit status 1.
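
For context on the `exit code = -9` in the messages above: under the Python convention for child processes, a negative exit code `-N` means the child was terminated by signal `N`, and signal 9 is SIGKILL (the IPython source quoted in the traceback says the same). The sketch below reproduces that exit code on a POSIX system. It assumes that libtbx's easy_mp reports worker exit codes using the same convention, which this thread does not confirm. A frequent source of an unexpected SIGKILL on Linux is the kernel's out-of-memory killer or a cluster scheduler's memory limit, but nothing here establishes that as the cause. The function names are hypothetical and exist only for illustration.

```python
import signal
import subprocess
import sys


def returncode_after_sigkill() -> int:
    """Start a sleeping child process, kill it with SIGKILL, and return its exit code."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGKILL)  # POSIX only; cannot be caught by the child
    child.wait()
    return child.returncode


def describe_exit_code(code: int) -> str:
    """Translate a subprocess-style exit code into a readable description."""
    if code < 0:
        # Negative codes mean "terminated by signal -code".
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


print(describe_exit_code(returncode_after_sigkill()))
```

If memory pressure is suspected, comparing peak memory use at different nproc values is one way to test that hypothesis.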

PrinceWalnut commented 9 months ago

@DHekstra I just copied this notebook and tried it with and without multiprocessing. Both versions work on my end, so I can't reproduce this issue. Is it still occurring for you? If not, it may have been a DIALS-side issue fixed between versions.

DHekstra commented 9 months ago

OK, that seems to work now.
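
To check whether the problem returns after a later DIALS or laue-dials update, the with/without comparison described above can be scripted. This is a minimal sketch rather than anything from the laue-dials code base: `compare_variants` is a hypothetical helper, and the command and options in the usage comment are copied from this issue.

```python
import subprocess


def compare_variants(base_cmd, variants):
    """Run base_cmd once per variant and return {label: return code}.

    base_cmd is a list of command-line tokens; variants maps a label to the
    list of extra tokens appended to base_cmd for that run.
    """
    results = {}
    for label, extra_args in variants.items():
        completed = subprocess.run(
            base_cmd + extra_args, capture_output=True, text=True
        )
        results[label] = completed.returncode
    return results


# Example usage (requires laue-dials and imported.expt in the working directory):
#
#   base = ["laue.find_spots", "imported.expt",
#           "spotfinder.threshold.dispersion.gain=0.7"]
#   variants = {"serial": [], "nproc=4": ["spotfinder.mp.nproc=4"]}
#   for label, rc in compare_variants(base, variants).items():
#       print(f"{label}: return code {rc}")
```

A nonzero return code for only the multiprocessing variant would reproduce the behaviour originally reported here.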