raacampbell / matlab_elastix

MATLAB Elastix wrapper
http://www.mathworks.com/matlabcentral/fileexchange/52982-matlab-elastix
GNU Lesser General Public License v3.0

Warning when running my MATLAB code in Linux using Elastix: A worker aborted during execution of the parfor loop #25

Open zw2001 opened 4 years ago

zw2001 commented 4 years ago

When I ran my MATLAB code in a Linux environment on an HPC system, I got the following warning:

Warning: A worker aborted during execution of the parfor loop. The parfor loop will now run again on the remaining workers.

Now my questions are:

  1. Will the job that was being processed on the aborted worker be dropped, with no further processing?
  2. The message says that parfor will run again on the remaining workers, and in my code one worker corresponds to one patient image. Does this mean that only some of the patient images, say 6 out of 10, will be processed? I ask because I do see only some of the patient images being processed.

My MATLAB code works perfectly in a Windows environment, with none of the above issues.

raacampbell commented 4 years ago

This is a tough one. I hesitantly suggest the issue is likely the HPC. I work pretty much exclusively on Linux and haven't noticed a problem but I've almost never tried this on an HPC.

  1. Will the job that was being processed on the aborted worker be dropped, with no further processing?

Yes, probably.

  2. The message says that parfor will run again on the remaining workers, and in my code one worker corresponds to one patient image. Does this mean that only some of the patient images, say 6 out of 10, will be processed? I ask because I do see only some of the patient images being processed.

The workers are parallel and independent so that makes sense.
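One way to see exactly which images were processed is to record a completion flag per iteration. The following is only a minimal sketch: `processImage` is a hypothetical stand-in for the real per-patient registration step, and the image count of 10 is taken from the example in the question.

```matlab
% Hypothetical stand-in for the real per-patient registration step
processImage = @(k) k^2;

nImages = 10;
done    = false(1, nImages);   % sliced output: one flag per image
result  = zeros(1, nImages);

parfor k = 1:nImages
    % feval avoids parfor treating the function handle as an indexed variable
    result(k) = feval(processImage, k);
    done(k)   = true;          % only set once this image has finished
end

missing = find(~done);         % indices of images that never completed
fprintf('%d of %d images processed\n', nnz(done), nImages);
```

If `missing` is non-empty after the loop, those indices point to the patient images worth investigating first.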

I'd suggest the following:

  1. Identify which image fails and confirm what happens on Windows.
  2. What happens on a regular Linux desktop, without the HPC?
  3. Where exactly does it fail? How does it fail? Look at the log files and, if necessary, add more logging to check.
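A rough sketch of the extra logging in step 3, under stated assumptions: `registerImage` is a hypothetical placeholder for the real registration call (here it fails on image 4 on purpose), and the log directory and file names are made up for illustration. Each iteration writes its own log file and wraps the work in `try`/`catch`. Note that `try`/`catch` only catches MATLAB errors. If the worker process itself crashes, nothing is caught, but the log for that image should contain a "start" line with no "finished" line.

```matlab
% Hypothetical stand-in for the real registration call; image 4 fails on purpose
registerImage = @(k) assert(k ~= 4, 'demo:fail', 'simulated failure on image %d', k);

logDir = tempname;             % assumed location; use a shared directory on a cluster
mkdir(logDir);

nImages = 6;
ok  = false(1, nImages);       % true once an image finishes without error
msg = cell(1, nImages);        % error message per image, empty on success

parfor k = 1:nImages
    logFile = fullfile(logDir, sprintf('image_%02d.log', k));
    fid = fopen(logFile, 'w'); % lowercase 'w' flushes after each write
    fprintf(fid, 'start image %d on %s\n', k, computer);
    try
        feval(registerImage, k);
        ok(k)  = true;
        msg{k} = '';
        fprintf(fid, 'finished\n');
    catch err
        msg{k} = err.message;
        fprintf(fid, 'error: %s\n', err.message);
    end
    fclose(fid);
end

failed = find(~ok);            % images that errored or never finished
```

Running the same loop with `for` instead of `parfor` on just the indices in `failed` is a simple way to reproduce the problem on a single image.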

Rob