ojwoodford / batch_job

Parallelize MATLAB for loops across workers, without the Parallel Computing Toolbox
MIT License
18 stars 6 forks

timeout not working (non deterministic)? #14

Open spotlightgit opened 4 years ago

spotlightgit commented 4 years ago

Hey Oliver,

I'm having some trouble with your toolbox when I use this kind of function call:

    batch_job_distrib(goal_function, x, worker, additional_data, '-chunk_lims', [1 1], '-timeout', timeout);

In the past, without the timeout option, there were no issues. Now I want to exclude the master MATLAB from the number crunching, so I use the timeout option (as you suggested). Unfortunately, it sometimes happens that all workers have closed or finished but a single mat-file still has a file lock (for example, chunk000001.mat.lock). The master MATLAB then waits endlessly and no timeout is applied. If I end this situation with Ctrl+C at the command line of the master MATLAB, the following error appears:

Please wait while the workers are halted.
Operation terminated by user during batch_job_collect (line 78)

In batch_job_distrib (line 201)
    output = batch_job_collect(s, co);
...

Do you have any idea what's going wrong, or any suggestions for what I can do?

ojwoodford commented 4 years ago

Thanks. This sounds like a bug. I just pushed a change which seems to work for me. Please test and let me know.