spotlightgit closed this issue 4 years ago.
There are several concurrency issues that I haven't been able to fix, but this could be a fixable bug. Do you have a test script that can reliably reproduce the error? Are your workers on the same machine as the master, or on a different machine?
Right now all my workers are on a single machine (I plan to use multiple machines in the coming days/weeks).
Here is a test script that reproduces the error:
```matlab
x = rand(3,10);
out = batch_job_distrib(@Rastrigin, x, {'',2}, '-chunk_lims', [1 1]);
```
and the corresponding goal function:
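```matlab
function y = Rastrigin(x)
% Rastrigin test function evaluated at the input vector x
n = length(x);
s = 0;
for j = 1:n
    % Accumulate x_j^2 - 10*cos(2*pi*x_j) over all elements
    s = s + (x(j)^2 - 10*cos(2*pi*x(j)));
end
y = 10*n + s;
end
```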
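For reference, the goal function above is the standard Rastrigin benchmark, which has many local minima and a global minimum of 0 at x = 0. For an input vector of length n it computes:

$$
f(\mathbf{x}) = 10n + \sum_{j=1}^{n} \left( x_j^2 - 10\cos(2\pi x_j) \right)
$$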
Many thanks for the script. That helped a lot.
The issue was that the master worker finished the whole job before any of the other workers could start. That situation shouldn't record an error.
I have made and pushed a fix.
Before I apply the parallelization to my lengthy goal functions, I want to check my optimization algorithm on some easy-to-evaluate test functions. When I do so, I always get an error file after each call of batch_job_distrib with this content:
```
Error using load
Unable to read file 'D:/.../tp12e45761_47d1_4658_9350_ecd6c93a7c37.mat'. No such file or directory.
```
I assume the reason is that one worker is faster than the other and evaluates the last job, while another worker also wants to evaluate the same job but is too late. I further assume this is a warning that can be ignored, because everything else works fine. Is there any way to avoid the creation of this error file? As the goal functions take longer to evaluate, this issue becomes less important, but as I understand it, in theory it could happen every time. What would you suggest?
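One general way to make the race described above harmless is to let a worker claim a job file atomically before loading it, and to treat "someone else already claimed it" as a normal outcome rather than an error. The sketch below is only an illustration of that pattern, not how batch_job_distrib is actually implemented. It assumes that each job's inputs sit in a shared .mat file that workers load (as the error message suggests), that renaming a file within the same filesystem is atomic, and that each worker has a unique ID string. The helper name `claim_job_file` is hypothetical.

```matlab
function [data, claimed] = claim_job_file(job_file, worker_id)
% CLAIM_JOB_FILE  Hypothetical helper (not part of batch_job).
% Atomically claims a job file by renaming it, so only one worker loads it.
%   job_file  - path of the shared .mat file holding the job's inputs
%   worker_id - string unique to this worker
% Returns claimed = false, without raising an error, if another worker
% already renamed (claimed) the file.
data = [];
% Keep the .mat extension so that load reads the claimed file as MAT data
[folder, name, ext] = fileparts(job_file);
claimed_file = fullfile(folder, [name '_' worker_id ext]);
% With output arguments, movefile returns status 0 instead of throwing
[claimed, ~] = movefile(job_file, claimed_file);
if ~claimed
    return; % Another worker got there first: skip this job quietly
end
data = load(claimed_file);
delete(claimed_file); % Remove the claimed copy once its contents are loaded
end
```

A worker would call `[data, ok] = claim_job_file(fname, 'worker1')` and simply move on to the next job when `ok` is false. Renaming is used instead of checking `exist` before `load`, because the file can still disappear between those two calls.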