spotlightgit opened this issue 4 years ago
Maybe it works if the following changes are made in start_workers.m:
% Copy the command file
if ispc
    [status, cmdout] = system(sprintf('scp %s %s:./batch_job_distrib_cmd.bat', cmd_file, workers{w,1}));
else
    [status, cmdout] = system(sprintf('cat %s | ssh %s "cat - > ./batch_job_distrib_cmd.bat"', cmd_file, workers{w,1}));
end
and
% Make it executable
if ~ispc
    [status, cmdout] = system(sprintf('ssh %s "chmod u+x batch_job_distrib_cmd.bat"', workers{w,1}));
    assert(status == 0, cmdout);
end
and
% Add on the ssh command
if ispc
    cmd = sprintf('ssh %s batch_job_distrib_cmd.bat', workers{w,1});
else
    cmd = sprintf('ssh %s ./batch_job_distrib_cmd.bat', workers{w,1});
end
and in batch_job_distrib.m:
% Remove the command file
try
    if ispc
        [status, cmdout] = system(sprintf('ssh %s "del batch_job_distrib_cmd.bat"', workers{w,1}));
    else
        [status, cmdout] = system(sprintf('ssh %s "rm -f ./batch_job_distrib_cmd.bat"', workers{w,1}));
    end
    assert(status == 0, cmdout);
catch me
    % ...
end
Many thanks for the input here. One issue I see is that ispc() tells you whether the master is a PC, but not whether the worker is. The master could be a PC and the worker could be running Linux.
Well, I understand your concern. In my case, master and worker are both Windows systems. I think this could be solved by adding a new input argument that lets the user select the worker's operating system. Alternatively, the "workers" option could be extended to "hostname, number of workers, system". Then it would be possible to use workers with different operating systems. If not otherwise specified, the master's system could be used as the default for the workers. Automatic detection would be the best solution :-) but may be too much effort for this kind of issue...
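A minimal sketch of the extended "workers" option proposed above. It assumes a hypothetical third column holding the worker's OS as 'windows' or 'unix' (the column layout, the host names and these two values are illustrative, not part of the toolbox). Rows with an empty third column fall back to the master's OS, as reported by ispc():

```matlab
% Hypothetical extended "workers" option: each row is
% {hostname, number of workers, system}, with system 'windows' or 'unix'.
% An empty system entry means "not specified".
workers = {'pc-a', 4, 'windows'; 'linux-b', 8, ''};   % hypothetical hosts

% Default for unspecified workers: the master's own OS.
if ispc
    default_os = 'windows';
else
    default_os = 'unix';
end

% Fill in the default wherever the system column is empty.
unspecified = cellfun(@isempty, workers(:,3));
workers(unspecified,3) = {default_os};

% Per-worker decisions can then test the worker's OS instead of ispc(), e.g.
% worker_is_windows = strcmp(workers{w,3}, 'windows');
```

With this layout, a Windows master can drive a Linux worker (and vice versa); only rows left empty inherit the master's OS.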
Hey Oliver, have you already decided how you want to proceed with this issue? :-)
No. It's a bit tricky. Ideally I'd like to get rid of the file and just send the command over ssh, but I'm not yet sure how to do this in a platform-agnostic way. I'm open to input.
Well, as you mentioned, it should be possible to send the commands directly instead of starting a batch file. I am working with MATLAB on Windows, so I have no experience with the differences between the two operating systems when running MATLAB.
Hey Oliver, with my fast-running test function I thought distributed computing was working on Windows (with my adaptations above), but it is not. It seems that SSH behaves differently on Windows and Linux: on Windows, apparently all applications started within an SSH session are closed when the session ends. In your current implementation, the batch file is executed by a single command that opens an SSH connection and closes it immediately afterwards, so the MATLAB instance started on the worker is also closed immediately. It looks like I missed this behaviour in my initial testing with a fast goal function; probably the master MATLAB was doing the number crunching instead of the workers, so I got no errors. Only with my slow goal function did I notice this issue. Have you ever used your toolbox on Windows? Do you have any suggestions on how this issue could be solved?
I have used it on Windows, but several years ago. The process needs to be detached from the shell. I believe the way to do this is to use the start command (https://superuser.com/questions/1069972/windows-run-process-on-background-after-closing-cmd/1069983). However, I'm already doing this, and you say it doesn't work, so it needs further investigation. Unfortunately I don't have time to do this at present.
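For comparison, a minimal sketch of choosing a detached launch command per worker OS. The host name and the worker_os value are hypothetical (they would come from a per-worker OS setting rather than ispc()). The Unix branch uses the standard nohup/background idiom; the Windows branch is only a plain start-based launch, which, per this thread, still needs investigation and is not shown to survive the end of the SSH session:

```matlab
% Hypothetical per-worker settings.
host = 'worker1';
worker_os = 'unix';   % 'unix' or 'windows'

switch worker_os
    case 'unix'
        % nohup makes the job ignore the hangup signal (SIGHUP) sent when
        % the session ends; redirecting stdin/stdout/stderr and appending &
        % lets ssh return immediately while the job keeps running.
        cmd = sprintf('ssh %s "nohup ./batch_job_distrib_cmd.bat > /dev/null 2>&1 < /dev/null &"', host);
    case 'windows'
        % Plain start-based launch; NOT verified to keep MATLAB alive after
        % the SSH session closes (see the discussion above).
        cmd = sprintf('ssh %s start batch_job_distrib_cmd.bat', host);
    otherwise
        error('Unknown worker OS: %s', worker_os);
end

% [status, cmdout] = system(cmd);   % run from the master
```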
Hello Oliver,
My SSH connection between both PCs is working, i.e. I can ssh without entering a password, which is one prerequisite for distributed computing with your wonderful toolbox. Unfortunately, the "cat" command is not known to the Windows command line (cmd.exe), which is what MATLAB's "system" command calls. Possible workarounds:
1.) "cat" is known in Windows PowerShell, so calling PowerShell instead of the command line seems interesting, but neither !powershell cat … nor !powershell -inputformat none cat ... works in my MATLAB (I don't know why).
2.) Replace "cat" with "type", which seems to have the same functionality as "cat" on Linux systems. "type" works in both the command line and PowerShell.
To be compatible with both Linux and Windows, one could check with "ispc" and then execute either system(sprintf('type … or system(sprintf('cat … . Instead of using "ispc", an additional option for "batch_job_distrib()" would also be possible.
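A minimal sketch of the ispc()-based selection proposed above. Here ispc() is the right test, because the file is read on the master. The file and host names are hypothetical, and the remote "cat - > ..." part still assumes a Unix-like shell on the worker:

```matlab
% Hypothetical inputs: local command file and worker host name.
cmd_file = 'job_cmd.bat';
host = 'worker1';

% This part runs on the master, so ispc() correctly describes it.
if ispc
    print_cmd = 'type';   % cmd.exe built-in that prints a file to stdout
else
    print_cmd = 'cat';
end

% Pipe the local file over ssh; the remote side writes it to disk with cat,
% which requires a Unix-like shell on the worker.
copy_cmd = sprintf('%s "%s" | ssh %s "cat - > ./batch_job_distrib_cmd.bat"', ...
    print_cmd, cmd_file, host);

% [status, cmdout] = system(copy_cmd);
% assert(status == 0, cmdout);
```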
What do you think?