kotowicz / matlab-ParforProgress2

A PARFOR progress monitor for Matlab
13 stars 3 forks source link

create working solution for globals issue #4

Open kotowicz opened 10 years ago

kotowicz commented 10 years ago

Problem statement: https://github.com/kotowicz/matlab-ParforProgress2/pull/2 Possibly related Matlab issues: http://www.mathworks.com/matlabcentral/newsreader/view_thread/163362 http://www.mathworks.com/matlabcentral/newsreader/view_thread/287891 http://mathforum.org/kb/message.jspa?messageID=7445456

I'm happy to receive any feedback from those of you working with globals.

kotowicz commented 10 years ago

detailed problem demo:

>> global a
>> a = 10

a =

    10

>> matlabpool open local
Starting matlabpool using the 'local' profile ... a
connected to 12 workers.

a =

    10

>> a

a =

    10

>> ParforProgressStressTest2

  >> execution time was 1.64s.

ParforProgressStressTest2 running time: 1.6267s.
>> a
Undefined function or variable 'a'.

the global a disappears because JAVAADDPATH clears all globals.

kotowicz commented 10 years ago

The 0.2.5 release https://github.com/kotowicz/matlab-ParforProgress2/releases/tag/v0.2.5 should address all issues. In case there are still problems with global variables, please report them here.

kotowicz commented 10 years ago

demo to show that globals do not get cleared anymore:

>> global a
>> a = 10

a =

    10

>> matlabpool open local
Starting matlabpool using the 'local' profile ... connected to 12 workers.

>> ParforProgressStressTest2(10000, 0)

  >> execution time was 1.36s.

ParforProgressStressTest2 running time: 1.3521s.

>> a

a =

    10

>> matlabpool close
Sending a stop signal to all the workers ... stopped.

>> ParforProgressStressTest2(10000, 0)

  >> execution time was 3.18s.

ParforProgressStressTest2 running time: 3.1779s.

>> a

a =

    10

>> matlabpool open local
Starting matlabpool using the 'local' profile ... connected to 12 workers.

>> ParforProgressStressTest2(10000, 0)

  >> execution time was 1.3s.

ParforProgressStressTest2 running time: 1.2994s.

>> matlabpool close
Sending a stop signal to all the workers ... stopped.

>> a

a =

    10
chinasaur commented 10 years ago

I haven't had a chance to look into the code yet; will try this week. Maybe you can just tell me; was the problem with my approach that if you close and open the matlabpool then it doesn't realize it needs to add the Java path again in the child processes?

If that's the issue, I think the best solution would be to have a runonall that does the same kind of check for whether Java path is already set on all the child processes. Then just run that every time and it will normally do nothing but if it's the first time it will add the java path on all, and if the pool has been restarted it will add java path on just the children but not the master.

chinasaur commented 10 years ago

Actually, perhaps you should consider an additional change that I have made in my own branch. In my version, in addition to trying to avoid the globals issue, I have also eliminated the Client Java class. Instead, the Client class is just a Matlab handle and then the increment method looks like:

function increment(o, i) %#ok<INUSD>
    % i is a fake input so we stay compatible with
    % "ParforProgressConsole2.m"
    s = java.net.Socket();
    s.connect(o.ISA);
    s.close();
end

Of course, it's a bit constraining to make the Client Matlab instead of Java, but since its job is simple I consider this an improvement. Additionally, you don't have the Java path issue on the child processes anymore, since only the Server node actually needs any extra Java path. (This is probably why I didn't catch the matlabpool restart bug when I submitted my change to you; apologies for debugging in an inconsistent context!)

(I believe I went through some iterations trying some fancier things with the java.net.Socket like setting some additional flags, but in the end this simple thing seemed best.)

Let me know if you want a PR with these more aggressive changes.

kotowicz commented 10 years ago

thanks for your feedback. At this moment I would try to avoid making too many changes at the same time. For now I'd like to see whether the current implementation (v0.2.5) does work (in theory) for everybody. Once we have a stable, working solution, I suggest we try to look at different options for the future.

chinasaur commented 10 years ago

That's fine; I'll try to find time to do some tests. Take a look at my fork if you are interested in the simplified approach; I think my fork is up to date.

mtompkins commented 10 years ago

I noticed I'm current with your release and my globals are getting dropped. I use them to init an SMTP send email address. I'll see if I can widdle it down to something you can replicate locally.

That said, unfortunately the problem still exists.

chinasaur commented 10 years ago

@mtompkins, do your globals get stomped repeatedly, or just the first time parforprogress is loaded?

mtompkins commented 10 years ago

@chinasaur I'm not quite sure what you are asking so I'll answer as follows:

I init a single global variable which holds a simple email address.

I then use the bar during a parametric sweep (lasts a few days).
On a failure of the sweep it should send an email

When the script ends an email should be sent notifying a successful completion. My latest sweep ended today (successfully) but there was an error at the email send routine as the global variable had been dropped.

If I test the sections that init the variable and send the email (bypassing the parametric sweep itself) everything works as expected.

chinasaur commented 10 years ago

(Answer below is based on my fork, which is somewhat different, but I believe should apply equally here; kotowicz can correct if I'm mixed up.)

I believe the globals clearing happens when parforprogress adds itself to your javaclasspath. This used to happen every time parforprogress was called, but the new changes should make it only happen the first time it's called, and only if it's not on the classpath already.

You could try initiating something that uses parforprogress, interrupting it early, init your global and then rerun; it shouldn't stomp the global then. Try with something shorter than a few days first :).

If this fixes your problem, then you could arrange to have the appropriate classes added to javaclasspath on startup and then I believe you won't have to think about it anymore.

mtompkins commented 10 years ago

Gotcha, thank you. So init / paint the progress bar briefly then carry on basically until there is a larger rework.

kotowicz commented 10 years ago

@mtompkins do you run your code like this? Please notice that you need to use the run_javaaddpath = 0 setting for it to work with globals.

 >> run_javaaddpath = 0;
 >> s = 'dummy task';
 >> n = 100;
 >> percentage = 0.1;
 >> do_debug = 0;
 >> ppm = ParforProgressStarter2(s, n, percentage, do_debug, run_javaaddpath)
 >> for j = 1 : n
 >>     your_computation();
 >>     ppm.increment(i); 
 >> end
 >> delete(ppm);

In case this doesn't work, could you please post the Matlab output showing which globals disappear after running ParforProgressStarter2? Thanks!

mtompkins commented 10 years ago

@kotowicz Thanks for the post. No I do not use the run_java* declaration. I will add that and advise results. Much appreciated. (Sample of how I call it below before changes)

btw - LOVE this little baby!

-- code --

try
    ppm = ParforProgressStarter2('Parametric Sweep: RSI with RAVI Transformer', row, 0.1);
catch me % make sure "ParforProgressStarter2" didn't get moved to a different directory
    if strcmp(me.message, 'Undefined function or method ''ParforProgressStarter2'' for input arguments of type ''char''.')
        error('ParforProgressStarter2 not in path.');
    else
        % this should NEVER EVER happen.
        msg{1} = 'Unknown error while initializing "ParforProgressStarter2":';
        msg{2} = me.message;
        print_error_red(msg);
        % backup solution so that we can still continue.
        ppm.increment = nan(1, nbr_files);
    end
end

try
    parfor ii = 1:row
        [~,~,shTest(ii)] = rsiRaviSIG_mex(vBarsTest,[x(ii,1) x(ii,2)],x(ii,3),x(ii,4),...
            x(ii,5),x(ii,6),x(ii,7),x(ii,8), x(ii,9), x(ii,10),...
            bigPoint,cost,scaling);
        [~,~,shVal(ii)] = rsiRaviSIG_mex(vBarsVal,[x(ii,1) x(ii,2)],x(ii,3),x(ii,4),...
            x(ii,5),x(ii,6),x(ii,7),x(ii,8),x(ii,9), x(ii,10),...
            bigPoint,cost,scaling); %#ok<PFBNS>
        ppm.increment(); %#ok<PFBNS> % update progressbar
    end; %parfor
catch ER
    % An error occurred during the parallel process ...
    subj = 'ALERT - Notice of MATLAB error';
    message = sprintf('Error while executing the parallel parametric sweep.\nThe error reported by MATLAB is:\n\n%s', ER.message);
    % Send the notice
    sendNotice(subj, message);
end
mtompkins commented 10 years ago

Ok - unfortunately same result after adding the additional parameter.

It is a bit useless to show you the error output as the global variable that is dropped isn't used in this function.

On a thought, I added the global variable to the script that runs PPBar2 so that it exists in the same workspace and test if it might be preserved as follows:

% Prevent loss of global variables when calling parforprogress
global sendTo; %#ok<NUSED>
run_javaaddpath = 0;

I put a breakpoint at the run_java* declaration to confirm the global had a value and it does...then let the script continue.

The failure then produces the error as the variable 'sendTo' is dropped.

kotowicz commented 10 years ago

2 things:

mtompkins commented 10 years ago

v2013a

'a' comes back undefined / dropped.

Testing with empty parfor same result. Testing with for loop same result.

kotowicz commented 10 years ago

please set a break point in line 115, 117, 120 and 122. Then re-run the entire loop. Keep track of the break points where execution stops. Once you hit a break point, resume operation by typing dbcont. Finally, please post the line numbers here. thanks.

mtompkins commented 10 years ago

What is it you want from these breakpoints please?

mtompkins commented 10 years ago

breaks on 115 [type dbcont]

break on 117 [command windows displays] 117 pctRunOnAll(['addpath(''' dir_to_add ''')']);

[type dbcont] execution completes with dropped variables

.... hope that helps.

kotowicz commented 10 years ago

please download an edited version of ParforProgressStarter2.m from here: https://gist.github.com/kotowicz/8825641

run your loop again, and post the "@kotowicz: " output that you get. thanks.

kotowicz commented 10 years ago

if it breaks at line 115, then run_javaaddpath must be set to 1. please make sure it's set to 0 and try again.

mtompkins commented 10 years ago

OK that was extremely helpful (and I sadly admit I'm likely heavily to blame for my woes after your initial info). I may have missed a change in the input parameters from a version or just have been sloppy.

In any event, I was calling

ppm = ParforProgressStarter2('Parametric Sweep: maRsiPARMETS', row, 0.1);

When adding the run_java* as explained I never submitted the debug variable. If I add that and the run_java* everything completes properly. Apologies for the mea culpa where I'm to blame. Hopefully this bit of thread helps others.