jsherfey / dnsim

Dynamic Neural Simulator - a modular modeling tool for large ODE systems.

Jobs run on cluster with identical end filenames (regardless of path) overwrite each other if sent to cluster within a small amount of time #46

Open asoplata opened 9 years ago

asoplata commented 9 years ago

(Before I begin: I only realized later that I could use {('TC,RE')},{'square_amp'},{'[0.1,0.3]'} to run the same simulations, as you said in your chat, so that IS working, but the below is still a bug.) In order to get "groups" working, I set a bunch of sims running yesterday with calls like

[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.1]','[0.1]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);
[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.3]','[0.3]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);

<change some synaptic conductances or whatever, along with the rootdir so it's saving to a different folder ultimately>

```
[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.1]','[0.1]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);
[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.3]','[0.3]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);
```

I wake up this morning to realize that only about 50-66% of them ran/were saved. Seriously. It's almost as if only every other simulation was run, though that pattern isn't always true. So I look at the '~/batchdirs' entries, and the missing simulations are straight up missing, as if they were never sent to the cluster.

So I look at my standard output from when I was submitting them all, and I notice something I hadn't paid much attention to at the time. Sometimes I would do 'matlab -r run_script' with a bunch of the above simulations, go in and change something like a synaptic conductance, and do 'matlab -r run_script' again. In the second script run, I noticed this, as an example:

```
...
/project/.../20150401:: job0001_RE-squareamp0pt1__TC-squareamp0pt1_time0-2000
executing: "qmatjobs_memlimit B20150401-100547 8G" on cluster scc2.bu.edu
1 jobs submitted.
/project/.../20150401:: job0001_RE-squareamp0pt1__TC-squareamp0pt1_time0-2000
Warning: Directory already exists.
>In simstudy at 191
>In at 86
executing: "qmatjobs_memlimit B20150401-100547 8G" on cluster scc2.bu.edu
1 jobs submitted.
```

Emphasis on the job number. Whenever this warning appeared, it would say it was executing the same job number twice, which appears to overwrite whatever the previous job was. So the [0.1] job would be overwritten by the [0.3] job, and the [0.1] job would never be properly run (since the time to write and submit the job is far less than the time until the job finishes). This also means that apparently you can overwrite your jobs on the cluster this way, as that's what I think is going on here.

You can test it yourself by making 4 changes to something like the synaptic conductance, interspersed with running the exact same simstudy call, one right after another. EVEN if you change the 'rootdir' so that each run's data is saved into a different folder, this STILL happens. Therefore it appears that, regardless of the resulting absolute data directory like `rootdir`, if you submit a bunch of jobs where the resulting filenames are identical, like `job0001_RE-squareamp0pt3__TC-squareamp0pt3_time0-15000_rawV.png` (EVEN if in different directories), then sometimes (there seems to be some randomness involved) the previous job is overwritten.

Very cheap solution I've been testing out: simply waiting between the simstudy calls, via a `pause(5)`, seems to make the problem go away! That said, this bug should still be fixed.
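
A minimal sketch of why waiting might help. It assumes (this is not confirmed from the dnsim source) that the batch ID, like the `B20150401-100547` in the log above, is built from a timestamp with one-second resolution, so two submissions within the same second would get the same ID. The names `make_id`, `n_calls`, `ids` and `last_id` are hypothetical and only serve the illustration. Instead of a fixed `pause(5)`, the loop waits just until the timestamp changes:

```matlab
% Hypothetical illustration, not dnsim code.
% Assumption: the batch ID is 'B' plus a second-resolution timestamp.
make_id = @() ['B' datestr(now, 'yyyymmdd-HHMMSS')];

n_calls = 3;                  % stands in for consecutive simstudy calls
ids     = cell(1, n_calls);
last_id = '';

for k = 1:n_calls
    id = make_id();
    while strcmp(id, last_id) % still in the same second as the last call
        pause(0.1);           % wait for the clock to tick over
        id = make_id();
    end
    ids{k}  = id;
    last_id = id;
    % a simstudy(...) call would be issued here
end

disp(ids);                    % every ID is distinct
```

If the assumption holds, any wait of at least one second between submissions should be enough, which would be consistent with `pause(5)` making the problem go away.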
asoplata commented 9 years ago

Whoops, the first code block above should read:

[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.1]','[0.1]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);
[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.3]','[0.3]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);
[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.1]','[0.1]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);
[specs,timestamp,rootoutdir]=simstudy(spec, {'TC','RE'},{'square_amp','square_amp'},{'[0.3]','[0.3]'},'plotvars_flag',plotvars_flag,'plot_flag',plot_flag,'plotpower_flag',plotpower_flag,'plotpacoupling_flag',plotpacoupling_flag,'saveplot_flag',saveplot_flag,'overwrite_flag',overwrite_flag,'timelimits',timelimits,'SOLVER',SOLVER,'rootdir',rootdir,'addpath',addpath,'cluster_flag',cluster_flag,'savedata_flag',savedata_flag);
asoplata commented 9 years ago

Note: this bug may be confounded with #48.