radical-cybertools / ExTASY


How to ensure a job finished successfully #244

Closed ebreitmo closed 8 years ago

ebreitmo commented 8 years ago

How do I ensure that the workflow worked properly? The documentation at http://extasy-workflows.readthedocs.org/en/latest/pages/grlsd.html lists expected output files, but they are not there, and I no longer know whether this documentation can be relied on during the development stage.

Using integrator=sd the run finishes looking ok

 python extasy_MISTgromacs_lsdmap.py --RPconfig archer.rcfg --Kconfig gromacslsdmap.wcfg

================================================================================
 EnsembleMD (0.3.14-27-g65bc062)                                                
================================================================================

Starting Allocation                                                           ok
Verifying pattern                                                             ok
Starting pattern execution                                                    ok
--------------------------------------------------------------------------------
Executing simulation-analysis loop with 1 iterations on 24 allocated core(s) on 'epsrc.archer'

Job waiting on queue...
Job is now running !
Waiting for pre_loop step to complete.                                      done
Iteration 1: Waiting for 8 simulation tasks: custom.gromacs to complete     done
Iteration 1: Waiting for analysis tasks: custom.pre_lsdmap to complete      done
Iteration 1: Waiting for analysis tasks: md.lsdmap to complete              done
Iteration 1: Waiting for analysis tasks: custom.post_lsdmap to complete     done
--------------------------------------------------------------------------------
Pattern execution successfully finished                                         

Starting Deallocation..
Resource allocation cancelled.                        
vivek-bala commented 8 years ago

I think I can add a method to check for specific files on the remote machine before proceeding. Do you see any errors in the tasks (please check the STDERR in the unit folders)?
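Checking STDERR by hand across many unit folders is tedious, so here is a minimal sketch of a scan that lists every non-empty STDERR file. It assumes each unit folder contains a file literally named STDERR and that you have a local or mounted copy of the sandbox; the function name and the root path are illustrative and not part of EnsembleMD or RADICAL-Pilot.

```python
import os


def find_nonempty_stderr(sandbox_root, name="STDERR"):
    """Return the sorted paths of non-empty files called `name` under sandbox_root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(sandbox_root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            # An empty STDERR usually means the task wrote no error output.
            if os.path.getsize(path) > 0:
                hits.append(path)
    return sorted(hits)


# Example call (the path is a placeholder for wherever the unit folders live):
# for path in find_nonempty_stderr("/path/to/pilot/sandbox"):
#     print(path)
```

An empty result only shows that nothing was written to STDERR; it does not prove that the outputs are complete.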

vivek-bala commented 8 years ago

Two methods: 1) I have added a method (exists_remote) to kernels, which searches for the specified output files once execution has finished successfully. Example:

k = Kernel(name="md.gromacs")
k.arguments = [.....]
k.link_input_data = [....]
k.exists_remote = ["out.gro","ener.edr"]

2) As we have seen in a few instances, when wrappers are used an error might not get reported while an output file is still generated (with partial data, or just empty). If you use wrappers (as in the non-adaptive/static grlsd use case), the wrapper should also propagate or report any errors.
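To illustrate the second point, here is a rough sketch of a wrapper that propagates errors: it returns the wrapped command's exit code if that is non-zero, and otherwise fails when an expected output file is missing or empty. The function name `run_and_verify` is hypothetical, and this is not how the ExTASY wrappers are actually written.

```python
import os
import subprocess
import sys


def run_and_verify(cmd, expected_outputs):
    """Run cmd; return 0 only if it exits cleanly and every expected file is non-empty."""
    returncode = subprocess.call(cmd)
    if returncode != 0:
        sys.stderr.write("command failed with exit code %d: %s\n"
                         % (returncode, " ".join(cmd)))
        return returncode
    for path in expected_outputs:
        # A zero-byte file counts as a failure, since partial runs can leave one behind.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            sys.stderr.write("missing or empty output file: %s\n" % path)
            return 1
    return 0


# Example use inside a wrapper script (the file names are the ones from the
# exists_remote example above):
# sys.exit(run_and_verify(sys.argv[1:], ["out.gro", "ener.edr"]))
```

Exiting with a non-zero status is what lets the calling framework see the task as failed. A size check still cannot detect a file that is non-empty but truncated.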

vivek-bala commented 8 years ago

There was another problem with the run.py file (but I don't think you faced it). I do get the output generated on local machine. Could you share the enmd script that you are using ?

ebreitmo commented 8 years ago

what is my enmd script?

more gromacslsdmap.wcfgc
"gromacslsdmap.wcfgc" may be a binary file.  See it anyway?
(04Feb)mbp-eb:grlsd-on-archer elenabreitmoser$
(04Feb)mbp-eb:grlsd-on-archer elenabreitmoser$ more gromacslsdmap.wcfg

-------------------------Applications----------------------

simulator = 'Gromacs' # Simulator to be loaded
analyzer = 'LSDMap' # Analyzer to be loaded

--------------------------General--------------------------------

num_CUs = 8 # Number of tasks or Compute Units
num_iterations = 1 # Number of iterations of Simulation-Analysis
start_iter = 0 # Iteration number with which to start
nsave = 1 # # Iterations after which output is transfered to local machine

--------------------------Simulation--------------------------------

md_input_file = './inp_files/input.gro' # Entire path to the MD Input file - Do not use $HOME or the likes
mdp_file = './inp_files/grompp.mdp' # Entire path to the MD Parameters file - Do not use $HOME or the likes
top_file = './inp_files/topol.top' # Entire path to the Topology file - Do not use $HOME or the likes
mist_file = './inp_files/mist.params'
ndx_file = None # Entire path to the Index file - Do not use $HOME or the likes
grompp_options = None # Command line options for when grompp is used
mdrun_options = None # Command line options for when mdrun is used
md_output_file = 'tmp.gro' # Filename to be used for the simulation output

--------------------------Analysis----------------------------------

lsdm_config_file = './inp_files/config.ini' # Entire path to the LSDMap configuration file - Do not use $HOME or the likes
num_runs = 100 # Number of runs to be performed in the Selection step in Analysis
w_file = 'weight.w' # Filename to be used for the weight file
max_alive_neighbors = '10' # Maximum alive neighbors to be considered while reweighting
max_dead_neighbors = '1' # Maximum dead neighbors to be considered while reweighting

--------------------------Misc----------------------------------

helper_scripts = './helper_scripts'

Cheers, Elena



ibethune commented 8 years ago

Just a comment: when I ran it (using the same version of the ensemblemd code as you; please also check the RP version) I got some output files, which I think are correct according to the documentation (almost, except that the directory is named backup instead of output). Maybe the issue is that you have an already-existing directory?

(extasy-test)Iains-MBP:grlsd-on-archer ibethune$ radicalpilot-version 
v0.40-2-g4671811@devel
(extasy-test)Iains-MBP:grlsd-on-archer ibethune$ python extasy_gromacs_lsdmap.py --Kconfig gromacslsdmap.wcfg --RPconfig archer.rcfg

================================================================================
 EnsembleMD (0.3.14-27-g65bc062)                                                
================================================================================

Starting Allocation                                                           ok
Verifying pattern                                                             ok
Starting pattern execution                                                    ok
--------------------------------------------------------------------------------
Executing simulation-analysis loop with 1 iterations on 24 allocated core(s) on 'epsrc.archer'

Job waiting on queue...
Job is now running !
Waiting for pre_loop step to complete.                                      done
Iteration 1: Waiting for 8 simulation tasks: custom.gromacs to complete     done
Iteration 1: Waiting for analysis tasks: custom.pre_lsdmap to complete      done
Iteration 1: Waiting for analysis tasks: md.lsdmap to complete              done
Iteration 1: Waiting for analysis tasks: custom.post_lsdmap to complete     done
--------------------------------------------------------------------------------
Pattern execution successfully finished                                         

Starting Deallocation..
Resource allocation cancelled.                                              done 
(extasy-test)Iains-MBP:grlsd-on-archer ibethune$ ls backup/
iter0   iter1
(extasy-test)Iains-MBP:grlsd-on-archer ibethune$ ls backup/iter0
lsdmap.log
(extasy-test)Iains-MBP:grlsd-on-archer ibethune$ ls backup/iter1
out.gro     weight.w
ebreitmo commented 8 years ago

v0.40.1@devel
ibethune commented 8 years ago

Hmm, for me it does overwrite successfully...

ibethune commented 8 years ago

And I see in the scripts that you now get via the download from the bitbucket repo that the folder is correctly called 'output' rather than 'backup'...

vivek-bala commented 8 years ago

Sorry, could you please share the extasy_MISTgromacs_lsdmap.py file? Preferably put it in a gist (https://gist.github.com/) and share the link (better for long code and outputs).

ebreitmo commented 8 years ago

https://gist.github.com/ebreitmo/e3ee8f393cf3ad0fe3b9

and

(04Feb)mbp-eb:grlsd-on-archer elenabreitmoser$ ls -lrt
total 144
-rw-r--r--   1 elenabreitmoser  staff  13334 31 Jan 05:33 extasy_gromacs_lsdmap.py
-rw-r--r--   1 elenabreitmoser  staff    674  4 Feb 09:20 archer.rcfg~
drwxr-xr-x   4 elenabreitmoser  staff    136 15 Feb 15:23 backup
drwxr-xr-x  15 elenabreitmoser  staff    510 15 Feb 16:08 kernel_defs
-rw-r--r--   1 elenabreitmoser  staff  13385 15 Feb 16:12 extasy_MISTgromacs_lsdmap.py~
drwxr-xr-x  12 elenabreitmoser  staff    408 22 Feb 14:45 helper_scripts
-rw-r--r--   1 elenabreitmoser  staff  13485 23 Feb 12:05 extasy_MISTgromacs_lsdmap.py
-rw-r--r--   1 elenabreitmoser  staff   2210 24 Feb 09:05 gromacslsdmap.wcfg~
-rw-r--r--   1 elenabreitmoser  staff   2211 25 Feb 13:07 gromacslsdmap.wcfg
-rw-r--r--   1 elenabreitmoser  staff    671 25 Feb 13:07 archer.rcfg
drwxr-xr-x  13 elenabreitmoser  staff    442 25 Feb 13:07 inp_files
-rw-r--r--   1 elenabreitmoser  staff    829 25 Feb 13:08 gromacslsdmap.wcfgc
-rw-r--r--   1 elenabreitmoser  staff    400 25 Feb 13:08 archer.rcfgc
vivek-bala commented 8 years ago

In lines 204-206, https://gist.github.com/ebreitmo/e3ee8f393cf3ad0fe3b9#file-extasy_mistgromacs_lsdmap-py-L204-L206, you download the .gro file and weight files back to the local machine. Do you have them in the 'backup' folder? (I think you might be using an older enmd script, since the folder name 'backup' was changed to 'output'.)

ebreitmo commented 8 years ago

I started from scratch. Now I have locally

ls -lrt output/
total 0
drwxr-xr-x  3 elenabreitmoser  staff  102  1 Mar 12:07 iter0
drwxr-xr-x  4 elenabreitmoser  staff  136  1 Mar 12:07 iter1
(01Mar)mbp-eb:grlsd-on-archer elenabreitmoser$ ls -lrt output/iter1/
total 432
-rw-------  1 elenabreitmoser  staff  215912  1 Mar 12:07 out.gro
-rw-------  1 elenabreitmoser  staff    2192  1 Mar 12:07 weight.w
(01Mar)mbp-eb:grlsd-on-archer elenabreitmoser$ ls -lrt output/iter0/
total 8
-rw-------  1 elenabreitmoser  staff  389  1 Mar 12:07 lsdmap.log
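
To answer the original question once a run has finished, a short local check can confirm that the files seen in the listings in this thread exist and are non-empty. This is only a sketch: the expected file names are taken from those listings (lsdmap.log in iter0, out.gro and weight.w in iter1), and they will differ if md_output_file, w_file or the iteration count are changed in the Kconfig. The function name is illustrative.

```python
import os

# Files seen in the directory listings in this thread; adjust them to match your config.
EXPECTED = {
    "iter0": ["lsdmap.log"],
    "iter1": ["out.gro", "weight.w"],
}


def missing_or_empty(output_dir, expected=EXPECTED):
    """Return the paths under output_dir that are missing or have zero size."""
    problems = []
    for subdir, names in sorted(expected.items()):
        for name in names:
            path = os.path.join(output_dir, subdir, name)
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                problems.append(path)
    return problems


# Example call:
# problems = missing_or_empty("output")
# print("all expected files present" if not problems else problems)
```

Older scripts wrote to a folder named backup rather than output, so pass whichever directory your script uses.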