radical-cybertools / ExTASY


When running something like LSDmap MD, how does one set a runtime longer than the max walltime? #221

Closed: dotsdl closed this issue 7 years ago

dotsdl commented 8 years ago

I would like to run LSDmap MD runs on stampede for several days (10 or more), but I know that the max walltime for any one job is 48 hours on the normal CPU queue. Typically when I want to do this with gromacs I submit several dependent jobs, with each one picking up where the previous one left off.

Is there a mechanism for running LSDmap MD runs over several jobs/pilots? If so, what's the best way of doing it? If not, what's the best place to begin hacking it together? I'm familiar enough with radical.pilot to work at the low level of pilot jobs, but I'm not as familiar with how much of that plumbing is exposed within radical.ensemblemd.

andre-merzky commented 8 years ago

Not really on topic, but anyway: please be aware that RP has a long-standing issue with keeping connections to MongoDB open for long periods of time (see https://github.com/radical-cybertools/radical.pilot/issues/662). If you happen to stumble over this, please let us know; that should be enough motivation for us to finally address the issue :P

vivek-bala commented 8 years ago

> I would like to run LSDmap MD runs on stampede for several days (10 or more), but I know that the max walltime for any one job is 48 hours on the normal CPU queue. Typically when I want to do this with gromacs I submit several dependent jobs, with each one picking up where the previous one left off.

You should be able to do the same with EnMD as well. 10 days translates to 5 pilots of 48 hours each. You would create 5 SingleClusterEnvironment objects that allocate, run, and deallocate sequentially. The important points here are 1) that your executable has some checkpoint mechanism which creates a backup every (say) 30 mins, and 2) that you move the data to a known location at the end of each pilot - this can possibly be done with the executable again (or a wrapper script).
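For concreteness, here is a minimal, untested sketch of that loop, assuming the SingleClusterEnvironment allocate/run/deallocate API described above; the constructor arguments (resource name, core count, walltime in minutes, credentials) and the pattern-building helper are placeholders, not something prescribed by this thread, so adjust them to your EnMD version and allocation:

```python
# Untested sketch: chain N short pilots to cover a long overall runtime.
# Constructor kwargs are assumptions/placeholders; check your EnMD version.
from radical.ensemblemd import SingleClusterEnvironment

N_PILOTS = 5  # 5 x 48 h is roughly 10 days

for i in range(N_PILOTS):
    cluster = SingleClusterEnvironment(
        resource="xsede.stampede",
        cores=256,                   # whatever your workload needs
        walltime=2880,               # minutes, i.e. the 48 h queue limit
        username="your_username",    # placeholder
        project="your_allocation"    # placeholder
    )
    cluster.allocate()

    # Placeholder: build your LSDMap simulation-analysis pattern here;
    # for i > 0 it should start from the checkpoint/restart files staged
    # out at the end of the previous pilot (see the staging notes below).
    pattern = build_lsdmap_pattern(restart=(i > 0))

    cluster.run(pattern)
    cluster.deallocate()
```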

@andre-merzky For such long jobs, is it possible to move the data from the CUs when the pilot hits the walltime (and hence the CUs get cancelled), as opposed to staging out when the CUs are Done? I don't think this exists currently, but do you think it would be possible?

> Is there a mechanism for running LSDmap MD runs over several jobs/pilots? If so, what's the best way of doing it? If not, what's the best place to begin hacking it together? I'm familiar enough with radical.pilot to work at the low level of pilot jobs, but I'm not as familiar with how much of that plumbing is exposed within radical.ensemblemd.

I am not sure if LSDMap has a checkpoint mechanism that writes output (/creates a backup) at regular intervals. I'll let the LSDMap team answer that, though.

andre-merzky commented 8 years ago

Hey Vivek -- IIRC, Antons also requested something like this in the past, to perform staging on failing / canceled units. In principle I don't mind; in practice we are running into a race condition here: if the unit is canceled because the pilot dies, then the pilot is by definition gone, and we can't rely on it to do any staging anymore. The staging ops done by the client module would still work, but I feel somewhat uncomfortable defining semantics (staging after cancel) which I know will fail in some cases...

Anyway, I do see the use case. Let me check what exists in terms of staging on cancel (I remember having added code for that at some point...).

dotsdl commented 8 years ago

@andre-merzky thanks for the heads up! I'll hopefully not trip over that, but I know where to go if I do. :D

@vivek-bala although I think putting cluster allocation and execution inside a for loop should work, does this actually pick up where the last run left off? As in, given the set of files the SingleClusterEnvironment ends with, does starting another one up cleanly take up where the last one left off? I'm thinking in particular of how, in the md.pre_grlsd_loop Kernel, spliter.py looks like it's fed a single .gro file, but in the case of a restart it will end up with one of the concatenated .gro files. Will it handle this cleanly?

I'm running tests now to see if this all plays well. I don't normally ask questions without first trying something out, but there are a lot of moving parts in here. :D

vivek-bala commented 8 years ago

> Hey Vivek -- IIRC, Antons also requested something like this in the past, to perform staging on failing / canceled units. In principle I don't mind; in practice we are running into a race condition here: if the unit is canceled because the pilot dies, then the pilot is by definition gone, and we can't rely on it to do any staging anymore. The staging ops done by the client module would still work, but I feel somewhat uncomfortable defining semantics (staging after cancel) which I know will fail in some cases...

I see.. I guess we can't do it via the agent. But since all the paths are in the session, I think we can maybe open a process on the remote machine to do the necessary file movements?

@dotsdl No, every new SingleClusterEnvironment will be a new pilot and hence a new folder. At the end of the 1st pilot, you can stage out all the files to (say) the work directory ../work/pilot1/outputs. At the beginning of the 2nd pilot, you can stage those files (of the previous pilot) into the current working directory. There is a common staging area within each pilot (for each task), but you will have to create a "staging area" of your own for sharing work across pilots. Creating a "staging area" across pilots would require additional functionality in EnMD that has knowledge of multiple pilots, which currently does not exist.
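One way to get such a cross-pilot staging area without new EnMD functionality is to do the file movement from the client side between pilot runs, e.g. over ssh/scp. The sketch below is purely illustrative; the remote host, the persistent work directory, and the helper names are all made up for the example:

```python
# Hypothetical client-side staging between pilots: copy restart files from
# the finished pilot's sandbox into a persistent work directory, and fetch
# them again before the next pilot starts. Host and paths are placeholders.
import subprocess

REMOTE = "username@stampede.tacc.utexas.edu"              # placeholder host
PERSISTENT_DIR = "/work/012345/username/extasy_restart"   # placeholder path

def stage_out(pilot_sandbox, files):
    """Copy restart files from the pilot sandbox to the persistent dir."""
    for f in files:
        subprocess.check_call(
            ["ssh", REMOTE, "cp %s/%s %s/" % (pilot_sandbox, f, PERSISTENT_DIR)])

def stage_in(files, local_dir="."):
    """Fetch restart files back to the client before the next pilot runs."""
    for f in files:
        subprocess.check_call(
            ["scp", "%s:%s/%s" % (REMOTE, PERSISTENT_DIR, f), local_dir])
```

The same movement could of course be done on the remote side by a small wrapper script run as the last task of each pilot, as suggested above.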

> I'm thinking in particular of how, in the md.pre_grlsd_loop Kernel, spliter.py looks like it's fed a single .gro file, but in the case of a restart it will end up with one of the concatenated .gro files. Will it handle this cleanly?

The spliter.py file basically splits a .gro file with lots of identical configurations into .gro files with a smaller number of identical configurations. You can download (or copy to the "staging area") the output of the analysis step, which is also a .gro file with lots of different configurations (the last stage of the analysis step is again running the spliter.py file: https://github.com/radical-cybertools/ExTASY/blob/master/examples/gromacs_lsdmap/helper_scripts/post_analyze.py#L53). Passing this back into the md.pre_grlsd_loop kernel should work.
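As a small illustration of that restart path (the file names and helper here are hypothetical, not part of ExTASY), the driver script could simply choose which .gro file to hand to the md.pre_grlsd_loop kernel depending on whether a previous pilot already produced an analysis output:

```python
# Illustrative only: pick the input .gro for md.pre_grlsd_loop.
# INITIAL_GRO / RESTART_GRO are placeholder names, not ExTASY conventions.
import os

INITIAL_GRO = "input.gro"        # original starting configurations
RESTART_GRO = "restart/out.gro"  # .gro staged out from the previous pilot's
                                 # analysis step (post_analyze.py / spliter.py)

def pick_input_gro(pilot_index):
    """Return the .gro file to feed into the md.pre_grlsd_loop kernel."""
    if pilot_index > 0 and os.path.exists(RESTART_GRO):
        return RESTART_GRO
    return INITIAL_GRO
```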

This is good! We haven't really played with such long-running jobs, so we haven't stressed much along those lines. Keep shooting the questions :)

andre-merzky commented 8 years ago

Alas, there is no code for transfer-on-unit-cancel in place, yet :/