Closed: daipayans closed this issue 3 years ago
For 20 iterations with 100000 steps per iteration, using the 1.8 Å map, only 147/160 cc's were generated. @lee212 what could be a possible reason, and can we track it in the RADICAL log files?
We need to tabulate stats of completed vs. submitted tasks. This would tell us what percentage of the total jobs were completed by the scheduler. Is this already implemented, or can we add it to the master log file for an EnTK session? @lee212
> For 20 iterations with 100000 steps per iteration, using the 1.8 Å map, only 147/160 cc's were generated. @lee212 what could be a possible reason, and can we track it in the RADICAL log files?
This behavior is odd; I can start to look at the vmd executable and see if there is a way to resolve this.
> We need to tabulate stats of completed vs. submitted tasks. This would tell us what percentage of the total jobs were completed by the scheduler. Is this already implemented, or can we add it to the master log file for an EnTK session? @lee212
The STDOUT prints a status message for each task, like

Update: MD_ML.MD.task.0019 state: DONE

or

Update: MD_ML.MD.task.0019 state: FAILED

so tabulating the results is possible, as long as the executable produces its exit code correctly. If it exits with 0 (i.e. reports success) even though an intermediate error occurred, I suggest using `task.post_exec` to verify completion by adding a list of commands, for example:

task1.post_exec = [ 'if [ ! -s "cc.dat" ] ; then exit 1; fi' ]
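For the tabulation itself, here is a minimal sketch that counts those terminal-state lines, assuming the EnTK stdout has been captured to a file (the file name `entk_run.log` is an assumption, not an EnTK default):

```python
import re
from collections import Counter

# Count terminal task states from captured EnTK stdout.
# 'entk_run.log' is a placeholder: redirect the run's stdout there yourself.
pattern = re.compile(r'Update:\s+(\S+)\s+state:\s+(DONE|FAILED)')

states = Counter()
with open('entk_run.log') as f:
    for line in f:
        m = pattern.search(line)
        if m:
            _task, state = m.groups()
            states[state] += 1

total = sum(states.values())
if total:
    print(f"DONE:   {states['DONE']}/{total} ({100 * states['DONE'] / total:.1f}%)")
    print(f"FAILED: {states['FAILED']}/{total} ({100 * states['FAILED'] / total:.1f}%)")
```

The same counts could be appended to the master log at the end of an EnTK session, which would cover the completed/submitted percentage asked about above.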
@lee212 This behavior is odd; I can start to look at the vmd executable and see if there is a way to resolve this.
The `unit.????` tasks are not created. Hence, my money is on the communication between EnTK and the scheduler. Does that make sense?

I am closing this as all tasks are complete.
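One way to check that hypothesis is to compare how many `unit.*` sandboxes were actually created against the number of tasks submitted; a rough sketch, assuming the default RADICAL-Pilot sandbox location (the session id and pilot directory below are placeholders for the actual run):

```python
import glob
import os

# Placeholder paths: substitute the real session id and pilot directory.
sandbox  = os.path.expanduser('~/radical.pilot.sandbox/re.session.PLACEHOLDER/pilot.0000')
expected = 160  # total tasks submitted in this run, per the numbers above

units = sorted(glob.glob(os.path.join(sandbox, 'unit.*')))
print(f'{len(units)}/{expected} unit sandboxes created')

# Tasks with no unit.* directory at all were never launched, which points at
# the EnTK-to-scheduler hand-off rather than at failures inside vmd itself.
```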
Based on our meeting today, the following action items for @lee212:

- For the 1.8 Å resolution only, submit parallel jobs changing only the numsteps in `workflow` [5000 and 100000]. Note that as numsteps increases, the computational resources in `resource` need to change to ensure jobs complete within the specified walltime (see the sketch below).
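As a rough illustration of the walltime note above, a sketch that scales the requested walltime with numsteps; the per-step timing and safety margin are made-up numbers, not values from the actual `workflow` or `resource` settings:

```python
import math

# Illustrative numbers only: wallclock seconds per MD step from a short benchmark run.
SECONDS_PER_STEP = 0.05
SAFETY_FACTOR    = 1.3   # 30% margin so jobs finish within the requested walltime

for numsteps in (5000, 100000):
    est_min  = numsteps * SECONDS_PER_STEP * SAFETY_FACTOR / 60
    walltime = max(30, math.ceil(est_min / 30) * 30)   # round up to 30-minute blocks
    print(f'numsteps={numsteps:>6}  requested walltime ~ {walltime} min')
```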