ucgmsim / slurm_gm_workflow

Porting the GM workflow to run on new NeSI HPC (Maintainer: Jonney)
MIT License

A behaviour change to query_mgmt_db.py --mode todo #492

Closed by sungeunbae 1 year ago

sungeunbae commented 1 year ago

query_mgmt_db.py --mode todo is supposed to give a list of jobs that still need to be done. I think it makes more sense if it shows everything that hasn't reached "completed" status.

sungeunbae commented 1 year ago

When running the whole cybershake, I really wanted a way to see which jobs still need to be completed. Having --mode todo show only "created" jobs is not particularly useful, in my opinion.
Can you describe a situation where we would want to see only the list of untouched "created" jobs?

I still feel "todo" should mean any job that still needs work, regardless of its current status.

sungeunbae commented 1 year ago

Updated the query logic. Without todo mode, you will see every recorded entry, including older failed attempts for tasks that later completed:

...
               MS04_REL29 |          EMOD3D |     failed | 13251611 |  2023-07-19 01:12:08
               MS04_REL29 |          EMOD3D |     failed | 13336848 |  2023-08-08 21:40:48
               MS04_REL30 |          EMOD3D |     failed | 13251613 |  2023-07-19 01:12:08
               MS04_REL30 |          EMOD3D |     failed | 13336849 |  2023-08-08 21:40:48
               MS09_REL01 |          EMOD3D |  completed | 13336850 |  2023-08-08 21:40:48
               MS09_REL01 |          EMOD3D |     failed | 13251615 |  2023-07-19 01:12:08
               MS09_REL02 |          EMOD3D |  completed | 13336852 |  2023-08-08 21:40:48
               MS09_REL02 |          EMOD3D |     failed | 13251617 |  2023-07-19 01:12:08
               MS09_REL03 |          EMOD3D |  completed | 13336853 |  2023-08-08 21:40:48
               MS09_REL03 |          EMOD3D |     failed | 13251619 |  2023-07-19 01:12:08
...
               MS09_REL21 |          EMOD3D |  completed | 13336879 |  2023-08-08 21:40:48
               MS09_REL21 |          EMOD3D |     failed | 13280910 |  2023-07-27 22:53:35
               MS09_REL22 |          EMOD3D |     failed | 13280912 |  2023-07-27 22:53:35
               MS09_REL22 |          EMOD3D |     failed | 13336881 |  2023-08-08 21:40:48
               MS09_REL23 |          EMOD3D |  completed | 13336882 |  2023-08-08 21:40:48
               MS09_REL23 |          EMOD3D |     failed | 13280915 |  2023-07-27 23:04:33
...
         UpperSlope_REL20 |          EMOD3D |    created |     None |  2023-08-07 01:40:09
         UpperSlope_REL20 |          EMOD3D |     failed | 13323625 |  2023-08-06 21:14:48
         UpperSlope_REL21 |          EMOD3D |    created |     None |  2023-08-07 01:40:09
         UpperSlope_REL21 |          EMOD3D |     failed | 13323626 |  2023-08-06 21:14:48
         UpperSlope_REL22 |          EMOD3D |    created |     None |  2023-08-07 01:40:09
         UpperSlope_REL22 |          EMOD3D |     failed | 13323627 |  2023-08-06 21:14:48

With todo mode, you will no longer see any (rel, process_type) combo that has a "completed" entry, even if earlier attempts failed:

...
               MS04_REL29 |          EMOD3D |     failed | 13336848 |  2023-08-08 21:40:48
               MS04_REL30 |          EMOD3D |     failed | 13336849 |  2023-08-08 21:40:48
               MS09_REL22 |          EMOD3D |     failed | 13336881 |  2023-08-08 21:40:48
...
         UpperSlope_REL20 |          EMOD3D |    created |     None |  2023-08-07 01:40:09
         UpperSlope_REL21 |          EMOD3D |    created |     None |  2023-08-07 01:40:09
         UpperSlope_REL22 |          EMOD3D |    created |     None |  2023-08-07 01:40:09

It shows the latest non-completed status of each task, so there is just one entry per (rel, process_type) combo. This should give a better overview of which jobs remain to be done.
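For anyone wanting to reason about the new behaviour, the filtering described above can be sketched roughly as follows. This is only an illustration over in-memory rows, not the actual query in query_mgmt_db.py; the names `StateRow` and `todo_rows` are hypothetical, and the sample rows are copied from the output above.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class StateRow(NamedTuple):
    """Hypothetical in-memory stand-in for one row of the listing above."""
    run_name: str
    proc_type: str
    status: str
    job_id: Optional[int]
    last_modified: str  # "YYYY-MM-DD HH:MM:SS", which sorts chronologically as text


def todo_rows(rows: List[StateRow]) -> List[StateRow]:
    """Return the latest entry of every (run_name, proc_type) combo
    that has no "completed" entry at all."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row.run_name, row.proc_type)].append(row)

    todo = []
    for key in sorted(grouped):
        entries = grouped[key]
        # A combo that completed at least once is done, whatever else failed.
        if any(e.status == "completed" for e in entries):
            continue
        # Otherwise keep only the most recent entry for the combo.
        todo.append(max(entries, key=lambda e: e.last_modified))
    return todo


rows = [
    StateRow("MS04_REL29", "EMOD3D", "failed", 13251611, "2023-07-19 01:12:08"),
    StateRow("MS04_REL29", "EMOD3D", "failed", 13336848, "2023-08-08 21:40:48"),
    StateRow("MS09_REL01", "EMOD3D", "completed", 13336850, "2023-08-08 21:40:48"),
    StateRow("MS09_REL01", "EMOD3D", "failed", 13251615, "2023-07-19 01:12:08"),
    StateRow("UpperSlope_REL20", "EMOD3D", "created", None, "2023-08-07 01:40:09"),
    StateRow("UpperSlope_REL20", "EMOD3D", "failed", 13323625, "2023-08-06 21:14:48"),
]

for row in todo_rows(rows):
    print(f"{row.run_name:>25} | {row.proc_type:>15} | {row.status:>10} | "
          f"{row.job_id} | {row.last_modified}")
```

With these sample rows, MS09_REL01 drops out because it has a completed entry, MS04_REL29 is reduced to its most recent failed attempt, and UpperSlope_REL20 is reduced to its "created" entry, which matches the todo output shown above.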