payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

use more than 3 digits for outputnnn and restartnnn? #232

Open aekiss opened 4 years ago

aekiss commented 4 years ago

The output and restart directories are distinguished by a 3-digit run number, but we are getting uncomfortably close to running out of digits in the long 0.1 deg ACCESS-OM2 runs: at 3 months per submit, the 1000 available run numbers (000 to 999) would be used up after 250 years. So I'm wondering how hard-coded that is, and whether it could easily be increased?

aekiss commented 4 years ago

e.g. @andyhoggANU's run /g/data/ik11/outputs/access-om2-01/01deg_jra55v13_ryf9091 is up to output688 so far...

marshallward commented 4 years ago

Has anyone actually tried this before? I recall it will output more digits if needed; it will just lack the padding:

>>> counter = 1000
>>> restart_dir = 'restart{0:03}'.format(counter)
>>> print(restart_dir)
restart1000

aekiss commented 4 years ago

Some workflows I've created rely on alphabetic sorting, so the lack of zero-padding would be a problem for that.

But if payu is adjusted to use 4 digits it would need to initially be optional so it doesn't mess up the naming in existing runs.

Maybe it's not worth worrying about, since we are yet to even exceed this 3-digit limit?
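The sorting concern can be illustrated with a minimal sketch. The run numbers below are hypothetical, and the 4-digit format is only an assumed alternative, not something payu currently provides:

```python
# Hypothetical run numbers straddling the 3-digit limit.
counters = [998, 999, 1000, 1001]

# Current naming: minimum width 3, so 1000 and above come out unpadded.
three_digit = ['output{0:03}'.format(n) for n in counters]

# Assumed alternative: minimum width 4.
four_digit = ['output{0:04}'.format(n) for n in counters]

# Plain string sorting compares character by character, so with mixed
# widths 'output1000' sorts before 'output998'.
print(sorted(three_digit))

# With a uniform width, string order and numeric order agree.
print(sorted(four_digit))
```

Any workflow that relies on alphabetic order would see the first behaviour once a run passes 999, unless every directory name shares the same width.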

marshallward commented 4 years ago

It should be a quick job to replace the hard-coded 3 with a variable whose default value is 3.
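As a rough sketch of that idea (the function name `run_dir` and the parameter `digits` are made up for illustration and are not payu's actual code), the width can be passed into the format spec as a nested field:

```python
def run_dir(prefix, counter, digits=3):
    """Build a directory name such as 'restart007'.

    `digits` is a minimum width: larger counters are never truncated.
    Hypothetical helper, not part of payu.
    """
    # The nested {2} field is substituted into the format spec,
    # so digits=3 gives the spec '03d'.
    return '{0}{1:0{2}d}'.format(prefix, counter, digits)


print(run_dir('restart', 7))            # default width of 3
print(run_dir('restart', 1000))         # exceeds the width, left unpadded
print(run_dir('output', 42, digits=4))  # opt-in 4-digit padding
```

Keeping 3 as the default would leave the names in existing runs unchanged.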

marshallward commented 3 years ago

Another solution is to allow the user to define their own format string, with 0:03 being the default (or even restart{0:03}, depending on how ambitious we are feeling...).
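One way the user-defined format string might look is sketched below. The names `DEFAULT_COUNTER_FORMAT` and `format_run_dir` are assumptions for illustration, as is the round-trip check, which is there only because a free-form format could otherwise produce names that cannot be read back as run numbers:

```python
# Assumed default, matching the current 3-digit zero padding.
DEFAULT_COUNTER_FORMAT = '{0:03}'


def format_run_dir(prefix, counter, counter_format=DEFAULT_COUNTER_FORMAT):
    """Format a run directory name from a user-supplied format string.

    Hypothetical helper, not part of payu.
    """
    text = counter_format.format(counter)
    # Reject formats whose output does not parse back to the same counter
    # (for example hexadecimal or a constant string).
    if not text.isdigit() or int(text) != counter:
        raise ValueError(
            'format {0!r} does not round-trip counter {1}'.format(
                counter_format, counter))
    return prefix + text


print(format_run_dir('restart', 7))
print(format_run_dir('output', 7, '{0:05}'))
```

This is more flexible than a single width setting, at the cost of having to validate what the user supplies.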

aidanheerdegen commented 3 years ago

I noticed Adele had a run with run numbers > 999, and it seemed to work fine, apart from ls not sorting correctly unless given an option like -v or -t:

$ ls -v /scratch/v45/akm157/access-om2/archive/01deg_jra55v13_ryf9091_weddell_up1
error_logs  restart998   restart1015  restart1032  restart1049
output995   restart999   restart1016  restart1033  restart1050
output1055  restart1000  restart1017  restart1034  restart1051
output1056  restart1001  restart1018  restart1035  restart1052
output1057  restart1002  restart1019  restart1036  restart1053
output1058  restart1003  restart1020  restart1037  restart1054
output1059  restart1004  restart1021  restart1038  restart1055
output1060  restart1005  restart1022  restart1039  restart1056
output1061  restart1006  restart1023  restart1040  restart1057
output1062  restart1007  restart1024  restart1041  restart1058
output1063  restart1008  restart1025  restart1042  restart1059
output1064  restart1009  restart1026  restart1043  restart1060
output1065  restart1010  restart1027  restart1044  restart1061
pbs_logs    restart1011  restart1028  restart1045  restart1062
restart995  restart1012  restart1029  restart1046  restart1063
restart996  restart1013  restart1030  restart1047  restart1064
restart997  restart1014  restart1031  restart1048  restart1065
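For scripts that process these directories, a similar ordering can be approximated in Python with a natural-sort key. This is a minimal sketch: `natural_key` is a made-up helper, the names are a small sample taken from the listing above, and the result is not guaranteed to match `ls -v` in every edge case:

```python
import re


def natural_key(name):
    """Split a name into text and integer parts so digit runs compare numerically.

    For example 'restart1000' becomes ['restart', 1000, ''].
    """
    # re.split with a capturing group keeps the digit runs, and the result
    # always alternates text, digits, text, so list comparison never mixes
    # str with int at the same position.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]


names = ['restart1000', 'output995', 'restart999',
         'pbs_logs', 'output1055', 'error_logs']

print(sorted(names))                   # plain string order
print(sorted(names, key=natural_key))  # numeric order within each prefix
```

With this key, `restart999` sorts before `restart1000` even though the names differ in width.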