payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0
19 stars 26 forks source link

Scan control path for storage flags #296

Closed aidanheerdegen closed 3 years ago

aidanheerdegen commented 3 years ago

Scan control path for storage flags. Closes #264

pep8speaks commented 3 years ago

Hello @aidanheerdegen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-02-12 05:51:10 UTC
coveralls commented 3 years ago

Coverage Status

Coverage increased (+0.8%) to 42.189% when pulling 2886517442f99d67ccf5c2c3a3ebd6169493ec41 on issue-264 into a344529d7bdeff5144d504af51d044f15dd72b4a on master.

aidanheerdegen commented 3 years ago

This was far too complicated for a simple enough end result.

@marshallward can you check you're happy with pushing defining the default control path into readconfig?

I actually think it is a good idea to put as many default settings as possible in one place, as it is they are sprinkled around a bit.

aidanheerdegen commented 3 years ago

I've tested it, seems to work ok, so I'm going to merge.

marshallward commented 3 years ago

Sorry that I was unable to review this before merge.

I'm not too familiar with what storages is all about, but I can't see too much harm in adding the control path to the list.

(I suppose it may encourage more binaries in control, which could break the paradigm of a lean control directory, but no doubt there's a genuine need here so no matter.)

Does control_path still default back to the current working directory? I can't really tell from the patch. (Perhaps it doesn't matter?)

And it looks like PBS.generate_command() got absorbed into PBS.submit(). I guess both were hanging around?

aidanheerdegen commented 3 years ago

Yeah control_path defaults to the directory in which the config.yaml resides. So in the case of running payu from the control directory it reproduces the previous behaviour. But it doesn't do a getcwd, instead it takes the abspath of the config.yaml and then uses that directory:

https://github.com/payu-org/payu/blob/2886517442f99d67ccf5c2c3a3ebd6169493ec41/payu/fsops.py#L95-L97

Now it has just occurred to me that payu has a --config command line option which that change might break?

  --config CONFIG_PATH, -c CONFIG_PATH
                        Configuration file path

Was the point of that option that you could run multiple experiments with the same config.yaml but in separate control directories?

Bugger.

Yeah I removed PBS.generate_command() because it wasn't being called by anything. I'd separated the functionality for testing, and then I think you made some changes with the scheduler stuff that made it redundant. At least I think it wasn't used ... :)

marshallward commented 3 years ago

Don't sweat the --config flag, it was another one of those things I put in but no one ever used (or so I assume).

marshallward commented 3 years ago

(I think the point was to allow a generic YAML input. I always hated the name config.yaml, maybe this was a pathway to phasing it out.)

aidanheerdegen commented 3 years ago

I only changed it so it was easier for one test, and didn't seem to break anything, but now I'm wondering if I should change it back.

Regarding the -l storage stuff, it is because NCI now only mounts /home and /scratch/$PROJECT on PBS jobs, and all other /scratch and /g/data mounts you need access to have to be specified when the PBS job is submitted. This is a right royal PITA when, for example, people want to save data to a different /scratch area than the project code they are running with. This is very common with our researchers.

So now payu scans a bunch of paths and all the paths in the manifests to look for anything that looks like a /scratch or /g/data path, extracts the info it needs and then constructs a -l storage=... option for the PBS job.

This latest fix was for when people decided to have a control directory not in their /home, and also not in the default /scratch space. There was no way payu knew to add this to the storage flag. The job would submit ok, but then payu would complain that it couldn't read the config.yaml, which was quite confusing for users.