Closed marshallward closed 4 years ago
Hello @marshallward! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
I have not yet tested this on Gadi, so it may fail spectacularly on there. Will try to get it going...
Already found one bug:
539 #model_prog.append('-np {0}'.format(model_ncpus))
540 model_prog.append('-n {0}'.format(model_ncpus))
(or maybe that works? mpirun appears to support both...)
I don't understand why coveralls says coverage reduced by 42%.
A few thoughts:
With the site specific hacks required for both GFDL and NCI perhaps it would be best to create clean scheduler classes and put the site specific stuff into some site/platform class. It could contain methods to modify the mpirun
command, as well as queue information maybe?
This is creating code that I can't test, as I don't have access to a slurm machine, and is more difficult for you. Might be time for me to resuscitate my Pawsey account ...
Some of the recent code reorganisation was to make it easier to write tests for the code. So for example have a functions that returns a command string that can be checked for correctness rather than submitting that directly. In case it wasn't obvious why I'd made those changes.
I don't understand either what's going on with code coverage, but will try to see what's happening. (So much has changed, I don't recognize the output much anymore...)
I agree, we should try to nail down these site-specific hacks. An NCI scheduler which subclasses PBS is not a bad idea IMO, for example. (Maybe revive the old "ANUPBS" name?)
For now, I think it's probably OK to just say "PBS" is synonymous with "NCI" and "Slurm" is synonymous with "GFDL".
I agree (again) and think the safest thing to do is for me not to merge anything until you've tested it out, or at least signed off on it. And no need to return the favor; at this time I'm the only GFDL user.
Yeah, the testing changes are a bit confusing to me at the moment (see first point), but hopefully I'll get my head around them soon.
I have an idea for making the site specific stuff a little tidier. How about moving to a system where all arguments to stuff like mpirun
and qsub
is a dict or ordereddict. Then that can be returned, and checked, really easily, and also modified in a very modular way. You can delete and replace options for your MPICH wrapper, I can add storage
options for our PBS system.
Rather than parsing stupid effing stringsI think this could be a winner.
I agree with much of this, and consider most of those string checks to be stubs.
I think that a lot of the proposed structure will emerge as I get the GFDL version working correctly.
This is a very minimal implementation of a Scheduler class, in order to handle scheduler-specific operations.
It includes a basic Factory for controlling the Scheduler instantiation, and a single method for returning the submission command line.
I do not expect things to keep working like this, it is only meant to be the most minimal implementation to support both PBS and Slurm.
Hopefully more changes to come.