payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

Requirement for storage mount points to be specified breaks payu from conda #215

Closed aidanheerdegen closed 4 years ago

aidanheerdegen commented 4 years ago

NCI have turned on the requirement to specify all gdata and scratch mount points explicitly in the qsub call. It does default to including the /scratch/$PROJECT mount point.

For those using payu from a conda environment under a /g/data mount point, this means the gdata mount must be included in the qsub call. Otherwise payu will fail as soon as the job is launched, because the conda environment will not be accessible.

A work-around is to include:

qsub_flags: -l storage=gdata/hh5

in the config.yaml of an experiment.

I am looking into code changes to support this. An immediate thought is to inspect the path to payu and add any gdata or scratch mount point found.
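As a rough illustration of the path-inspection idea, here is a minimal sketch. It assumes mount points always appear as /g/data/<project> or /scratch/<project> at the start of an absolute path; the function name `storage_from_path` is hypothetical and is not part of payu.

```python
import re
import sys

# Assumption: NCI storage paths start with /g/data/<project> or /scratch/<project>.
_STORAGE_PATTERN = re.compile(r"^/(g/data|scratch)/([^/]+)")


def storage_from_path(path):
    """Return a PBS storage token such as 'gdata/hh5', or None if no mount is found.

    Hypothetical helper for illustration only.
    """
    match = _STORAGE_PATTERN.match(path)
    if match is None:
        return None
    filesystem = match.group(1).replace("/", "")  # 'g/data' -> 'gdata'
    project = match.group(2)
    return f"{filesystem}/{project}"


if __name__ == "__main__":
    # The interpreter path stands in for the path to payu in a conda environment.
    print(storage_from_path(sys.executable))
```

Paths outside these two filesystems (a home directory, for example) return None, so the caller can simply skip them.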

This would be very NCI-specific, which this tool already is, but it also moves further away from the goal of making payu as general as possible.

I would also like to add code to inspect include and executable paths and add mount points as appropriate.

Another option is to just add every possible mount point a user can access automatically to every qsub, but that seems a rather crude approach.

marshallward commented 4 years ago

I think platform-specific stuff like this is inevitable, seems fine to me.

Maybe the easiest thing is to just add a storage: record to config.yaml and make it mandatory on NCI?

aidanheerdegen commented 4 years ago

Good idea. A storage flag would make this easier to specify. It could look something like

storage:
    - gdata:
        - ik11
        - v45
    - scratch:
        - x77
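To show how a record like this might map onto the qsub call, here is a small sketch that turns the parsed storage record into a `-l storage=` string, using the `+`-separated form seen in qsub_flags. The function name `storage_flag` is hypothetical, and accepting both a list of mappings and a single mapping is an assumption, not payu's documented behaviour.

```python
def storage_flag(storage):
    """Build a PBS '-l storage=...' string from a parsed storage record.

    Hypothetical helper: accepts either a mapping {'gdata': [...]} or a
    list of such mappings, as in the YAML proposed above.
    """
    records = [storage] if isinstance(storage, dict) else list(storage)
    mounts = []
    for record in records:
        for filesystem, projects in record.items():
            for project in projects:
                mount = f"{filesystem}/{project}"
                if mount not in mounts:  # keep first occurrence, drop repeats
                    mounts.append(mount)
    if not mounts:
        return ""
    return "-l storage=" + "+".join(mounts)


if __name__ == "__main__":
    example = [{"gdata": ["ik11", "v45"]}, {"scratch": ["x77"]}]
    print(storage_flag(example))
```

Keeping the mounts in the order they were listed makes the generated flag easy to compare against the config by eye.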

My main issue with having to specify this stuff independently is that it is error prone and annoying.

Ideally payu should parse all the input, exe and laboratory paths and extract the necessary storage information and add it automatically.

It is easy to do this for the python interpreter path and the path to payu itself (in the case of the conda environment, this is the same root path). Doing it for the other paths (input, etc.) requires payu to work differently than it does currently. At the moment payu run only parses config.yaml for the PBS information necessary to submit the job.

Obtaining all the path information would either require duplicating code to extract path information from config.yaml, or using the existing code to instantiate an experiment and extract the path information once that is done. That sounds a bit like doing payu setup before the run is submitted, which I know was something you were toying with @marshallward.

I think your idea of a storage variable is necessary in any case, since there may be storage mount points that are needed but cannot be determined automatically.

BRAIN WAVE!

We could use the manifest to determine paths and mount points, so there would be no need for payu setup in general. It would require a manifest to be present, which would mostly be the case. When it is not present, a simple payu setup would generate it on those less common occasions.
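A sketch of the manifest idea follows. It assumes the manifest has already been loaded into a dictionary that maps each work path to a record with a `fullpath` key. That layout, and the name `storage_from_manifest`, are assumptions made for illustration and are not taken from payu's code.

```python
import re

# Assumption: NCI storage paths start with /g/data/<project> or /scratch/<project>.
_STORAGE_PATTERN = re.compile(r"^/(g/data|scratch)/([^/]+)")


def storage_from_manifest(entries):
    """Return the sorted storage tokens referenced by manifest entries.

    Hypothetical helper: `entries` maps a work path to a dict that is
    assumed to carry the real file location under the key 'fullpath'.
    """
    mounts = set()
    for entry in entries.values():
        match = _STORAGE_PATTERN.match(entry.get("fullpath", ""))
        if match:
            filesystem = match.group(1).replace("/", "")  # 'g/data' -> 'gdata'
            mounts.add(f"{filesystem}/{match.group(2)}")
    return sorted(mounts)


if __name__ == "__main__":
    manifest = {
        "work/ocean/INPUT/grid.nc": {"fullpath": "/g/data/ik11/inputs/grid.nc"},
        "work/atmosphere/forcing.nc": {"fullpath": "/g/data/qv56/forcing.nc"},
    }
    print("+".join(storage_from_manifest(manifest)))
```

Because the result is a set of tokens, many files under the same project collapse to a single mount.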

aekiss commented 4 years ago

Hi @aidanheerdegen - just checking, does the closure of this issue mean I can remove this line from config.yaml?

qsub_flags: -l storage=gdata/hh5+gdata/qv56+gdata/ik11+gdata/ua8

https://github.com/COSIMA/01deg_jra55_iaf/blob/9f431cba76a4665c6966ebd26cc842fc71bce0ef/config.yaml#L68

aidanheerdegen commented 4 years ago

Yes, but I think you'll need to add

storage:
    gdata:
        - hh5

as hh5 is not in any of the paths that payu searches for storage path information, and you'll need it mounted to sync data. If you change to syncing to ik11 it shouldn't be necessary, as that is in the input paths IIRC.
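The combination of automatically detected mounts and an explicit storage record could be sketched as below. The function name `merge_storage` and the shape of its inputs are assumptions for illustration; payu's actual implementation may differ.

```python
def merge_storage(detected, explicit=None):
    """Combine detected storage tokens with an explicit storage record.

    Hypothetical helper: `detected` is an iterable of tokens such as
    'gdata/ik11'; `explicit` is a mapping such as {'gdata': ['hh5']}.
    Returns the '+'-joined value for a PBS storage request.
    """
    mounts = set(detected)
    for filesystem, projects in (explicit or {}).items():
        mounts.update(f"{filesystem}/{project}" for project in projects)
    return "+".join(sorted(mounts))


if __name__ == "__main__":
    # ik11 found from input paths, hh5 added by hand in config.yaml.
    print(merge_storage(["gdata/ik11"], {"gdata": ["hh5"]}))
```

Taking the union means an explicit entry that duplicates a detected mount is harmless.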

aekiss commented 4 years ago

We don't want any more storage to hh5, so I'll leave that out of the new configs.