payu-org / payu

A workflow management tool for numerical models on the NCI computing systems
Apache License 2.0

Providing the experiment counter as an environment variable #443

Closed: blimlim closed this issue 5 months ago

blimlim commented 5 months ago

I've been investigating ways to run a UM file to netCDF converter as a payu userscript for the ESM1.5 release. To convert the files in the correct output directory (e.g. archive/output005), it would be helpful to access either the experiment's counter or the current output directory, held in expt.counter and expt.dir_path respectively. From what I can understand, neither of these is currently easily accessible from a userscript.

Would it be reasonable to export these as environment variables, perhaps after self.set_counters() and self.set_output_paths() in experiment.py? It would then be easy to provide these to the conversion job during its submission.

aidanheerdegen commented 5 months ago

I agree that sounds like a reasonable approach.

The subprocess should have a clone of the current payu environment, which should include PAYU_CURRENT_RUN:

https://github.com/payu-org/payu/blob/81dfd77e63b75e5d35bc46135471e7c2d4fec4bb/payu/cli.py#L117

Is that accessible in a userscript?

As a work-around it should be possible to check what is in the archive and assume the most recent outputXXX directory is the one to be post-processed, depending on which userscript hook is used.
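A rough sketch of that work-around in Python, assuming the archive holds directories named output followed by digits (as in archive/output005). The function name latest_output_dir is hypothetical and not part of payu:

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: output directories are named "output" followed by digits.
OUTPUT_PATTERN = re.compile(r"^output(\d+)$")


def latest_output_dir(archive: Path) -> Optional[Path]:
    """Return the outputNNN directory with the highest counter, or None."""
    candidates = []
    for entry in archive.iterdir():
        match = OUTPUT_PATTERN.match(entry.name)
        # Skip plain files and anything that is not an output directory.
        if match and entry.is_dir():
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    # Compare on the numeric counter rather than the directory name.
    return max(candidates, key=lambda pair: pair[0])[1]
```

Comparing the parsed integer rather than the name keeps the ordering correct even if the counter ever outgrows its zero padding.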

blimlim commented 5 months ago

Thanks Aidan,

I tried accessing PAYU_CURRENT_RUN in a user script with the following:

config.yaml:

userscripts:
   archive: print_run.sh

and print_run.sh:

#!/bin/bash
echo "Current run is: ${PAYU_CURRENT_RUN}" >> run_counting.txt

This produces the following output after two runs: run_counting.txt

Current run is: 
Current run is: 

From what I can tell, payu isn't actually exporting this to the environment anywhere (I might have missed something though!). It does write it out to the job.yaml file:

experiment.py:

        # Dump job info
        with open(self.job_fname, 'w') as file:
            file.write(yaml.dump(info, default_flow_style=False))

However the archival step moves this into the output directory outputxyz, making it unavailable unless you already know the current run number!

Good idea for the work-around. I'll try to get that running. I think the archive search to find the latest outputxyz directory will have to occur in a separate script before the conversion job is submitted to the queue, otherwise we'd run into issues if the conversion jobs got held up in the queue and ran out of order.

jo-basevi commented 5 months ago

I think PAYU_CURRENT_RUN is only set when the --initial or -i flag is passed to payu run. So using the above archive user-script, if I run payu run and then a subsequent payu run -i 1, run_counting.txt looks like:

Current run is:
Current run is: 1

So, we could set the PAYU_CURRENT_RUN environment variable after Experiment.set_counters() is run. Just adding the line os.environ['PAYU_CURRENT_RUN'] = str(self.counter), and running payu run gives:

$ cat run_counting.txt
Current run is:
Current run is: 1
Current run is: 2
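As a standalone illustration of why that one line is enough (this is not payu code, and both function names are hypothetical): a value written to os.environ is inherited by any child process started afterwards, which is how a userscript would come to see it.

```python
import os
import subprocess
import sys


def export_run_counter(counter: int) -> None:
    """Mimic the proposed line: publish the counter to the environment."""
    os.environ["PAYU_CURRENT_RUN"] = str(counter)


def read_counter_in_child() -> str:
    """Start a child process and report what it sees for PAYU_CURRENT_RUN."""
    child_code = "import os; print(os.environ.get('PAYU_CURRENT_RUN', ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling export_run_counter(2) and then read_counter_in_child() should return the string "2", since subprocess.run passes on the parent's environment when no env argument is given.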

For finding the archive directory, the archive symlink in the control directory is created before the archive user-script is run - would that be sufficient?
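For reference, resolving that symlink from a script could look roughly like this, assuming the link is called archive and sits directly in the control directory. The resolve_archive helper is hypothetical:

```python
import os


def resolve_archive(control_dir: str) -> str:
    """Return the real path that the control directory's archive link points to."""
    # Assumption: the symlink is named "archive" inside the control directory.
    link = os.path.join(control_dir, "archive")
    return os.path.realpath(link)
```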

blimlim commented 5 months ago

Thanks Jo, I think that should work well. I'm wondering now whether it's safest to export the run counter or instead the actual path to the current output directory, just in case the output directory names/structure ever do change in payu. I think either should work for now, but let me know what you think would make the most sense.

jo-basevi commented 5 months ago

Yeah, that is a good point. I doubt the output naming (e.g. outputxyz) would change. With the latest changes, which name archive experiment directories using branch names and UUIDs, getting the full path of the archive directory is a little trickier (you have to resolve the archive symlink to get the full archive path).

I could see the current output directory being useful as an environment variable for other post-processing user-scripts. But following that reasoning, the current restart directory might also be useful. If post-processing wanted to use the archive as a whole, or previous outputs/restarts, then the archive directory and run counter might also be useful. So we would then add PAYU_CURRENT_OUTPUT_DIR, PAYU_CURRENT_RESTART_DIR, PAYU_ARCHIVE_DIR and PAYU_CURRENT_RUN. Is that then too many environment variables?

Honestly I think either of the options would work fine.
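To make the four-variable option concrete, here is a minimal sketch of how the values could be derived from the resolved archive path and the run counter. The payu_env_vars helper is hypothetical, the three-digit zero padding is inferred from names like output005, and the restartNNN naming is an assumption made by analogy:

```python
import os
from typing import Dict


def payu_env_vars(archive_path: str, counter: int) -> Dict[str, str]:
    """Build the proposed variables from the archive path and run counter."""
    # Assumption: output and restart directories share the same
    # three-digit, zero-padded counter (e.g. output005, restart005).
    suffix = f"{counter:03d}"
    return {
        "PAYU_CURRENT_RUN": str(counter),
        "PAYU_ARCHIVE_DIR": archive_path,
        "PAYU_CURRENT_OUTPUT_DIR": os.path.join(archive_path, f"output{suffix}"),
        "PAYU_CURRENT_RESTART_DIR": os.path.join(archive_path, f"restart{suffix}"),
    }
```

The result could be applied with os.environ.update(payu_env_vars(archive_path, counter)) before any user-scripts are launched.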

blimlim commented 5 months ago

I think setting all the different environment variables would work well, and good point that it could be useful for other post-processing scripts.