Closed aidanheerdegen closed 7 months ago
Yeah as you have explained, the existence of archives to help determine whether its an legacy experiment or to use an branch-UUID experiment name, or if a new UUID needs to be generated. If payu sweep --hard
was run in an experiment without an existing legacy archive, it would then create a new UUID and archive on subsequent calls to payu setup/run
.
Solutions:
payu sweep --hard
does not delete archive folder but deletes the contents. Con: This leaves behind empty directories, new UUIDs won't be generated on this local copy of the branch again.payu sweep
that deletes outputs but leaves behind archive folder and metadata.keep_uuid
logic (which leaves the UUID unchanged if it exists) to before checking for legacy archives. Then could use payu checkout --keep-uuid $BRANCH_NAME
post running payu sweep --hard
to re-setup the archive. The --keep-uuid
/-k
option could be added to payu setup
. Con: It's not that intuitive for users to use --keep-uuid
/-k
post running payu sweep --hard
.The issue to solution 1 is that new UUIDs won't automatically be generated on that branch even if its "starting" again. For example, say an experiment output was synced to remote directory, then payu sweep --hard
was run. I think subsequent calls to payu setup/run
on that branch should generate a new UUID, unless it was continuing the same experiment (starting from the latest restart). (Note to start from restarts in this case, can use payu checkout -k -r $RESTART_PATH $BRANCH_NAME
to set up the archive again).
Add a check whether the UUID in the metadata matches the UUID in the legacy archive's metadata. If the UUIDs don't match then, don't use the legacy archive. Con: This check only really works if legacy archive contains a metadata file with a UUID (i.e. if the legacy experiment has been run with new payu changes).
If the legacy
doesn't contain a metadata.yaml
file with a UUID then isn't that the same as UUIDs not matching?
I think subsequent calls to payu setup/run on that branch should generate a new UUID, unless it was continuing the same experiment (starting from the latest restart)
I agree. We can't tell the difference between an empty/missing archive
, and we can't know why it might be missing or empty. As you say it is perfectly possible that a whole experiment is sync'ed to a different filesystem and the files in /scratch
have been purged.
Almost the worst thing we can do is continue to use the UUID if there is a possibility that it belongs to an existing experiment. So unless expressly told otherwise, we should just generate a new UUID/experiment.
An case of this happening could be if an legacy experiment was run for the first time with payu run/setup
. So it'll find no UUID in metadata -> generate a new UUID -> finds a legacy archive -> uses legacy archive for experiment name.
Hmm though I guess the above case could be caught by checking whether a new UUID has been updated or not at that point..
if legacy archive exists and
(new UUID generated or (legacy_metadata exists and legacy_UUID exists and legacy_UUID == UUID)):
use legacy archive
Ok I only just realised the above logic would break if a legacy experiment has metadata file with a UUID but it has not been copied over to archive yet (so it hasn't been run with the latest payu changes). There could be a few cases like this as experiment_uuid
was a pre-existing metadata field prior to the latest payu changes. So I think can only really check for
not (not new UUID generated and legacy_metadata exists and legacy_UUID exists and legacy_UUID != UUID)
which is:
Like I mentioned earlier, this will only really catch the case when legacy has already been run with payu setup/run
with the latest payu changes. I think this is doing a good job of highlighting how error-prone using branches alongside legacy experiments can be..
I think this is doing a good job of highlighting how error-prone using branches alongside legacy experiments can be..
Doh!
So I think if you have a solid improvement that catches more cases, or is consistent with more cases, then we should implement it even if there are corner cases that will still trip it up.
Once it's implemented it would be good to have a "recommendations for use with legacy experiments" short explainer. That could go in the docs or the added to the payu tutorial on the forum.
If an existing experiment does not have an archive path but a legacy archive path exists the current logic will assume it is a legacy experiment.
I discovered this when I made a new experiment, incorrectly configured it when I ran, so did a
payu sweep --hard
to remove the unwanted outputs and start again.For the logic contained here:
https://github.com/payu-org/payu/blob/b6db50b8502ebc7cfbaa9bc12d8920517f332ac7/payu/metadata.py#L178-L196
So the branch at https://github.com/payu-org/payu/blob/b6db50b8502ebc7cfbaa9bc12d8920517f332ac7/payu/metadata.py#L181 is the one taken.
As soon as I created
archive_path
for my new experimentpayu
takes the branch above and it works.