parthenon-hpc-lab / parthenon

Parthenon AMR infrastructure
https://parthenon-hpc-lab.github.io/parthenon/
Other
105 stars 33 forks source link

Fix restaring from files not containing the derefinement counter (from #1073) #1089

Closed pgrete closed 1 month ago

pgrete commented 1 month ago

PR Summary

In https://github.com/parthenon-hpc-lab/parthenon/pull/1073 the per-block derefinement counter was stored in restart files (and restored on restarts). This changes the number of entries in the flattened loc.level-gid-lid-cnghost-gflag vector (from 5 per block to 6 per block). If one currently restarts with from an older file this is not taken into account and logical locations (and potentially more are off). I came across this because of a

### PARTHENON ERROR
  Message:     Somehow didn't find a tree.
  File:        /project/project_465000552/env/src/athenapk/external/parthenon/src/mesh/forest/forest.hpp
  Line number: 128
Somehow didn't find a tree.

on restart.

Effectively, this renders restart any older file impossible/erroneous right now. The proposed fix is not necessarily super-clean but it circumvents bumping the output format version (and introducing similar logic anyway). But if there's a different/cleaner solution, I'm also open to suggestions.

Note, that this still doesn't fully fix a restart problem I'm tracking down (also in context of reviewing #1055) but it fixes the error above. It might also be that the restart I'm trying to fix are broken because I used some Parthenon version in between that didn't contain all the mesh/tree related bugfixes that went in over the past weeks. I'm still investigating, but I'm fairly sure that this PR and the proposed fix are necessary no matter what.

PR Checklist

pgrete commented 1 month ago

This looks like a reasonable fix to me, sorry for not thinking of that before. A possibly cleaner choice, but that would require more work, would be to include the derefinement count in a separate dataset in the hdf5 file.

Given that the other PR is just 8 days in, I'm kind of in favor of the clean fix rather than carrying around a work-around (that might inspire similar extension in the future). I should be able to get to that early next week.

pgrete commented 1 month ago

(We could still merge this in now for an immediate resolution)

lroberts36 commented 1 month ago

(We could still merge this in now for an immediate resolution)

Let me see if I can get this to work this morning.

pgrete commented 1 month ago

My plan would be to extend the mesh_info object with a new vector containing the counters (which would by default set to 0 if not present).

lroberts36 commented 1 month ago

@pgrete and @Yurlungur: I just switched to making a new dataset and leaving level-gid-lid... as it was previous to #1073. I think this should be a clean fix. If the new dataset is not present in the file, HDF5 writes some complaints to screen but the exception is caught and it runs through.

lroberts36 commented 1 month ago

I will leave it to @pgrete to click automerge when he is happy with this.