nsidc / granule-metgen

Metadata generator for direct-to-Cumulus era

Provide option to overwrite (clobber) existing UMM-G files #66

Open juliacollins opened 2 months ago

juliacollins commented 2 months ago

When running the process command, existing UMM-G files are returned rather than re-created. Add a command-line option to clobber any existing files.

Note that in some cases, users may wish to pick up processing where it left off (if things bailed out due to an error). In those situations, the current behavior makes sense.

Acceptance criteria: Given you run MetGenC from the command line with no overwrite flag set, then previously created UMM-G files will be preserved.

Given you run MetGenC from the command line with the overwrite flag, then existing UMM-G files will be overwritten.

A decision is needed from Ops on which behavior should be the default.

juliacollins commented 4 weeks ago

@afitzgerrell These changes are merged into the main branch and will be available once you git pull the latest code. If you create a new ini file you should be queried for the option to overwrite UMM-G files. If the ini value is False, then you can override it on the command line with -o or --overwrite. The process help text should now show:

(nsidc-metgen-py3.12) ~/workspace/granule-metgen[1186]$ metgenc process --help
Usage: metgenc process [OPTIONS]

  Processes science data files based on configuration file contents.

Options:
  -c, --config TEXT   Path to configuration file  [required]
  -e, --env TEXT      environment  [default: uat]
  -n, --number count  Process at most 'count' granules.
  -wc, --write-cnm    Write CNM messages to files.
  -o, --overwrite     Overwrite existing UMM-G files.
  --help              Show this message and exit.

Let me know if things don't work as desired!
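
For anyone following along, the precedence described above (the ini value acts as the default, and `-o`/`--overwrite` can turn overwriting on from the command line) could be sketched roughly like this. It is only an illustration, not MetGenC's actual code: the `Settings` section name and the `resolve_overwrite` helper are assumptions made for the example, and only the `overwrite_ummg` key and the flag names come from this thread.

```python
import configparser


def resolve_overwrite(ini_text: str, cli_overwrite: bool) -> bool:
    """Return True if existing UMM-G files should be overwritten."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # "Settings" is a hypothetical section name used only for this sketch.
    ini_value = parser.getboolean("Settings", "overwrite_ummg", fallback=False)
    # The command-line flag can only switch overwriting on; it never
    # switches off a True value that came from the ini file.
    return ini_value or cli_overwrite


if __name__ == "__main__":
    ini_false = "[Settings]\noverwrite_ummg = False\n"
    print(resolve_overwrite(ini_false, cli_overwrite=True))
```

With this shape, an ini value of False plus `-o` overwrites, and an ini value of True overwrites whether or not the flag is given.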

afitzgerrell commented 3 weeks ago
Found 2 granules to process
Processing all available granules
Removing existing files in output/ummg

I verified that indeed, two new ummg files had been written to output/ummg.

Found 2 granules to process
Processing all available granules

I checked and verified that the ummg files written during test 1 were not overwritten, as they still had the timestamp from when I ran test 1.

test 3: I ran metgenc process -c ./init/newDUCkTest.ini -o where the ini file contained overwrite_ummg = False, and metgenc output:

Found 2 granules to process
Processing all available granules
Removing existing files in output/ummg

I checked and verified that when specifying the -o flag, the existing ummg files were indeed overwritten as they had the timestamp marking the time I ran test 3.

test 4: Just to make super sure, I lastly ran metgenc process -c ./init/newDUCkTest.ini -o where the ini file contained overwrite_ummg = True, and metgenc output:

Found 2 granules to process
Processing all available granules
Removing existing files in output/ummg

I verified that, as expected, specifying overwrite both in the ini file and on the command line still led to the ummg files being overwritten.

I'll check with OPS on Monday whether running metgenc with no flag should be expected to overwrite or to preserve existing ummg files, given the overwrite_ummg value set in the ini file.

In the meantime, I think this serves as a thorough test of the acceptance criteria, and after I learn whether OPS has a strong opinion or not, I'll report back for the ol' switcheroo or we can consider this issue ready for Lisa to close assuming she agrees.
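
The timestamp checks described above were done by eye, but they could be scripted if this test needs repeating. A minimal sketch, assuming the UMM-G files live in a single flat directory such as output/ummg; the helper names `snapshot_mtimes` and `changed_files` are made up for this example and are not part of MetGenC.

```python
from pathlib import Path


def snapshot_mtimes(directory: str) -> dict[str, int]:
    """Map each file name in `directory` to its modification time (ns)."""
    return {
        p.name: p.stat().st_mtime_ns
        for p in Path(directory).iterdir()
        if p.is_file()
    }


def changed_files(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Names of files that are new or whose modification time differs."""
    return sorted(
        name for name, mtime in after.items() if before.get(name) != mtime
    )
```

Take one snapshot before running `metgenc process` and one after. An empty result from `changed_files` means the existing files were preserved, and a list of names means those files were written during the run.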

afitzgerrell commented 2 weeks ago

The input I received from ops-folk was: Ops would expect to run metgenc with it defaulting to clobbering existing ummg files, but having a command line option that, when invoked, would preserve (skip) any existing ummg files in the output/ummg directory.

The MetGenC functionality and acceptance criteria should follow the opposite behavior:

Given you run MetGenC from the command line with no preserve flag set, then previously created UMM-G files will be clobbered.

Given you run MetGenC from the command line with the preserve flag, then existing UMM-G files will be preserved.

juliacollins commented 2 weeks ago

I'm inclined to write up a new story to add a "no overwrite" command line option and to change the value of DEFAULT_OVERWRITE_UMMG (in the .ini file) to True. I think we should keep the existing overwrite flag available via the command line, though, since it doesn't impact the use case @afitzgerrell describes and preserves flexibility for users in general.
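
To make the proposal concrete, here is a rough sketch of a paired overwrite / no-overwrite option that falls back to the configured default when neither is given. It uses the standard library's `argparse` only so the example is self-contained; it is not how MetGenC defines its options, and the `--no-overwrite` spelling, the `metgenc-sketch` program name, and the `effective_overwrite` helper are all assumptions.

```python
import argparse

# Proposed new default from this thread (currently False).
DEFAULT_OVERWRITE_UMMG = True


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="metgenc-sketch")
    # BooleanOptionalAction (Python 3.9+) adds a matching --no-overwrite
    # option. default=None lets us detect "neither flag was given".
    parser.add_argument(
        "-o",
        "--overwrite",
        action=argparse.BooleanOptionalAction,
        default=None,
        help="Overwrite (or, with --no-overwrite, preserve) existing UMM-G files.",
    )
    return parser


def effective_overwrite(argv: list[str], ini_value: bool = DEFAULT_OVERWRITE_UMMG) -> bool:
    """Command-line choice wins; otherwise fall back to the ini value."""
    args = build_parser().parse_args(argv)
    return ini_value if args.overwrite is None else args.overwrite
```

Under these assumptions, running with no flag clobbers existing files, `--no-overwrite` preserves them, and `-o` still forces an overwrite when the ini value is False.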

lisakaser commented 1 week ago

@juliacollins would this go hand in hand with a story where a data PI delivers science files and their own UMM-G files, and Ops wants to use those rather than generate new UMM-G files?

juliacollins commented 1 week ago

@lisakaser maybe, but the use case that comes immediately to mind is the one where a dataset with many hundreds of files is processed and something causes an interruption. In that case operators may not want or need to start the processing from the very beginning.
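
As a sketch of that resume case: when overwriting is off, the run would only need to generate UMM-G for granules that do not already have an output file. The `<granule name>.json` naming and the `granules_to_process` helper below are assumptions for illustration, not MetGenC's actual layout or API.

```python
from pathlib import Path


def granules_to_process(granule_names: list[str], ummg_dir: str, overwrite: bool) -> list[str]:
    """Return the granules that still need a UMM-G file written."""
    if overwrite:
        # Clobber mode: every granule is regenerated.
        return list(granule_names)
    out = Path(ummg_dir)
    # Preserve mode: skip any granule whose (hypothetically named)
    # output file already exists, however it was created.
    return [name for name in granule_names if not (out / f"{name}.json").exists()]
```

After an interrupted run over several hundred files, only the granules without an existing output file would be returned.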

juliacollins commented 1 week ago

@lisakaser I had it stuck in my head that the use case you describe would have some sort of run options to say "these are UMM-G from the PI!" but of course you are correct, we don't care how they were created, we just don't want to overwrite them.