mrghg / agage-archive

Code for producing AGAGE archival files
MIT License

Create private data set #35

lukewestern closed 5 months ago

lukewestern commented 7 months ago

This now produces a public and a private archive. The public archive is as before. The private archive has the same format as the public one but is called "agage-private-archive.zip". The private archive processes and stores all available data beyond the release date, unless the cutoff for that species/instrument is before the general release date, in which case the earlier date is used.
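A minimal sketch of the date rule described above, not the code in this PR. The function name `effective_end_date`, its arguments, and the use of `None` to mean "no limit" are all hypothetical. The public branch (capping at the release date, or at an earlier cutoff) is an assumption about what "as before" means.

```python
from datetime import date
from typing import Optional


def effective_end_date(
    release_date: date,
    instrument_cutoff: Optional[date] = None,
    public: bool = True,
) -> Optional[date]:
    """Return the last date to include for one species/instrument.

    Hypothetical helper: None means "no limit, keep all available data".
    """
    if public:
        # Assumed public behaviour: never go past the general release date,
        # and stop earlier if this species/instrument has an earlier cutoff.
        if instrument_cutoff is None:
            return release_date
        return min(release_date, instrument_cutoff)

    # Private archive: keep everything, unless the species/instrument cutoff
    # falls before the general release date, in which case use the cutoff.
    if instrument_cutoff is not None and instrument_cutoff < release_date:
        return instrument_cutoff
    return None
```

For example, with a release date of 2023-06-30 and an instrument cutoff of 2020-01-01, both archives would stop at 2020-01-01; with no cutoff, the public archive stops at the release date and the private archive keeps all available data.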

run.py will generate both archives when run as main. This will close #17, as a workaround for having provisional data in the public release.

I have not done anything with dvc. As private data is, by design, unreleased and fluid, it doesn't make sense to version it, in my opinion. Happy to change this.

Will fall down after 2100. Best tell the grandkids.

qq23840 commented 7 months ago

All looks good to me but I'll let Matt have a look as well

mrghg commented 6 months ago

There's an annoying issue here where it fails the first time you run it now, because one of the archives doesn't necessarily exist.

E.g., if you run the "public" option, it creates an empty public zip file (which is the right thing to do), but then the private one may not be there as it's only created when you run the private option. When the Paths class is initialised, the checks fail because that private file isn't there.

Maybe we should just remove the checking from Paths.

I'll come back to this...

mrghg commented 5 months ago

Note to self: Instead of a "public" Boolean argument, how about we have an output_suffix string that defaults to "". Then, we could have any number of types of output (not just public/private), and could get around the above issue by only checking for one output path at a time (at the moment, it has to check all outputs and fails if they don't all exist).

So if you have output_suffix = "private", it would look in config.yaml for:

output_path_private = blah.zip
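A rough sketch of how the suffix lookup might work, so that only the requested output is checked. This is an illustration, not the change that was committed. The names `output_key` and `resolve_output_path`, the plain dict standing in for the parsed config.yaml, the unsuffixed `output_path` key for the default case, and the public file name used below are all assumptions.

```python
from pathlib import Path


def output_key(output_suffix: str = "") -> str:
    """Build the config key for a given suffix (hypothetical naming scheme)."""
    return f"output_path_{output_suffix}" if output_suffix else "output_path"


def resolve_output_path(
    config: dict, output_suffix: str = "", must_exist: bool = False
) -> Path:
    """Look up and optionally check ONE output path.

    Only the key for the requested suffix is read, so a missing private
    archive cannot break a run that only needs the public one.
    """
    key = output_key(output_suffix)
    if key not in config:
        raise KeyError(f"'{key}' not found in config")

    path = Path(config[key])
    if must_exist and not path.exists():
        raise FileNotFoundError(f"Output '{path}' does not exist")
    return path


# Stand-in for the parsed config.yaml contents (file names are illustrative).
example_config = {
    "output_path": "agage-archive.zip",
    "output_path_private": "agage-private-archive.zip",
}

private_path = resolve_output_path(example_config, "private")
public_path = resolve_output_path(example_config)
```

Because the existence check is opt-in and scoped to a single key, the first run of either option no longer depends on the other archive having been created.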

mrghg commented 5 months ago

@lukewestern see commit messages for changes and bug-fixes. Could you do a quick review?

lukewestern commented 5 months ago

OK, all looks good to me. I just added some underscores to the file names in the public files so it says NOT_FOR_PUBLIC_RELEASE.nc (rather than NOT FOR.... with spaces). No underscores are in the attribute.