Proposed Recipes for MOM6 NeverWorld2 data

NoraLoose commented 2 years ago

Source Dataset

The NeverWorld2 dataset is output from idealized primitive-equation MOM6 simulations. It is useful for studying ocean mesoscale turbulence across a hierarchy of grid resolutions: 1/4, 1/8, 1/16, and 1/32 degree. At each resolution, the simulations were run with two choices of hmix, which sets the depth of the idealized top boundary layer: 5 m and 20 m. This gives 4 × 2 = 8 experiments in total.
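The experiment count follows from crossing the four resolutions with the two hmix values. The short Python sketch below enumerates those combinations. The variable names are illustrative only and do not correspond to any actual experiment or directory names on casper.

```python
from itertools import product

# Grid resolutions (degrees) and hmix values (m), as listed in the text.
RESOLUTIONS_DEG = ["1/4", "1/8", "1/16", "1/32"]
HMIX_M = [5, 20]

# Each (resolution, hmix) pair is one experiment: 4 x 2 = 8 in total.
EXPERIMENTS = list(product(RESOLUTIONS_DEG, HMIX_M))

for resolution, hmix in EXPERIMENTS:
    print(f"{resolution} degree, hmix = {hmix} m")
```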

The NeverWorld2 dataset is described in detail in Marques et al. (2022, in review). The model is of intermediate complexity: it has basin-scale geometry with idealized Atlantic and Southern oceans, and non-uniform ocean depth so that mesoscale eddies can interact with topography. The model is perfectly adiabatic and spans the equator. It thus fills a gap between quasi-geostrophic models, which cannot span two hemispheres, and idealized general circulation models, which generally include diabatic processes and buoyancy forcing.

As of now, the data are stored on the NCAR machines (casper) at /glade/campaign/univ/unyu0004/NeverWorld2/ and can only be accessed by users with an account.
File format: netCDF
Organization of the source files: Each of the 8 experiments has the following files:

1. averages_*.nc (holds 5-day averages); one file per 500 days for the resolutions 1/4, 1/8, 1/16; one file per 100 days for the resolution 1/32
2. snapshots_*.nc (holds snapshots at 5-day frequency); one file per 500 days for the resolutions 1/4, 1/8, 1/16; one file per 100 days for the resolution 1/32
3. longmean_*.nc (holds 100-day averages, but over a longer time period than averages_*.nc and snapshots_*.nc); one file per 2000 days for the resolutions 1/4, 1/8; one file per 1000 days for the resolution 1/16; one file per 200 days for the resolution 1/32
4. static.nc (holds the grid information); 1 file
5. ocean.stats.nc (holds time series of domain-integrated metrics such as APE and KE over the full spin-up); 1 file
6. one restart file (so users can extend the runs); 1 file
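Because each file kind has a fixed record spacing and a fixed span per file, the number of time records per file follows directly from the listing. A count like this is the kind of value one could pass, for example, as `nitems_per_file` to a `ConcatDim` in pangeo-forge-recipes. The minimal Python sketch below assumes (not confirmed here) that records are evenly spaced, that a file holds exactly span / interval records, and that consecutive files do not overlap. A snapshot file could, for instance, include both endpoints, in which case the count would differ by one.

```python
# Spacing (days) between consecutive time records in each file kind.
RECORD_INTERVAL_DAYS = {"averages": 5, "snapshots": 5, "longmean": 100}

# Days covered by one file, keyed by (file kind, resolution in degrees).
FILE_SPAN_DAYS = {
    ("averages", "1/4"): 500,
    ("averages", "1/8"): 500,
    ("averages", "1/16"): 500,
    ("averages", "1/32"): 100,
    ("snapshots", "1/4"): 500,
    ("snapshots", "1/8"): 500,
    ("snapshots", "1/16"): 500,
    ("snapshots", "1/32"): 100,
    ("longmean", "1/4"): 2000,
    ("longmean", "1/8"): 2000,
    ("longmean", "1/16"): 1000,
    ("longmean", "1/32"): 200,
}


def records_per_file(kind: str, resolution: str) -> int:
    """Return the number of time records in one file.

    Assumption (not confirmed by the listing): records are evenly spaced,
    a file holds span / interval records, and files do not overlap.
    """
    span = FILE_SPAN_DAYS[(kind, resolution)]
    interval = RECORD_INTERVAL_DAYS[kind]
    if span % interval:
        raise ValueError(
            f"{kind} at {resolution}: {span} days is not a multiple of {interval}"
        )
    return span // interval


for kind, resolution in FILE_SPAN_DAYS:
    print(kind, resolution, records_per_file(kind, resolution))
```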

Special steps required to access the data: a user account on casper (with password)

Transformation / Alignment / Merging

For file types 1 to 3 described above (averages_*.nc, snapshots_*.nc, and longmean_*.nc), the files of each experiment should be concatenated along the time dimension.
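A plain concatenation along time only makes sense if the files of one experiment are put in time order and cover contiguous, non-overlapping periods. In practice the array concatenation itself would be done with a tool such as xarray (e.g. `xarray.open_mfdataset(paths, combine="nested", concat_dim="time")`) or a pangeo-forge recipe. The library-free Python sketch below only illustrates the ordering and contiguity check such a concatenation relies on. The file names and start days in the usage example are hypothetical placeholders, not the real NeverWorld2 file names.

```python
from typing import List, Tuple


def concat_time_axis(
    files: List[Tuple[str, int, int]], interval_days: int
) -> Tuple[List[str], List[int]]:
    """Order files by start time and build the concatenated time coordinate.

    `files` holds (path, first_record_day, n_records) tuples. Raises
    ValueError if a file does not start exactly one interval after the
    previous file's last record (i.e. a gap or an overlap).
    """
    ordered = sorted(files, key=lambda f: f[1])
    times: List[int] = []
    for path, start, n_records in ordered:
        if times and start != times[-1] + interval_days:
            expected = times[-1] + interval_days
            raise ValueError(f"{path}: expected start day {expected}, got {start}")
        times.extend(start + i * interval_days for i in range(n_records))
    return [f[0] for f in ordered], times


# Hypothetical example: two 500-day files of 5-day records, listed out of order.
paths, times = concat_time_axis(
    [("file_b.nc", 500, 100), ("file_a.nc", 0, 100)], interval_days=5
)
print(paths, len(times), times[0], times[-1])
```

Failing loudly on a gap or overlap, rather than silently concatenating, keeps a missing or duplicated file from producing a time axis that looks valid but is not.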

Output Dataset

Zarr

Please edit and/or comment @gustavo-marques, @rabernat. The discussion started over here.

NoraLoose commented 2 years ago

I wonder if it would be better to store items 5 and 6 (ocean.stats.nc and the restart files) in the NeverWorld2 GitHub repo, where we provide input files for interested users. These files are pretty small.

@gustavo-marques?

gustavo-marques commented 2 years ago

The restart files can be large. For the 1/32 deg configuration, we have 3 files for each restart time, which together exceed 10 GB. I thought that storing large netCDF files on GitHub was not ideal, but perhaps that has changed?

gustavo-marques commented 2 years ago

one restart file (so users can extend the runs); 1 file

Restart files (so users can extend the runs): one file for the 1/4, 1/8, and 1/16 deg configurations; 3 files for the 1/32 deg configurations.

NoraLoose commented 2 years ago

The reason I suggested storing the restart files elsewhere is that the goal of pangeo-forge is to provide "analysis-ready datasets", and no one will analyze the restart files. 😄 If they are ~10 GB, we could consider more "traditional" storage options, such as figshare.

gustavo-marques commented 2 years ago

Traditional storage options sound good.