ocean-transport / lcs-ml

Lagrangian Coherent Structure identification for machine learning

Figure out how to make an ensemble to parallelize data generation #15

Closed by rabernat 3 years ago

rabernat commented 3 years ago

Procedure:

hscannell commented 3 years ago

Up and running on Ginsburg this week!

I've created a small QG model and saved the EKE time series using run_with_snapshots. As the model spins up, the mean EKE increases until it reaches an equilibrated state. When the EKE plateaus the model is stable; this is easy to see in the second plot below.

To determine the start of the equilibrated state, I calculated the central difference of the upper layer EKE to better visualize the slope.

[Figure: EKE_central_diff (central difference of the upper-layer EKE)]

A cutoff value of 1e-11 looks pretty decent. I found the time corresponding to when the derivative first dipped below this threshold and plotted it as a dashed vertical line on the time series below. This would be a good place to save the model state for the ensemble perturbation experiment. Here, I'm showing the change in EKE over time in both the upper and lower layers.

[Figure: EKE (upper- and lower-layer EKE time series)]
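For reference, the thresholding step is roughly this (a rough sketch, using the t_snapshots and EKE_snapshots arrays built in the code further down, with the upper layer in index 0):

import numpy as np

# central difference of the upper-layer EKE with respect to time
dEKE_dt = np.gradient(EKE_snapshots[:, 0], t_snapshots)

# first time the slope drops below the cutoff (NaNs early in the record compare False, so they are skipped)
cutoff = 1e-11
idx = np.argmax(np.abs(dEKE_dt) < cutoff)
t_equilibrated = t_snapshots[idx]
print(f"equilibrated after ~{t_equilibrated/(24*60*60*360.):.2f} model years")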

I also wanted to get a sense of how the PV anomaly field was evolving during this spin-up time. However, I had a difficult time saving the intermediate PV fields with run_with_snapshots. As you can see, the PV anomaly doesn't evolve (?). This can't be right and I'm sure it's a programming error. (Time is printed over each subplot)

[Figure: PV_evolution (PV anomaly snapshots during spin-up; time printed over each subplot)]

In my example code, I set up and initialize the model:

# Set parameters & initialize model
import numpy as np
import pyqg

year = 24*60*60*360.   # seconds in a 360-day model year
twrite = 1000          # interval (in timesteps) for writing log output
tavestart = 2*year     # time at which diagnostic averaging begins
m = pyqg.QGModel(tmax=10*year, twrite=twrite, tavestart=tavestart)

# set upper and lower PV anomaly in spatial coordinates
sig = 1.e-6            # amplitude of the random initial PV anomaly
qi = sig*np.vstack([np.random.randn(m.nx, m.ny)[np.newaxis,],
                    np.random.randn(m.nx, m.ny)[np.newaxis,]])
m.set_q(qi)

I then use run_with_snapshots to store the intermediate PV anomaly field, time, and mean EKE:

# Initialize lists for the PV anomaly, EKE, and model time
q = [qi]
eke = [[np.nan, np.nan]]
t = [0]

for snapshot in m.run_with_snapshots(tsnapstart=m.t, tsnapint=m.dt):
    q.append(m.q) # PV anomaly
    t.append(m.t) # model time
    try:
        eke.append(m.get_diagnostic('EKE')) # mean eddy kinetic energy
    except Exception:
        eke.append([np.nan, np.nan]) # diagnostic not available yet

# Convert to numpy arrays        
q_snapshots = np.asarray(q)
EKE_snapshots = np.asarray(eke)
t_snapshots = np.asarray(t)

I'm not sure if appending the arrays is the best approach. Would love some feedback.
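One alternative I've considered is preallocating the arrays, since the number of snapshots is known from tmax and tsnapint. A rough, untested sketch:

nsnap_max = int(round(m.tmax/m.dt)) + 2   # generous upper bound on the number of snapshots
q_snapshots = np.zeros((nsnap_max, 2, m.nx, m.ny))
EKE_snapshots = np.full((nsnap_max, 2), np.nan)
t_snapshots = np.zeros(nsnap_max)

q_snapshots[0] = qi
k = 0
for _ in m.run_with_snapshots(tsnapstart=m.t, tsnapint=m.dt):
    k += 1
    q_snapshots[k] = m.q   # assignment copies the data into the preallocated array
    t_snapshots[k] = m.t
    try:
        EKE_snapshots[k] = m.get_diagnostic('EKE')
    except Exception:
        pass  # leave as NaN until the diagnostic is available

# trim to the snapshots actually written
q_snapshots = q_snapshots[:k+1]
EKE_snapshots = EKE_snapshots[:k+1]
t_snapshots = t_snapshots[:k+1]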

Currently, I'm scaling this up to the full Zhang2020 model set-up, which takes much longer to run because of the increased resolution, domain size, and layer thickness.

rabernat commented 3 years ago

Fantastic work Hillary! Thanks for leading the charge onto Ginsburg.

> Here, I'm showing the change in EKE over time in both the upper and lower layers.

Great plot! I would consider using days or years on the x axis, since it's kind of hard to interpret 10^8 s! Also might want to explore a log scale on the y axis. This would bring out the lower layer more.
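Something like this would do it (a rough sketch, using the t_snapshots and EKE_snapshots arrays from your code above):

import matplotlib.pyplot as plt

year = 24*60*60*360.   # same 360-day model year as in the setup

fig, ax = plt.subplots()
ax.semilogy(t_snapshots/year, EKE_snapshots[:, 0], label='upper layer')
ax.semilogy(t_snapshots/year, EKE_snapshots[:, 1], label='lower layer')
ax.set_xlabel('model time (years)')
ax.set_ylabel('EKE')
ax.legend()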

> As you can see, the PV anomaly doesn't evolve (?). This can't be right and I'm sure it's a programming error.

Yeah that definitely doesn't look right. 🧐 The snapshots of q should not be the same every timestep! The model should be in a turbulent state. However, your KE timeseries seems to show variability, which suggests it is a plotting or saving problem. If I had to guess, I'd say the problem might be in your plotting code:

for i in enumerate(range(0,len(q_snapshots),5000)):
    print(i[1])
    plt.subplot(3,3,i[0]+1)
    plt.pcolormesh(q_snapshots[i[1],lev,:,:], cmap='RdBu')

Are we sure those indexes are right? Your code for appending the diagnostics looks right to me.
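One way to make them easier to audit is to unpack the enumerate tuple and put the snapshot index in each title, e.g. (a sketch; lev = 0 assumed for the upper layer):

import matplotlib.pyplot as plt

lev = 0  # upper layer
fig = plt.figure(figsize=(12, 10))
for k, idx in enumerate(range(0, len(q_snapshots), 5000)):
    ax = plt.subplot(3, 3, k + 1)
    ax.pcolormesh(q_snapshots[idx, lev, :, :], cmap='RdBu')
    ax.set_title(f"snapshot {idx}, t = {t_snapshots[idx]:.3g} s")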

> Currently, I'm scaling this up to the full Zhang2020 model set-up

I really like how you started with the cheaper, faster model to get the setup dialed in. I would stick with that (don't scale up) until you get the plotting / output issues sorted out.

Looking forward to talking more tomorrow!

hscannell commented 3 years ago

Updated plots showing the spin-up of the KE. The upper plot is the derivative of KE in the upper layer. The vertical dashed line shows when the derivative first falls below 0.01e-10 cm2 s-3. This threshold was chosen by examining the plot and provides a good approximation for when the model becomes stable. In the second plot, I've added vertical blue lines to show when the snapshots of the PV anomaly are examined (third plot).

[Figures: upper-layer KE derivative with the equilibration threshold; KE time series with snapshot times marked]

The evolution of the PV anomaly field, with the model year in the title of each subplot. Red titles indicate that the model is still spinning up, whereas black titles indicate that the model is fully equilibrated (i.e., after 9.82 years).

[Figure: PV anomaly evolution during and after spin-up]

hscannell commented 3 years ago

The model state at 9.82 years is saved and used to initialize 5 ensemble members, each with a very small perturbation in the initial PV anomaly. These ensemble members are allowed to run for an additional 20 years in this notebook.

[Figure: ensemble members]
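Roughly, each member is initialized like this (a sketch, not the exact notebook code; the dataset variable name and the perturbation amplitude eps are placeholders):

import numpy as np
import xarray as xr
import pyqg

# load the equilibrated PV anomaly saved at 9.82 years
q0 = xr.open_dataset('QG_equilibrium_proto.nc')['q'].values

year = 24*60*60*360.
m = pyqg.QGModel(tmax=20*year, twrite=1000, tavestart=0)

# add a very small, member-specific random perturbation
eps = 1e-10
m.set_q(q0 + eps*np.random.randn(*q0.shape))
m.run()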

hscannell commented 3 years ago

One issue I had was that I couldn't get pyqg's model.to_dataset() to work. Could it be that the conda-installed version doesn't include this feature yet?
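To check, I can look at what the environment actually has (sketch):

import pyqg
print(pyqg.__version__)                      # version installed by conda
print(hasattr(pyqg.QGModel, 'to_dataset'))   # False would mean the feature isn't in this release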

hscannell commented 3 years ago

Also this plot of the ensemble members is not what I expected. Are the initial perturbations too large?

hscannell commented 3 years ago

The PV correlation between the 5 ensemble members as a function of time:

[Figure: PV correlation between the 5 ensemble members as a function of time]
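For reference, the kind of calculation behind this plot (a rough sketch, assuming q_members is an array of shape (n_members, n_times, nz, ny, nx)):

import numpy as np
from itertools import combinations

def pattern_corr(a, b):
    # Pearson correlation between two 2D fields
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

lev = 0  # upper layer
n_members, n_times = q_members.shape[:2]
pairs = list(combinations(range(n_members), 2))

# mean pairwise correlation of the upper-layer PV anomaly at each time
corr = np.array([
    np.mean([pattern_corr(q_members[i, t, lev], q_members[j, t, lev]) for i, j in pairs])
    for t in range(n_times)
])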

hscannell commented 3 years ago

We are using a Slurm job array and environment variables to spawn ensemble members on Ginsburg.

For example:

sbatch --array=1-5 ensemble_generator.sh

where ensemble_generator.sh contains:


#!/bin/sh
#
# Generates ensembles of QG model run from equilibrated state
#
#SBATCH --account=abernathey     # group account name
#SBATCH --job-name=ensembles     # job name
#SBATCH -c 5                     # number of cpu cores to use (up to 32 cores per server)
#SBATCH --time=0-02:00           # time the job will take to run in D-HH:MM
#SBATCH --mem-per-cpu=20G         # The memory the job will use per cpu core

echo "$SLURM_ARRAY_TASK_ID"

module load anaconda
source ~/.bashrc
conda activate lcs-dev
PICKUP_FILE='/burg/abernathey/users/hillary/QG_equilibrium_proto.nc'

python ensemble_generator.py $SLURM_ARRAY_TASK_ID $PICKUP_FILE
# arguments passed to the script; could also expose these through a separate API
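ensemble_generator.py itself isn't shown here; roughly, it needs to read the two arguments, use the task ID for a member-specific perturbation, and run the model from the pickup state. A hypothetical sketch (not the actual script; variable names and the perturbation amplitude are placeholders):

# ensemble_generator.py (hypothetical sketch)
import sys
import numpy as np
import xarray as xr
import pyqg

member_id = int(sys.argv[1])   # SLURM_ARRAY_TASK_ID
pickup_file = sys.argv[2]      # path to the equilibrated model state

# load the equilibrated PV anomaly
q0 = xr.open_dataset(pickup_file)['q'].values

# reproducible, member-specific perturbation
rng = np.random.default_rng(member_id)
eps = 1e-10
q_init = q0 + eps*rng.standard_normal(q0.shape)

year = 24*60*60*360.
m = pyqg.QGModel(tmax=20*year, twrite=1000, tavestart=0)
m.set_q(q_init)
m.run()

# save this member's final PV anomaly
xr.DataArray(m.q, dims=['lev', 'y', 'x'], name='q').to_netcdf(f'ensemble_member_{member_id:02d}.nc')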

hscannell commented 3 years ago

closed with #27