ocean-eddy-cpt / gcm-filters

Diffusion-based Spatial Filtering of Gridded Data
https://gcm-filters.readthedocs.io/

Add B-grid vector Laplacian #128

Closed paigem closed 2 years ago

paigem commented 2 years ago

This is the very start of a PR to add a vector Laplacian for B-grid models, e.g. POP and MOM5 (as discussed in #106).

I'll need some spin-up time to familiarize myself with the package code and the steps needed to compute the B-grid vector Laplacian, so this PR has no new content yet except for adding the skeleton of a new B-grid vector Laplacian function. I plan to work on this PR throughout the week, but wanted to get it started here so folks know that this is in the works.

Resolves #106.

paigem commented 2 years ago

@NoraLoose and I plan to pick this PR back up in February.

rabernat commented 2 years ago

Paige can you link to the POP code you showed me today in our meeting?

rabernat commented 2 years ago

A key question to answer is: which grid variables are needed by this routine? We can determine that by looking at the POP code.

paigem commented 2 years ago

@rabernat The POP code for the horizontal mixing computation can be found at this link. I think that hdiffu_aniso() is the relevant function for this PR:

This routine computes the viscous terms in the momentum equation as the divergence of a stress tensor which is linearly related to the rate-of-strain tensor.

@NoraLoose I know we had planned to take a look at this later in Feb, but getting this implemented will help progress two different projects that I'm currently working on, so @rabernat and I are going to try and get this PR started this week. Would love any of your input if you have the time to contribute!

paigem commented 2 years ago

Useful links for this PR:

After my meeting with @rabernat earlier today, our plan is to start by translating the Fortran code from the hdiffu_aniso() function in hmix_aniso.F90. This was pretty slow going, since there are a lot of constants and variable names in POP that I am unfamiliar with, but I was able to make a start. See below.

@rabernat you mentioned that we should decide on some test data for this PR. I used the following velocity subsets from POP for what I've written so far:

import intake
cat = intake.open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/CESM_POP.yaml")
ds = cat['CESM_POP_hires_control'].to_dask()

# Make data subset: this creates 100x100 box in the North Atlantic Ocean
U = ds.U1_1.isel(time=0,nlon=slice(300,400),nlat=slice(1400,1500)) # surface zonal velocity
V = ds.V1_1.isel(time=0,nlon=slice(300,400),nlat=slice(1400,1500)) # surface meridional velocity

# Load the data for quicker computation
U.load()
V.load()

Start of my translation of hdiffu_aniso() from Fortran to Python (roughly covers lines 557-765 in hmix_aniso.F90):

import numpy as np

# POP constants needed so far

c1 = 1
c2 = 2
p5 = 0.5
POP_haloWidth = 2 # number of ghost cells around each block
POP_blockSizeX = 4 # many different options depending on the model run (I think)
POP_blockSizeY = 4 
POP_nxBlock = POP_blockSizeX + 2*POP_haloWidth # size of block domain in i direction including ghost cells
POP_nyBlock = POP_blockSizeY + 2*POP_haloWidth
# block%ib = POP_haloWidth + 1 # beginning i index
# block%jb = POP_haloWidth + 1 # beginning j index
# block%ie = POP_nxBlock - POP_haloWidth # ending i index
# block%je = POP_nyBlock - POP_haloWidth # ending j index

# Define length of domain
nx = U.nlon.size #2400
ny = U.nlat.size #3600

# Use POP variable names for now 
UMIXK = U.values
VMIXK = V.values

# Compute gamma (depth ratios) for partial bottom cells
## Skipping this for now - assuming all cells are *not* partial bottom
GN = c1
GS = c1
GE = c1
GW = c1

# Compute rate-of-strain tensor in each quarter cell

# Initialize strain tensor components E11, E22, E12
E11 = np.zeros((nx,ny,4))
E22 = np.zeros((nx,ny,4))
E12 = np.zeros((nx,ny,4))

# Define grid spacing
H1W = U.HTN.values # x-spacing between U-points to the west
H2S = V.HTE.values # y-spacing to the south
H2N = np.roll(H2S,1,axis=1) # y-spacing to the north
H1E = np.roll(H1W,1,axis=0) # x-spacing to the east

# Compute metric terms ("K"): constructed using C-grid discretization of the metric factors (still trying to understand exactly what these metric terms are...)
WORKA = H2S + H2N
WORKB = np.roll(WORKA,-1,axis=0)
K1W = c2*(WORKA - WORKB)/(WORKA + WORKB)/H1W
K1E = np.roll(K1W,1,axis=0)

WORKA = H1W + H1E
WORKB = np.roll(WORKA,-1,axis=1)
K2S = c2*(WORKA - WORKB)/(WORKA + WORKB)/H2S 
K2N = np.roll(K2S,1,axis=1) 

# Loop to compute the rate-of-strain tensors
for j in np.arange(1,ny-1):
    for i in np.arange(1,nx-1):
        uw = GW*UMIXK[i-1,j]
        ue = GE*UMIXK[i+1,j]
        us = GS*UMIXK[i,j-1]
        un = GN*UMIXK[i,j+1]

        vw = GW*VMIXK[i-1,j]
        ve = GE*VMIXK[i+1,j]
        vs = GS*VMIXK[i,j-1]
        vn = GN*VMIXK[i,j+1]

        work1 = (UMIXK[i,j] - uw)/H1W[i,j]
        work2 = (ue - UMIXK[i,j])/H1E[i,j]
        work3 = p5*K2S[i,j]*(VMIXK[i,j] + vs) # get V at halfway point
        work4 = p5*K2N[i,j]*(VMIXK[i,j] + vn)

        E11[i,j,0] = work1 + work3
        E11[i,j,1] = work1 + work4
        E11[i,j,2] = work2 + work4
        E11[i,j,3] = work2 + work3

        work1 = (VMIXK[i,j] - vs)/H2S[i,j]
        work2 = (vn - VMIXK[i,j])/H2N[i,j]
        work3 = p5*K1W[i,j]*(UMIXK[i,j] + uw)
        work4 = p5*K1E[i,j]*(UMIXK[i,j] + ue)

        E22[i,j,0] = work1 + work3
        E22[i,j,1] = work2 + work3
        E22[i,j,2] = work2 + work4
        E22[i,j,3] = work1 + work4

        work1 = (UMIXK[i,j] - us)/H2S[i,j]
        work2 = (un - UMIXK[i,j])/H2N[i,j]
        work3 = (VMIXK[i,j] - vw)/H1W[i,j]
        work4 = (ve - VMIXK[i,j])/H1E[i,j]
        work5 = K2S[i,j]*(UMIXK[i,j] + us)
        work6 = K2N[i,j]*(UMIXK[i,j] + un)
        work7 = K1W[i,j]*(VMIXK[i,j] + vw)
        work8 = K1E[i,j]*(VMIXK[i,j] + ve)

        E12[i,j,0] = work1 + work3 - p5*(work5 + work7)
        E12[i,j,1] = work2 + work3 - p5*(work6 + work7)
        E12[i,j,2] = work2 + work4 - p5*(work6 + work8)
        E12[i,j,3] = work1 + work4 - p5*(work5 + work8)

There are still many lines of code in the function that will need to be translated - this is just a very first start!

Note that I skipped over the partial bottom cells for now. Note also that this code is currently written as explicit loops over numpy arrays, so it will need to be spruced up (vectorized) before it can be added to GCM-Filters. I just wanted to share what I've done so far.

rabernat commented 2 years ago

This is great progress Paige. I would encourage you to take this one step further. Can you take the code you posted above and enclose it in a function, so that it is clear:

Once we have this, we can start turning it into a proper Laplacian.

The other, separate task is to turn the i,j loop into a vectorized operation using e.g. np.roll
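For example, here is a toy sketch of the pattern (placeholder array names, not the actual POP stencil):

import numpy as np

UMIXK = np.random.rand(100, 100)  # toy 2D field
H1W = np.ones((100, 100))         # toy grid spacing

# loop version of a one-sided difference
work1 = np.zeros_like(UMIXK)
for i in range(1, 99):
    for j in range(1, 99):
        work1[i, j] = (UMIXK[i, j] - UMIXK[i - 1, j]) / H1W[i, j]

# vectorized version: np.roll(UMIXK, 1, axis=0)[i, j] == UMIXK[i - 1, j]
work1_vec = (UMIXK - np.roll(UMIXK, 1, axis=0)) / H1W

# interior points agree (np.roll wraps around at the array edges)
assert np.allclose(work1[1:99, 1:99], work1_vec[1:99, 1:99])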

paigem commented 2 years ago

Sounds great, thanks for the tips @rabernat! I'll keep working on it today.

I didn't make it a function yet since I've only gotten through part of the hdiffu_aniso() function so far - we'll see how far I get today!

paigem commented 2 years ago

I have a few questions about the B-grid vector Laplacian scheme that we want to implement here. These may be easy answers for @rabernat or @NoraLoose or other experts here who have thought about these things more than I have:

(1) POP has 3 horizontal mixing schemes for horizontal momentum mixing:

  • Laplacian
  • Biharmonic
  • Anisotropic

I have thus far assumed that we want the anisotropic one - is this correct?

(2) There are 3 different directions in which we can compute the anisotropic mixing:

  • grid-aligned
  • aligned due east
  • aligned along the flow direction

I am guessing that for general vector-based filtering, we would want flow-aligned, but let me know if this is not the case.

(3) There are two options when computing the viscous terms in POP. These two options can be combined, so there are 4 possible cases:

  • With Smagorinsky nonlinear viscosity
  • With spatially varying mixing coefficients

I have been working on writing up the anisotropic code today into Python, but it is pretty long and complex (at least for me!), so I haven't yet consolidated it all into a function with the necessary grid variable inputs. Once I get some feedback on the above questions, I'll take another day or two next week to work on this and hopefully have a stand-alone function to share by then. 🙂

rabernat commented 2 years ago

We only need the simplest possible isotropic Laplacian for now.

NoraLoose commented 2 years ago

Hi Paige,

Thanks for working on this! All good questions - I will try to answer them below.

(1) POP has 3 horizontal mixing schemes for horizontal momentum mixing:

  • Laplacian
  • Biharmonic
  • Anisotropic

I have thus far assumed that we want the anisotropic one - is this correct?

(2) There are 3 different directions in which we can compute the anisotropic mixing:

  • grid-aligned
  • aligned due east
  • aligned along the flow direction

I am guessing that for general vector-based filtering, we would want flow-aligned, but let me know if this is not the case.

These questions are relevant for further down the road, if we want to enhance the B-grid vector Laplacian with anisotropy options. For reference, I have only implemented the grid-aligned anisotropy for the C-grid vector Laplacian in GCM-Filters. The two other options are related to #41. Both of these options would require the user to input an additional vector field that describes the filter directions, but grid-aligned anisotropy can be done without this additional info. In that sense, "grid-aligned" is the easiest option (which is why I started with this option).

(3) There are two options when computing the viscous terms in POP. These two options can be combined, so there are 4 possible cases:

  • With Smagorinsky nonlinear viscosity
  • With spatially varying mixing coefficients

You can probably ignore lines like these in the POP code, which set the viscosity according to different viscosity schemes (e.g., Smagorinsky). For GCM-Filters, we only need the "base operator": the vector Laplacian, which also goes into any viscosity scheme. But the viscosity coefficient itself is not needed in GCM-Filters. Does this make sense?
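To put it schematically (my notation here, not POP's exact formulation): the momentum equation contains a friction term of the form

$$\frac{\partial \mathbf{u}}{\partial t} = \cdots + \nu\, \Delta \mathbf{u},$$

where $\Delta$ is the discrete vector Laplacian (including the metric terms that mix $u$ and $v$) and $\nu$ is the viscosity coefficient (constant, Smagorinsky, spatially varying, ...). GCM-Filters only needs the operator $\Delta$; the coefficient $\nu$ never enters the filter.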

paigem commented 2 years ago

Thanks @NoraLoose for your responses - that all makes sense! It looks like I was trying to make things much harder for myself by starting with the anisotropic version! So as you said, I think many of my questions will be relevant if and when we decide to implement the anisotropic Laplacian.

Here is a first pass of the function we would need to compute the B-grid Laplacian for vectors. This code is mostly just a translation of the POP code function hdiffu_del2() in the script hmix_del2.F90. I have glossed over several things, such as boundaries and a couple of the coefficients.

I plan to continue working at this tomorrow - first step will be to rewrite the remaining loop code using numpy.roll() as @rabernat suggested! I'm just sharing this here to keep others in the loop and to make sure I'm on the right track this time. Comments welcome!

Function:

def Bgrid_Laplacian(UMIXK,VMIXK,nx,ny,UAREA,TAREA,DXU,DYU,HUS,HTE,HUW,HTN): # note: AMF is used below but comes from the enclosing scope in this first draft

    # Constants
    c2 = 2
    p5 = 0.5
    radius = 6370.0e5 # radius of Earth (cm)

    # Derived quantities
    UAREA_R = 1/UAREA
    TAREA_R = 1/TAREA

    DXUR = 1/DXU
    DYUR = 1/DYU

    # Calculate coefficients for the stencil without metric terms
    WORK1 = (HUS/HTE)*p5*(AMF + np.roll(AMF,-1,axis=1))

    DUS = WORK1*UAREA_R # South coefficient of 5-point stencil for the Del**2 operator acting at U points
    DUN = np.roll(WORK1,1,axis=1)*UAREA_R # North coefficient of 5-point stencil

    WORK1 = (HUW/HTN)*p5*(AMF + np.roll(AMF,-1,axis=0))

    DUW = WORK1*UAREA_R # West coefficient of 5-point stencil
    DUE = np.roll(WORK1,1,axis=0)*UAREA_R # East coefficient of 5-point stencil

    # Calculate coefficients for metric terms and for metric advection terms (KXU,KYU)
    KXU = (np.roll(HUW,1,axis=0) - HUW) * UAREA_R # defined in (3.24) of POP manual
    KYU = (np.roll(HUS,1,axis=1) - HUS) * UAREA_R

    WORK1 = (HTE - np.roll(HTE,-1,axis=0)) * TAREA_R  # KXT
    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=1)) * p5 * (np.roll(AMF,-1,axis=0) + AMF)
    DXKX = (np.roll(WORK2,1,axis=0) - WORK2)*DXUR

    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=0)) * p5 * (np.roll(AMF,-1,axis=1) + AMF)
    DYKX = (np.roll(WORK2,1,axis=1) - WORK2)*DYUR

    WORK1 = (HTN - np.roll(HTN,-1,axis=1)) * TAREA_R #KYT
    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=0)) * p5 * (np.roll(AMF,-1,axis=1) + AMF)
    DYKY = (np.roll(WORK2,1,axis=1) - WORK2)*DYUR

    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=1)) * p5 * (np.roll(AMF,-1,axis=0) + AMF)
    DXKY = (np.roll(WORK2,axis=0,shift=1) - WORK2)*DXUR

    DUM = -(DXKX + DYKY + c2*AMF*(KXU**2 + KYU**2)) # central coefficient for metric terms that do not mix U,V
    DMC = DXKY - DYKX # central coefficient of 5-point stencil for the metric terms that mix U,V

    # Calculate the central coefficient for metric mixing terms that mix U,V
    WORK1 = (np.roll(AMF,axis=1,shift= 1) - np.roll(AMF,axis=1,shift=-1))/(HTE + np.roll(HTE,axis=1,shift=1))
    DME =  (c2*AMF*KYU + WORK1)/(HTN + np.roll(HTN,axis=0,shift=1)) # East coefficient of 5-point stencil for the metric terms that mix U,V

    WORK1 = (np.roll(AMF,axis=0,shift= 1) - np.roll(AMF,axis=0,shift=-1))/(HTN + np.roll(HTN,axis=0,shift=1))
    DMN = -(c2*AMF*KXU + WORK1)/(HTE + np.roll(HTE,axis=1,shift=1)) # North coefficient of 5-point stencil

    DUC = -(DUN + DUS + DUE + DUW) # central coefficient of 5-point stencil
    DMW = -DME # West coefficient of 5-point stencil
    DMS = -DMN # South coefficient of 5-point stencil

    # Initialize the output arrays
    HDUK = np.zeros((nx,ny))
    HDVK = np.zeros((nx,ny))

    # Compute the horizontal diffusion of momentum
    for j in np.arange(1,ny-1): # skip the first/last rows for now, since the stencil needs j-1 and j+1
        for i in np.arange(1,nx-1): # likewise for the first/last columns

            # add metric contribution to central coefficient
            cc = DUC[i,j] + DUM[i,j]
            am = 1 # set to 1 for now for simplicity - will need to update!!

            HDUK[i,j] = am*((cc          *UMIXK[i  ,j  ] +  
                             DUN[i,j]*UMIXK[i  ,j+1] +  
                             DUS[i,j]*UMIXK[i  ,j-1] +  
                             DUE[i,j]*UMIXK[i+1,j  ] +  
                             DUW[i,j]*UMIXK[i-1,j  ])+  
                            (DMC[i,j]*VMIXK[i  ,j  ] +  
                             DMN[i,j]*VMIXK[i  ,j+1] +  
                             DMS[i,j]*VMIXK[i  ,j-1] +  
                             DME[i,j]*VMIXK[i+1,j  ] +  
                             DMW[i,j]*VMIXK[i-1,j  ]))

            HDVK[i,j] = am*((cc          *VMIXK[i  ,j  ] +  
                             DUN[i,j]*VMIXK[i  ,j+1] +  
                             DUS[i,j]*VMIXK[i  ,j-1] +  
                             DUE[i,j]*VMIXK[i+1,j  ] +  
                             DUW[i,j]*VMIXK[i-1,j  ])-  
                            (DMC[i,j]*UMIXK[i  ,j  ] +  
                             DMN[i,j]*UMIXK[i  ,j+1] +  
                             DMS[i,j]*UMIXK[i  ,j-1] +  
                             DME[i,j]*UMIXK[i+1,j  ] +  
                             DMW[i,j]*UMIXK[i-1,j  ]))

    return HDUK,HDVK

I used the following test data:

import intake
cat = intake.open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/CESM_POP.yaml")
ds = cat['CESM_POP_hires_control'].to_dask()

# Make data subset
U = ds.U1_1.isel(time=0,nlon=slice(300,400),nlat=slice(1400,1500))
V = ds.V1_1.isel(time=0,nlon=slice(300,400),nlat=slice(1400,1500))

# Load the data
U.load()
V.load()

And I called the above function with the below code:

import numpy as np
# Define input quantities

nx = U.nlon.size # number of longitude points
ny = U.nlat.size # number of latitude points

UAREA = U.UAREA.values
TAREA = U.TAREA.values

DXU = U.DXU.values
DYU = U.DYU.values

c2 = 2 # constant from POP
radius = 6370.0e5 # radius of Earth (cm)
AMF = np.sqrt(U.UAREA/(c2*np.pi*radius/nx)**2).values # variable mixing factor for momentum mixing

HUS = U.HUS.values
HTE = U.HTE.values
HUW = U.HUW.values
HTN = U.HTN.values

# Use POP variable names for now 
UMIXK = U.values
VMIXK = V.values

Udiff,Vdiff = Bgrid_Laplacian(UMIXK,VMIXK,nx,ny,UAREA,TAREA,DXU,DYU,HUS,HTE,HUW,HTN)

After running the above code on the test data, I did a quick visualization of the original U velocity and the output U velocity diffusion:

Original U velocity: [image]

Output U velocity diffusion: [image]

paigem commented 2 years ago

Another question for the more knowledgeable here (@NoraLoose @rabernat):

POP defines the Laplacian horizontal friction term from the C-grid discretization of both the Laplacian and the metric terms. My understanding is that the metric terms account for the curvature of the grid (the spatial variation of the grid metric factors on the curved Earth). My question: are we interested here in both the Laplacian and the metric terms? This is probably a basic question, but I'm still trying to wrap my head around what is needed for vector filtering (and what the metric terms are). The code below includes the metric terms.

Also, here is an update to the above code. @rabernat - could you take a look at this code or give me pointers for what to do next?

Changes made:

  • Vectorized the remaining i,j loop using np.roll()
  • Added AMF as an explicit function argument

Function:

def Bgrid_Laplacian(UMIXK,VMIXK,nx,ny,UAREA,TAREA,DXU,DYU,HUS,HTE,HUW,HTN,AMF):

    # Constants
    c2 = 2
    p5 = 0.5
    radius = 6370.0e5 # radius of Earth (cm)

    # Derived quantities
    UAREA_R = 1/UAREA
    TAREA_R = 1/TAREA

    DXUR = 1/DXU
    DYUR = 1/DYU

    # Calculate coefficients for the stencil without metric terms
    WORK1 = (HUS/HTE)*p5*(AMF + np.roll(AMF,-1,axis=1))

    DUS = WORK1*UAREA_R # South coefficient of 5-point stencil for the Del**2 operator acting at U points
    DUN = np.roll(WORK1,1,axis=1)*UAREA_R # North coefficient of 5-point stencil

    WORK1 = (HUW/HTN)*p5*(AMF + np.roll(AMF,-1,axis=0))

    DUW = WORK1*UAREA_R # West coefficient of 5-point stencil
    DUE = np.roll(WORK1,1,axis=0)*UAREA_R # East coefficient of 5-point stencil

    # Calculate coefficients for metric terms and for metric advection terms (KXU,KYU)
    KXU = (np.roll(HUW,1,axis=0) - HUW) * UAREA_R # defined in (3.24) of POP manual
    KYU = (np.roll(HUS,1,axis=1) - HUS) * UAREA_R

    WORK1 = (HTE - np.roll(HTE,-1,axis=0)) * TAREA_R  # KXT
    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=1)) * p5 * (np.roll(AMF,-1,axis=0) + AMF)
    DXKX = (np.roll(WORK2,1,axis=0) - WORK2)*DXUR

    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=0)) * p5 * (np.roll(AMF,-1,axis=1) + AMF)
    DYKX = (np.roll(WORK2,1,axis=1) - WORK2)*DYUR

    WORK1 = (HTN - np.roll(HTN,-1,axis=1)) * TAREA_R #KYT
    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=0)) * p5 * (np.roll(AMF,-1,axis=1) + AMF)
    DYKY = (np.roll(WORK2,1,axis=1) - WORK2)*DYUR

    WORK2 = p5*(WORK1 + np.roll(WORK1,1,axis=1)) * p5 * (np.roll(AMF,-1,axis=0) + AMF)
    DXKY = (np.roll(WORK2,axis=0,shift=1) - WORK2)*DXUR

    DUM = -(DXKX + DYKY + c2*AMF*(KXU**2 + KYU**2)) # central coefficient for metric terms that do not mix U,V
    DMC = DXKY - DYKX # central coefficient of 5-point stencil for the metric terms that mix U,V

    # Calculate the central coefficient for metric mixing terms that mix U,V
    WORK1 = (np.roll(AMF,axis=1,shift= 1) - np.roll(AMF,axis=1,shift=-1))/(HTE + np.roll(HTE,axis=1,shift=1))
    DME =  (c2*AMF*KYU + WORK1)/(HTN + np.roll(HTN,axis=0,shift=1)) # East coefficient of 5-point stencil for the metric terms that mix U,V

    WORK1 = (np.roll(AMF,axis=0,shift= 1) - np.roll(AMF,axis=0,shift=-1))/(HTN + np.roll(HTN,axis=0,shift=1))
    DMN = -(c2*AMF*KXU + WORK1)/(HTE + np.roll(HTE,axis=1,shift=1)) # North coefficient of 5-point stencil

    DUC = -(DUN + DUS + DUE + DUW) # central coefficient of 5-point stencil
    DMW = -DME # West coefficient of 5-point stencil
    DMS = -DMN # South coefficient of 5-point stencil

    # Initialize the output arrays
    HDUK = np.zeros((nx,ny))
    HDVK = np.zeros((nx,ny))

    # Compute the horizontal diffusion of momentum
    am = 1
    cc = DUC + DUM

    HDUK = am * (cc*UMIXK + 
                 DUN*np.roll(UMIXK,-1,axis=0) +
                 DUS*np.roll(UMIXK,1,axis=0) +  
                 DUE*np.roll(UMIXK,-1,axis=1) +  
                 DUW*np.roll(UMIXK,1,axis=1) +
                 DMC*VMIXK +  
                 DMN*np.roll(VMIXK,-1,axis=0) +  
                 DMS*np.roll(VMIXK,1,axis=0)  +  
                 DME*np.roll(VMIXK,-1,axis=1) +  
                 DMW*np.roll(VMIXK,1,axis=1))

    HDVK = am * (cc*VMIXK + 
                 DUN*np.roll(VMIXK,-1,axis=0) +
                 DUS*np.roll(VMIXK,1,axis=0) +  
                 DUE*np.roll(VMIXK,-1,axis=1) +  
                 DUW*np.roll(VMIXK,1,axis=1) +
                 DMC*UMIXK +  
                 DMN*np.roll(UMIXK,-1,axis=0) +  
                 DMS*np.roll(UMIXK,1,axis=0)  +  
                 DME*np.roll(UMIXK,-1,axis=1) +  
                 DMW*np.roll(UMIXK,1,axis=1))

    return HDUK,HDVK

Test dataset:

import intake
cat = intake.open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/CESM_POP.yaml")
ds = cat['CESM_POP_hires_control'].to_dask()

# Make data subset: single time step (100x100) box in North Atlantic
U = ds.U1_1.isel(time=0,nlon=slice(300,400),nlat=slice(1400,1500))
V = ds.V1_1.isel(time=0,nlon=slice(300,400),nlat=slice(1400,1500))

# Load the data
U.load()
V.load()

Define inputs and call the function:

# Define input quantities
import numpy as np
nx = U.nlon.size # number of longitude points
ny = U.nlat.size # number of latitude points

UAREA = U.UAREA.values # area of U cell
TAREA = U.TAREA.values # area of T cell

DXU = U.DXU.values # x-spacing centered at U points
DYU = U.DYU.values # y-spacing centered at U points

c2 = 2 # constant from POP
radius = 6370.0e5 # radius of Earth (cm)
AMF = np.sqrt(U.UAREA/(c2*np.pi*radius/nx)**2).values # variable mixing factor for momentum mixing

HUS = U.HUS.values # cell widths on south side of U cell
HTE = U.HTE.values # cell widths on east side of T cell
HUW = U.HUW.values # cell widths on west side of U cell
HTN = U.HTN.values # cell widths on north side of T cell

# Use POP variable names for now 
UMIXK = U.values
VMIXK = V.values

Udiff,Vdiff = Bgrid_Laplacian(UMIXK,VMIXK,nx,ny,UAREA,TAREA,DXU,DYU,HUS,HTE,HUW,HTN,AMF)

paigem commented 2 years ago

My attempt at writing the B-grid vector Laplacian function to fit into the kernels.py script. This is a VERY rough draft for now (I haven't even checked that the units are correct yet, with CESM's variables saved in cm 😬). I have added the below function to my local fork; it does not give the correct answer, but it does run without errors. I have not pushed it to this PR yet since my code will likely need to be largely reworked.

@rabernat could you take a look at my code and give any feedback? I am not very familiar with writing data classes and using functions like __post_init__(self) or __call__(), so I'm sure that there are errors (or less-than-ideal syntax) in there.

I would also like input from @rabernat or @NoraLoose as to whether we need any updates to the filter.py script for the B-grid. At this stage, I have not updated any of that code.

B-grid dataclass code:

@dataclass
class BgridVectorLaplacian(BaseVectorLaplacian):
    """̵Vector Laplacian on B-Grid. To be implemented for viscosity operators on B-grids based on POP code.

    Attributes
    ----------

    DXU: x-spacing centered at U points
    DYU: y-spacing centered at U points
    AMF: variable mixing factor for momentum mixing
    HUS: cell widths on south side of U cell
    HUW: cell widths on west side of U cell
    HTE: cell widths on east side of T cell
    HTN: cell widths on north side of T cell
    UAREA: U-cell area
    TAREA: T-cell area
    """

    DXU: ArrayType
    DYU: ArrayType
    AMF: ArrayType
    HUS: ArrayType
    HUW: ArrayType
    HTE: ArrayType
    HTN: ArrayType
    UAREA: ArrayType
    TAREA: ArrayType

    def __post_init__(self):
        np = get_array_module(self.DXU)

        # Derived quantities
        self.UAREA_R = 1/self.UAREA
        self.TAREA_R = 1/self.TAREA

        self.DXUR = 1/self.DXU
        self.DYUR = 1/self.DYU

    def __call__(self, ufield: ArrayType, vfield: ArrayType):
        np = get_array_module(ufield)

        # Constants
        c2 = 2
        p5 = 0.5

        # Calculate coefficients for the stencil without metric terms
        self.WORK1 = (self.HUS/self.HTE)*p5*(self.AMF + np.roll(self.AMF,-1,axis=1))

        self.DUS = self.WORK1*self.UAREA_R # South coefficient of 5-point stencil for the Del**2 operator acting at U points
        self.DUN = np.roll(self.WORK1,1,axis=1)*self.UAREA_R # North coefficient of 5-point stencil

        self.WORK1 = (self.HUW/self.HTN)*p5*(self.AMF + np.roll(self.AMF,-1,axis=0))

        self.DUW = self.WORK1*self.UAREA_R # West coefficient of 5-point stencil
        self.DUE = np.roll(self.WORK1,1,axis=0)*self.UAREA_R # East coefficient of 5-point stencil

        # Calculate coefficients for metric terms and for metric advection terms (KXU,KYU)
        self.KXU = (np.roll(self.HUW,1,axis=0) - self.HUW) * self.UAREA_R # defined in (3.24) of POP manual
        self.KYU = (np.roll(self.HUS,1,axis=1) - self.HUS) * self.UAREA_R

        self.WORK1 = (self.HTE - np.roll(self.HTE,-1,axis=0)) * self.TAREA_R  # KXT
        self.WORK2 = p5*(self.WORK1 + np.roll(self.WORK1,1,axis=1)) * p5 * (np.roll(self.AMF,-1,axis=0) + self.AMF)
        self.DXKX = (np.roll(self.WORK2,1,axis=0) - self.WORK2)*self.DXUR

        self.WORK2 = p5*(self.WORK1 + np.roll(self.WORK1,1,axis=0)) * p5 * (np.roll(self.AMF,-1,axis=1) + self.AMF)
        self.DYKX = (np.roll(self.WORK2,1,axis=1) - self.WORK2)*self.DYUR

        self.WORK1 = (self.HTN - np.roll(self.HTN,-1,axis=1)) * self.TAREA_R #KYT
        self.WORK2 = p5*(self.WORK1 + np.roll(self.WORK1,1,axis=0)) * p5 * (np.roll(self.AMF,-1,axis=1) + self.AMF)
        self.DYKY = (np.roll(self.WORK2,1,axis=1) - self.WORK2)*self.DYUR

        self.WORK2 = p5*(self.WORK1 + np.roll(self.WORK1,1,axis=1)) * p5 * (np.roll(self.AMF,-1,axis=0) + self.AMF)
        self.DXKY = (np.roll(self.WORK2,axis=0,shift=1) - self.WORK2)*self.DXUR

        self.DUM = -(self.DXKX + self.DYKY + c2*self.AMF*(self.KXU**2 + self.KYU**2)) # central coefficient for metric terms that do not mix U,V
        self.DMC = self.DXKY - self.DYKX # central coefficient of 5-point stencil for the metric terms that mix U,V

        # Calculate the central coefficient for metric mixing terms that mix U,V
        self.WORK1 = (np.roll(self.AMF,axis=1,shift= 1) - np.roll(self.AMF,axis=1,shift=-1))/(self.HTE + np.roll(self.HTE,axis=1,shift=1))
        self.DME =  (c2*self.AMF*self.KYU + self.WORK1)/(self.HTN + np.roll(self.HTN,axis=0,shift=1)) # East coefficient of 5-point stencil for the metric terms that mix U,V

        self.WORK1 = (np.roll(self.AMF,axis=0,shift= 1) - np.roll(self.AMF,axis=0,shift=-1))/(self.HTN + np.roll(self.HTN,axis=0,shift=1))
        self.DMN = -(c2*self.AMF*self.KXU + self.WORK1)/(self.HTE + np.roll(self.HTE,axis=1,shift=1)) # North coefficient of 5-point stencil

        self.DUC = -(self.DUN + self.DUS + self.DUE + self.DUW) # central coefficient of 5-point stencil
        self.DMW = -self.DME # West coefficient of 5-point stencil
        self.DMS = -self.DMN # South coefficient of 5-point stencil

        # Compute the horizontal diffusion of momentum
        am = 1
        cc = self.DUC + self.DUM

        u_component = am * (cc*ufield + 
                     self.DUN*np.roll(ufield,-1,axis=0) +
                     self.DUS*np.roll(ufield,1,axis=0) +  
                     self.DUE*np.roll(ufield,-1,axis=1) +  
                     self.DUW*np.roll(ufield,1,axis=1) +
                     self.DMC*vfield +  
                     self.DMN*np.roll(vfield,-1,axis=0) +  
                     self.DMS*np.roll(vfield,1,axis=0)  +  
                     self.DME*np.roll(vfield,-1,axis=1) +  
                     self.DMW*np.roll(vfield,1,axis=1))

        v_component = am * (cc*vfield + 
                     self.DUN*np.roll(vfield,-1,axis=0) +
                     self.DUS*np.roll(vfield,1,axis=0) +  
                     self.DUE*np.roll(vfield,-1,axis=1) +  
                     self.DUW*np.roll(vfield,1,axis=1) +
                     self.DMC*ufield +  
                     self.DMN*np.roll(ufield,-1,axis=0) +  
                     self.DMS*np.roll(ufield,1,axis=0)  +  
                     self.DME*np.roll(ufield,-1,axis=1) +  
                     self.DMW*np.roll(ufield,1,axis=1))

        return (u_component, v_component)

ALL_KERNELS[GridType.VECTOR_B_GRID] = BgridVectorLaplacian

In case it's helpful, when applying the above Laplacian to the test data I mention in previous comments, I get the following:

Original U velocity: [image]

Filtered U velocity (appears to show the same velocity field, but with more NaNs): [image]

rabernat commented 2 years ago

I've made some good progress today.

Things are kind of working. The main challenges to sort out are:

The latter is responsible for the test failure in test_zero_area.

In contrast to what Paige posted above, the results also seem to pass the smell test, as demonstrated by the following example (should be runnable anywhere).

import gcm_filters
import xarray as xr
import pooch
from matplotlib import pyplot as plt

fname = pooch.retrieve(
    url="doi:10.5281/zenodo.5947728/pop_hires_test_data.nc",
    known_hash="md5:e88ee4d310f476abe5fa3aac72d2e51e",
)
ds = xr.open_dataset(fname)

grid_vars = ["DXU", "DYU", "HUS", "HUW", "HTE", "HTN", "UAREA", "TAREA"]
extra_kwargs = {name: ds[name].values for name in grid_vars}
u_data = ds.U1_1.values
v_data = ds.V1_1.values

lap = gcm_filters.kernels.BgridVectorLaplacian(**extra_kwargs)

u_lap, v_lap = lap(u_data, v_data)

fig, axs = plt.subplots(ncols=2, figsize=(12, 3))
pc0 = axs[0].pcolormesh(u_data)
axs[0].set_title('Original U')
plt.colorbar(pc0, ax=axs[0])
pc1 = axs[1].pcolormesh(u_lap, cmap='RdBu_r')
axs[1].set_title('Vector Laplacian U Component')
pc1.set_clim([-1e-12, 1e-12])
plt.colorbar(pc1, ax=axs[1])

[image]

To move forward, let's first decide what we want to do about NaNs. Then we will tackle the boundary conditions.

paigem commented 2 years ago

Thanks @rabernat! 🎉

It looks like NaNs were converted to zero in the C-grid Laplacian at the start of the function call:

ufield = np.nan_to_num(ufield)
vfield = np.nan_to_num(vfield)

rabernat commented 2 years ago

~...Yet there are still NaNs in the output. And their footprint is larger than the original u field, due to the Laplacian diffusing them around.~ Never mind, I was misreading your comment. So we should probably add that to the B-grid one as well.

We need to distinguish between "land mask" and NaN. I think we generally do not want any NaNs in anything we are operating on within the Laplacian.
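A minimal sketch of that pattern (assuming we also give the B-grid Laplacian a wet_mask grid variable, 1 over ocean and 0 over land, like several of the other Laplacians in kernels.py use):

import numpy as np

def prepare_fields(ufield, vfield, wet_mask):
    """Sketch: remove NaNs and zero out land points before applying the stencil."""
    ufield = np.nan_to_num(ufield) * wet_mask  # NaN -> 0, land -> 0
    vfield = np.nan_to_num(vfield) * wet_mask
    return ufield, vfield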

paigem commented 2 years ago

I added the following lines to the B-grid function, per the discussion above:

ufield = np.nan_to_num(ufield)
vfield = np.nan_to_num(vfield)

I have been able to successfully apply a fixed length scale filter to the B-grid. I ran into one issue when passing the grid variables as numpy arrays (using .values on the xarray data):

MissingDimensionsError: cannot set variable 'DXU' with 2-dimensional data without explicit dimension names. Pass a tuple of (dims, data) instead.

The error points to this line in filter.py:

~/gcm-filters/gcm-filters/gcm_filters/filter.py in __post_init__(self)
    399                 f"{list(self.Laplacian.required_grid_args())}"
    400             )
--> 401         self.grid_ds = xr.Dataset({name: da for name, da in self.grid_vars.items()})
    402 
    403     def plot_shape(self, ax=None):

There's a good chance I've done something wrong, but it seems like the code is not expecting a numpy array. So I passed the xarray DataArrays of each grid variable instead, which then led to an error related to unit conversion. Because the POP test data uses cm as the length scale, the grid variables need to be converted into meters (as in this example in the GCM-filters docs). When I convert all terms into meters, I get the following error:

MergeError: conflicting values for variable 'DYU' on objects to be combined. You can skip this check by specifying compat='override'.

The error points to line 401 in filter.py:

~/gcm-filters/gcm-filters/gcm_filters/filter.py in __post_init__(self)
    399                 f"{list(self.Laplacian.required_grid_args())}"
    400             )
--> 401         self.grid_ds = xr.Dataset({name: da for name, da in self.grid_vars.items()})
    402 
    403     def plot_shape(self, ax=None):

I believe this happens because the full list of coordinates is still attached to each variable. So even if I convert, say, the x-direction u-grid spacing DXU to meters, its associated coordinates still include the other required grid variables, which remain in cm. There is likely a more elegant way to deal with this, but for now I just used xarray's reset_coords(drop=True) to drop the extra coordinates.

The code I used to call the fixed length scale filter:

import gcm_filters
import xarray as xr
import pooch
from matplotlib import pyplot as plt

fname = pooch.retrieve(
    url="doi:10.5281/zenodo.5947728/pop_hires_test_data.nc",
    known_hash="md5:e88ee4d310f476abe5fa3aac72d2e51e",
)
ds = xr.open_dataset(fname)

# Convert grid vars to meters
DXU_m = ds.DXU.reset_coords(drop=True)/100
DYU_m = ds.DYU.reset_coords(drop=True)/100
HUS_m = ds.HUS.reset_coords(drop=True)/100
HUW_m = ds.HUW.reset_coords(drop=True)/100
HTE_m = ds.HTE.reset_coords(drop=True)/100
HTN_m = ds.HTN.reset_coords(drop=True)/100
UAREA_m = ds.UAREA.reset_coords(drop=True)/(100**2)
TAREA_m = ds.TAREA.reset_coords(drop=True)/(100**2)

grid_vars={
        'DXU': DXU_m, 'DYU': DYU_m, 'HUS': HUS_m, 
        'HUW': HUW_m, 'HTE': HTE_m, 'HTN': HTN_m, 
        'UAREA': UAREA_m, 'TAREA': TAREA_m
    }

# Remove scales smaller than 5000 km <-- need a large cutoff to see any noticeable difference, since this test data is so close to the equator
filter_scale = 5000000 # in m

# Find minimum grid length
dx_min = min(DXU_m.min(),DYU_m.min()).values 

filter_visc = gcm_filters.Filter(
    filter_scale=filter_scale,
    dx_min=dx_min,
    filter_shape=gcm_filters.FilterShape.GAUSSIAN,
    grid_type=gcm_filters.GridType.VECTOR_B_GRID,
    grid_vars={
        'DXU': DXU_m, 'DYU': DYU_m, 'HUS': HUS_m, 
        'HUW': HUW_m, 'HTE': HTE_m, 'HTN': HTN_m, 
        'UAREA': UAREA_m, 'TAREA': TAREA_m
    }
)
filter_visc

# Velocity components to filter (as xarray DataArrays)
u_data = ds.U1_1
v_data = ds.V1_1

# Apply the filter and compute
(u_filtered, v_filtered) = filter_visc.apply_to_vector(u_data, v_data, dims=['nlat', 'nlon'])

Plot of original u-velocity (in m/s):

[image]

Plot of filtered u-velocity using 5000km fixed length scale filter (in m/s):

[image]

@rabernat @NoraLoose I have three questions:

  • Is the above error that I solved by using reset_coords() something that will need to be fixed in the code, or did I miss something obvious?
  • Apologies if I missed this in the docs, but does the package assume all inputs are in regular, SI units (e.g. using meters)? Or if I input my data in, say, cm and then apply a fixed length scale factor filter in cm, would I get the right answer?
  • Can we implement the fixed factor filter (e.g. filter to 10 times the grid scale of the input data) with the current B-grid Laplacian? Or do we need an anisotropic one to accomplish that?

I assume next up on the list is to account for boundaries - something I will need some assistance with on where to start.

rabernat commented 2 years ago

  • Is the above error that I solved by using reset_coords() something that will need to be fixed in the code, or did I miss something obvious?

This is the best option I can think of. It's really a problem with the data and with xarray's strict alignment of coordinates, not with gcm-filters.

  • Apologies if I missed this in the docs, but does the package assume all inputs are in regular, SI units (e.g. using meters)? Or if I input my data in, say, cm and then apply a fixed length scale factor filter in cm, would I get the right answer?

As long as the Laplacian is self-consistent in its internal handling of units (it expects cm and gets cm), I believe you will get the right answer. The output of the vector Laplacian for $\nabla^2 u$ will be in units of $\mathrm{s^{-1}\,cm^{-1}}$. This is applied directly as a tendency here:

https://github.com/ocean-eddy-cpt/gcm-filters/blob/86a0da949a357c1d723152dc16a06373ff14e979/gcm_filters/filter.py#L193-L194

The tendency will be 100x larger than if it were measured in $\mathrm{s^{-1}\,m^{-1}}$. However, u itself is also 100x larger, since it is also measured in cm. So I think everything just cancels out. This is a hand-wavy hunch; Nora or Ian can probably formulate a much more rigorous argument (or counter-argument).

  • Can we implement the fixed factor filter (e.g. filter to 10 times the grid scale of the input data) with the current B-grid Laplacian? Or do we need an anisotropic one to accomplish that?

I would not try to do that here. This is complicated enough already. It's not at all clear to me how to define a fixed-factor vector laplacian on a curvilinear grid.

I assume next up on the list is to account for boundaries - something I will need some assistance with on where to start.

There are two types of boundaries to account for:

iangrooms commented 2 years ago

Apologies if I missed this in the docs, but does the package assume all inputs are in regular, SI units (e.g. using meters)? Or if I input my data in, say, cm and then apply a fixed length scale factor filter in cm, would I get the right answer?

This is a good point; we should add it to the docs somewhere. When you set up a filter object by calling gcm_filters.Filter, the units of filter_scale, dx_min, and any dx* or dy* variables in grid_vars need to be the same. Some filter types don't require any dx* or dy* grid vars, in which case only filter_scale and dx_min need to have the same units. When using a filter type that requires both dx* or dy* grid vars and area, the units of area need to be the units of dx* and dy* squared. Some filters require area but not dx* or dy*; for these filters the units of area don't actually matter, but we should probably recommend having units of area equal to units of filter_scale and dx_min squared, just to be safe.

All of that being said, it doesn't matter if the units are MKS or CGS or whatever: As long as the units are consistent (meaning everything is in centimeters or everything is in meters or everything is in kilometers, etc), you'll get the right answer.
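For example, here is a sketch (assuming the POP test dataset ds from the snippets above and the VECTOR_B_GRID type from this PR) that sets up the same filter entirely in CGS units, with no conversion to meters:

import gcm_filters

# Keep everything in cm: grid variables, dx_min, and filter_scale
grid_vars_cm = {
    name: ds[name].reset_coords(drop=True)
    for name in ["DXU", "DYU", "HUS", "HUW", "HTE", "HTN", "UAREA", "TAREA"]
}
dx_min_cm = min(grid_vars_cm["DXU"].min(), grid_vars_cm["DYU"].min()).values

filter_visc_cgs = gcm_filters.Filter(
    filter_scale=5000e5,  # 5000 km expressed in cm
    dx_min=dx_min_cm,
    filter_shape=gcm_filters.FilterShape.GAUSSIAN,
    grid_type=gcm_filters.GridType.VECTOR_B_GRID,
    grid_vars=grid_vars_cm,
)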

Possibly this doesn't need to be stated, but just in case: The units of the quantity being filtered don't matter at all.

We could easily add this to the docs, but I wonder if there's a way to enforce it using xarray. Even if there is a way to enforce it, do we want to? Curious to hear what people think.

iangrooms commented 2 years ago

Can we implement the fixed factor filter (e.g. filter to 10 times the grid scale of the input data) with the current B-grid Laplacian? Or do we need an anisotropic one to accomplish that?

We can use the so-called "simple" fixed-factor filter with no further work. (Which is one reason why the simple one is so convenient!) But the other version of fixed-factor filtering requires an anisotropic and spatially-varying Laplacian.

rabernat commented 2 years ago

We can use the so-called "simple" fixed-factor filter with no further work.

My understanding is that it would still not be appropriate for a vector laplacian on a curvilinear grid, in which the basis vector orientation changes in space. This is not an issue for scalars.

NoraLoose commented 2 years ago

The "global" array boundaries (what to do at the edges of the array). You are currently assuming periodic boundary conditions. This is incorrect at the northern boundary--we need to implement the tripolar exchange. But I would actually leave that aside for now. It's complicated and only affects the solution in the high northern latitudes. We may wish to defer implementing this feature until after OSM

@paigem if you are interested, we could plan a little hackathon after OSM to get the tripole boundary condition working for both the C-grid and the B-grid vector Laplacians.

NoraLoose commented 2 years ago

Regarding the units of filter_scale, dx_min, dx*, dy*, and area*, we should definitely add to the documentation that the units have to be consistent among themselves.

As of now, we can't enforce unit consistency with xarray because filter_scale and dx_min are not xarray objects (they are plain numbers).

iangrooms commented 2 years ago

My understanding is that it would still not be appropriate for a vector laplacian on a curvilinear grid, in which the basis vector orientation changes in space. This is not an issue for scalars.

Thanks for the comment @rabernat. I'm curious now about what would happen if we tried it. There's nothing wrong with applying the basic "simple fixed factor" algorithm with a vector laplacian, at least in the sense that nothing should break inside the algorithm. (Multiply u and v by their respective cell areas, then filter as if the grid size were uniform, then divide by the cell areas.) I don't know if it would act the way we want it to, or if it would give crazy results. But at this point we definitely shouldn't implement it as a vector filter type; my first comment was overconfident that it would behave as expected.
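Just to spell out that recipe (a sketch only; filter_uniform is a hypothetical stand-in for filtering under a uniform-grid assumption):

def simple_fixed_factor_vector(u, v, uarea, filter_uniform):
    """Sketch of the 'simple' fixed-factor idea applied to a vector field.

    Multiply each component by its cell area, filter as if the grid were
    uniform, then divide by the cell area again.
    """
    u_filtered = filter_uniform(u * uarea) / uarea
    v_filtered = filter_uniform(v * uarea) / uarea
    return u_filtered, v_filtered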

iangrooms commented 2 years ago

I can set up a PR to add a discussion of units to the documentation, unless someone objects.

paigem commented 2 years ago

I can set up a PR to add a discussion of units to the documentation, unless someone objects.

@iangrooms It would be great to have this added to the docs! Not sure where you plan to put it, but some mention of units in the example on "Filtering on a tripole grid" would be helpful, since that is the example that uses POP data in CGS units. At the moment there is some mention of units there, but the fields are all converted to meters, which is why I assumed it was necessary to have all inputs in meters.

paigem commented 2 years ago

I just noticed that there is a quantity in the B-grid Laplacian computation that depends on the radius of the Earth:

radius = 6370.0e5
ny, nx = self.TAREA.shape
self.AMF = np.sqrt(self.UAREA / (2 * np.pi * radius / nx) ** 2)

The constant radius is currently implemented in units of cm and so the code in this PR as it stands will only yield the correct result if the inputs are in cm.

To be honest, I don't quite understand what the quantity AMF is - does it have any analog in the C-grid Laplacian? Should this be the radius of the Earth even for regional datasets? Or should it instead be related to the extent of the input dataset (which would be the Earth radius for global data, but smaller for regional datasets)? The magnitude I get when filtering the test data appears reasonable, so perhaps the Earth radius is the correct value.

If we actually do want the Earth radius, we will have to figure out how to account for the units of the input arrays.

Thanks again for all of your help and discussions here!

NoraLoose commented 2 years ago

I'm not familiar with the quantity AMF - the C-grid Laplacian does not use anything like it.

From the formula you are posting above, it looks like AMF is a dimension-less quantity,

$$\mathrm{AMF} = \sqrt{\frac{\mathrm{UAREA}}{(2\pi R/\mathrm{nx})^2}} = \frac{\sqrt{\mathrm{UAREA}}}{2\pi R/\mathrm{nx}},$$

i.e., the ratio of the local grid scale to the equatorial grid spacing, assuming that the equatorial grid cells are isotropic in x- and y- direction (i.e., they are "square") and the grid is global. I would have to look a bit closer into the code to figure out how AMF is used (and if it is actually needed?).

But off the top of the head, I see 2 problems with AMF:

  1. As you mention, the unit issue. Also, the Laplacian should not be particular to Earth; it should work just as well for fields on Mars or Saturn. To solve these two issues, radius has to be made an additional input parameter for the B-grid Laplacian (maybe with a default equal to the radius of Earth) - see the sketch after this list.
  2. The variable AMF really seems to be specific to global grids. If you put in a regional dataset which only uses half of the grid points in x-direction (i.e., nx is only half of the nx of the global dataset), the variable AMF will artificially shrink by a factor of 2 (since the assumed equatorial spacing 2πR/nx doubles). If AMF goes into the Laplacian, that's not good, because the Laplacian is a local operator and should give us the same result if we input just a regional version of our global dataset. That being said, if we aim to implement a tripole boundary condition for the B-grid Laplacian, this would assume a global input dataset too. The B-grid Laplacian should then be called something like VECTOR_B_GRID_GLOBAL to make the assumption of global input data extra clear. (We should use the same _GLOBAL suffix for all other grids that assume global input data, in particular the ones that use tripole boundary conditions.)
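A minimal sketch of point 1 (hypothetical; ArrayType stands in for the package's array type alias, and only two of the grid-variable fields are shown):

from dataclasses import dataclass
from typing import Any

ArrayType = Any  # stand-in for gcm_filters' array type alias

@dataclass
class BgridVectorLaplacianSketch:
    """Like BgridVectorLaplacian above, but with radius as an explicit parameter."""
    UAREA: ArrayType
    TAREA: ArrayType
    radius: float = 6370.0e5  # default: Earth's radius, in the same length units as the grid variables (cm here)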

Should this be the radius of the Earth even for regional datasets? Or should it instead be related to the distance of the input dataset (which would be the Earth radius for global, but smaller for regional datasets)?

The radius of the Earth goes indirectly into your input grid spacings, like DXU, DYU etc (no matter if you use global or regional data). You can see that for example here: https://github.com/ocean-eddy-cpt/gcm-filters/blob/7644e04666a29918d978ee3b4fa39dc2a7361980/tests/conftest.py#L176-L178 where I'm re-creating the spacing of a spherical grid (for the C-grid Laplacian). But the user does not have to worry about how DXU and DYU depend on Earth's radius because DXU, DYU are usually part of the model output.

paigem commented 2 years ago

Thank you @NoraLoose for your previous comment, and apologies it has taken me so long to respond.

I've looked more into the AMF quantity, and it looks like it is a scaling factor that gives the horizontal mixing coefficients their spatial dependence. The coefficients are defined for the average grid spacing at the equator (where AMF = 1), and AMF then scales them to the local grid spacing when they are applied globally. The plot of AMF below shows that it is always less than or equal to 1:

[image]

But, based on an earlier comment that we only need the base Laplacian, I think that we may not need the spatially varying mixing coefficients at all, in which case the AMF term is not necessary. In the POP code, there is actually a flag for the inclusion of spatially varying mixing arrays; when it is false, AMF is set to 1. So if I've understood this correctly, I will go ahead and remove the dependence on AMF.
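Concretely, a sketch of the simplification (with AMF = 1, the diffusive 5-point stencil coefficients in the code above reduce to pure grid factors; not tested):

import numpy as np

def bgrid_stencil_coeffs_no_amf(HUS, HTE, HUW, HTN, UAREA):
    """Sketch: diffusive 5-point stencil coefficients with AMF = 1 everywhere."""
    UAREA_R = 1 / UAREA
    DUS = (HUS / HTE) * UAREA_R
    DUN = np.roll(HUS / HTE, 1, axis=1) * UAREA_R
    DUW = (HUW / HTN) * UAREA_R
    DUE = np.roll(HUW / HTN, 1, axis=0) * UAREA_R
    DUC = -(DUN + DUS + DUE + DUW)  # central coefficient
    return DUC, DUN, DUS, DUE, DUW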

@NoraLoose and others: Would be great to get your input here to make sure I'm on the right track.


That being said, if we aim for implementing a tripole boundary condition for the B-grid Laplacian, this would assume a global input dataset too. The B-grid Laplacian should then be called something like VECTOR_B_GRID_GLOBAL to make the assumption of global input data extra clear.

Good point @NoraLoose! If we do plan to implement the tripole boundary condition, then we will need to require global input datasets anyway, regardless of the AMF quantity.

We should use the same _GLOBAL suffix for all other grids that assume global input data, in particular the ones that use tripole boundary conditions.

I think this is a good idea! This may belong in a new issue if others agree, but I'll mention a few thoughts here for now. As @NoraLoose mentioned, we'd need to add the _GLOBAL suffix to the two grid types that use tripole boundary conditions, renaming them to something like:

Naive question about the other periodic boundary conditions: do the other Laplacians all assume the full domain extent (whether that's global, or an idealized domain) as well? For example, if I have an input dataset that is a subset of a global domain, then using periodic boundary conditions will not always be correct. I know you all have thought about this much longer than I have, so apologies if this is a trivial point.

paigem commented 2 years ago

I've removed the AMF term based on my reasoning in the previous comment. I have run the code on the test data, and the results look reasonable:

Original U (units cm/s)

[image]

Filtered U (to 100km using B-grid Vector Laplacian)

Note the different colorbar scale from the above plot. [image]

Laplacian from the BgridVectorLaplacian() function

[image]

I have also tested the code using a Dask Gateway cluster on Pangeo Cloud, and it does seem to run (though the Dask Dashboard suggested that it did not run as smoothly as we might like).

I'll be busy at the Ocean Sciences Meeting next week, but my next step will be to implement some tests for the B-grid Vector Laplacian.

NoraLoose commented 2 years ago

Great work @paigem!

Sorry I missed answering the questions in your previous comment.

I agree with your decision to get rid of AMF. So far our approach has been the following: if the user wants a spatially varying filter scale, this has to be part of their input (either to the filter object or the Laplacian); see for example here:

https://github.com/ocean-eddy-cpt/gcm-filters/blob/25fadb00ea2a55f28496aa7ce0d78fb73153680a/gcm_filters/kernels.py#L232-L233

question about the other periodic boundary conditions: do the other Laplacians all assume the full domain extent (whether that's global, or an idealized domain) as well? For example, if I have an input dataset that is a subset of a global domain, then using periodic boundary conditions will not always be correct.

The 2 tables on this page of the documentation list the boundary conditions that are associated with the Laplacians coded so far. So yes, most Laplacians use a periodic boundary condition. Sometimes this makes sense, even if one doesn't deal with truly global data; for example if one wants to filter data from idealized model simulations such as NeverWorld2 (see here), which has a re-entrant channel. But in general, "periodic" will not be the correct boundary condition for a user's application. The user has to be aware of the boundary conditions used (hopefully through the table in the documentation linked above), and potentially open a PR with a new Laplacian that has the correct boundary condition for their application.

I hope this makes sense!

codecov-commenter commented 2 years ago

Codecov Report

Merging #128 (57fd8ad) into master (dca7035) will increase coverage by 0.37%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   93.87%   94.24%   +0.37%     
==========================================
  Files          10       10              
  Lines         931      991      +60     
==========================================
+ Hits          874      934      +60     
  Misses         57       57              
Flag        Coverage Δ
unittests   94.24% <100.00%> (+0.37%) :arrow_up:


Impacted Files           Coverage Δ
gcm_filters/kernels.py   88.30% <100.00%> (+2.19%) :arrow_up:
tests/conftest.py        100.00% <100.00%> (ø)
tests/test_kernels.py    100.00% <100.00%> (ø)

paigem commented 2 years ago

Items to finish before merging:

NoraLoose commented 2 years ago

To make sure the current tests pass, we need to modify conftest.py in several ways:

  • Add GridType.VECTOR_B_GRID to vector_grids
  • Generalize the fixture vector_grid_type_data_and_extra_kwargs so it produces test data not only for GridType.VECTOR_C_GRID but also for GridType.VECTOR_B_GRID

paigem commented 2 years ago

I made a couple of suggested changes, to keep consistency with the master branch. I think these changes got accidentally made during the merging process.

Thanks for catching those @NoraLoose! I pulled from master just before making the edits, so I'm not sure why that happened.

paigem commented 2 years ago

Add GridType.VECTOR_B_GRID to vector_grids

This is straightforward and done on my local copy!

Generalize the fixture vector_grid_type_data_and_extra_kwargs so it produces test data not only for GridType.VECTOR_C_GRID but also for GridType.VECTOR_B_GRID

This will likely be a bit more involved. I will make a start tomorrow morning on this and see how far I get before I ping you @NoraLoose for assistance!

paigem commented 2 years ago

@NoraLoose I think we are very close! Thank you for helping update the tests! As you mention in your PR to this branch, there are still a few tests that don't pass but this is to be expected.

Two questions:

  • I see above that the pre-commit test is not passing. Could this be solved by running pre-commit on this PR locally and pushing again?
  • In your comment in the https://github.com/paigem/gcm-filters/pull/1#issue-1343393912, you mention that we will need to generate zarr test data after merging. Do we do that now, after merging your PR to this branch? Or do we need to wait and merge this PR first?

NoraLoose commented 2 years ago

I see above that the pre-commit test is not passing. Could this be solved by running pre-commit on this PR locally and pushing again?

:+1:

In your comment in the https://github.com/paigem/gcm-filters/pull/1#issue-1343393912, you mention that we will need to generate zarr test data after merging. Do we do that now, after merging your PR to this branch? Or do we need to wait and merge this PR first?

My PR is already merged, so now is the time to generate the zarr test data. Once the zarr test data is generated, all tests should pass and we can merge this PR.

NoraLoose commented 2 years ago

I think we are ready to merge this PR! :tada: Nice team work @paigem!