xgcm / xgcm

python package for analyzing general circulation model output data
http://xgcm.readthedocs.org
MIT License
222 stars 80 forks source link

Computation of kinematic variables etc. using xgcm for C-grid atmospheric model output #209

Closed gewitterblitz closed 4 years ago

gewitterblitz commented 4 years ago

Hi,

I am running a 3D non-hydrostatic atmospheric model (called ARPS) which uses an Arakawa C-grid domain and has terrain following coordinates in the vertical dimension. Here's an example of the grid's x-axis representation:

image

More info about model grid representation can be found on page 2 onwards in this pdf (http://www.caps.ou.edu/ARPS/download/code/pub/ARPS.docs/ARPS4DOC.PDF/arpsch4.pdf).

The model produces an hdf4 format file output which I can read through pynio and some legacy code to retrieve 2D/3D variables, e.g.,

hdffile = Nio.open_file(file, format='hdf') There is some legacy code available for calculation of kinematic variables as well (such as vorticity and divergence). The legacy script uses the following grid shift convention to calculate values of vector quantities at scalar grid points:

std_gridshift = {
    's': (0, 0, 0),
    'u': (-1, 0, 0),
    'v': (0, -1, 0),
    'w': (0, 0, -1),
    'vortx': (0, -1, 1),
    'vorty': (-1, 0, 1),
    'vortz': (-1, 1, 0),
}

I tried comparing plots of u and v fields at scalar grid points calculated from legacy code versus xgcm. Here is my xgcm implementation after I created an xarray dataset from the data read through pynio:

# Create an xarray dataset 
# assuming C-grid to follow a left axis position for u,v, and w variables

arps_data = xr.Dataset(
                        data_vars={
                             'uvel':    (('x_left', 'y_center','z_center'), u),    
                             'vvel':    (('x_center', 'y_left','z_center'), v),
                             'wvel':    (('x_center', 'y_center','z_left'), w),
                             'pressure':    (('x_center', 'y_center','z_center'), p),
                             'pot_temp':(('x_center', 'y_center','z_center'), pt)
                        },
                        coords={
                            'x_left': x,
                            'y_left': y,
                            'z_left':z,
                            'x_center': xs,
                            'y_center':ys,
                            'z_center':np.arange(1,54)
                            }
                    )

arps_data.coords['zs'] = (('x_center', 'y_center','z_center'), zs)
arps_data

image

#Create an xgcm grid

from xgcm import Grid

grid = Grid(arps_data, coords={'X': {'center': 'x_center', 'left': 'x_left'},
                               'Y': {'center': 'y_center', 'left': 'y_left'},
                               'Z': {'center': 'z_center', 'left': 'z_left'}
                              },
            periodic=[]
           )

#calculate u at scalar points
us_xgcm = grid.interp(arps_data.uvel, axis='X',boundary='fill')

A small clarification needed here: All scalar and vector quantities from model output have the same array shape, i.e., (53,33,53) in the dimension order (x,y,z). I visualize this to be similar to the grid shown on ROMS wiki website here: https://www.myroms.org/wiki/Numerical_Solution_Technique

image

QUESTION: Now, in that case, u or v variables should ideally be defined along inner axes positions as per xgcm notation. But if I do that I get an array length mismatch error (e.g., expected array length 52 but got 53). Where am I going wrong with this?

Anyway, when I use left, it works fine and I get the avg field at scalar points. The output from xgcm and legacy code has zero difference everywhere except for one single boundary (probably because of the BC used). Here's an example:

image

Next, I tried computing vorticity (only z-component here):

# calculate vertical vorticity
zvorticity = (grid.diff(arps_data.vvel,axis='X',boundary='fill')/dx - grid.diff(arps_data.uvel,axis='Y',boundary='fill')/dy) 
#interpolate along X axis first and then along Y axis
zvort1 = grid.interp(zvorticity,axis='X',boundary='fill')
zvort2 = grid.interp(zvort1,axis='Y',boundary='fill')
#plot the vorticity at lowest z-level
final_zvort = zvort2[:,:,0].T.plot.pcolormesh(cmap='coolwarm',vmin=-0.0007,vmax=0.0007)

Here's the output:

image

The legacy code computes zvorts (z-vorticity at scalar point) as:

image

This is the output:

image

I believe the difference exists because of the BC used in legacy code.

My eventual aim is to solve a PDE which involves computation of differential terms multiplied together (e.g., dv/dy*du/dx). Therefore, I need to multiply these terms and then interpolate the result at scalar points (in this order to achieve better accuracy). I feel xgcm can do a good job at this. But before I proceed, I just wanted to discuss with the folks here if they think I am on right path and using xgcm the right way.

Any insights/comments on the code/computation would be highly appreciated!

rabernat commented 4 years ago

Hi @gewitterblitz. Thanks a lot for this interesting question!

Several of the xgcm developers (including myself) are on vacation at the moment. It may take us a few weeks to respond to your question. We appreciate your patience!

gewitterblitz commented 4 years ago

Thanks for the heads up, @rabernat.

Meanwhile, I realized that I can probably explain my issue better. Here are some more comments and questions:

1. From examples provided in xgcm documentation and elsewhere, it seems like other models (e..g, mitgcm etc.) also have equal shape/len for scalar(cenetered) and vector(staggered) variables. The explanation of XG and XC coordinates at this link was particularly helpful.

I guess I am confused because my model output has a slightly different representation of coordinates for grid centers compared to the schematic shown in my top comment. Here's the comparison of x and xs (since x[0] < xs[0], therefore, x should be the coordinates for cell faces)

image

If you look at the schematic of Arakawa-C grid used in ARPS model (figure attached in the top comment), you will notice that the length of scalar points is nx-1 while vector points have a length of nx. Moreover, the last coordinate (with the highest value) should belong to the vector grid points. However, x and xs share the same highest value (453000). Maybe this is a standard technique adopted for scientific computations and I am just not aware of it?

2. Question: While we are here, I also wanted some clarification regarding 'metrics' and conservation of scalar quantities. Many examples provided in the xgcm documentation using mitgcm or MOM6 output calculate vorticity etc. using finite volume approach. Since I am mostly using finite difference in ARPS, does the issue of conservation still occur there?

3. Question: Since ARPS follows a terrain following z-coordinates with stretching within the lowest few kilometers, the vertical dimension has 3D coordinates (i.e., non-constant values of zs at each level). Here is how it looks:

image

Have you dealt with this anywhere else?

4. Finally, I would say that xgcm is really promising for the kind of computation I want to perform for my research. I would certainly be interested in creating a small utility package for ingesting ARPS data into xarray and use xgcm for further computations. I noticed there are some customized adaptations for ocean models. Having one for a meteorological model would be a good contribution.

jbusecke commented 4 years ago

Dear @gewitterblitz, I am back from vacation and apologize for the long wait. First of all, thanks for using xgcm and for posting such detailed explanaitions.

I think xgcm is the right tool for the job and I would be very happy to see it being used for many more models, especially non-ocean models.

I think there are several issues to unpack here and ill try to go through them one-by-one:

I am not quite sure I understand the question correctly. In general, you can compute finite differences with xgcm (make sure to only specify single axes in your operations), but the way I have been using it is always to reproduce as closely as possible the way the model code executes an operation. As you correctly point out many ocean models use finite volume, but there are exceptions (e.g. the MOM5 ocean model uses finite difference approach to calculate pressure gradients).

We are very actively working on this in #205, and I will take this up as my next project (Ill open a separate issue). I would very much value input from you on this.

If the actual read in for the files is non-trivial we highly encourage this sort of package (see xmitgcm for inspiration). Just as a side note, we generally favor using base xgcm for calculations thereafter. Hopefully with input from many members of the science community we can make xgcm flexible/general enough to suit most needs and make model specific packages (besides from the read-in) unecessary.

rabernat commented 4 years ago
  • We currently do not support centered differences

This isn't totally accurate. To compute a centered difference, you would do grid.interp(grid.diff(...)).

jbusecke commented 4 years ago

I actually misread the legacy code. It seems like that this is just a backward difference and xgcm should be able to reproduce this exactly.

Could you remove the 2 rows at the boundary for each of the results and do a difference @gewitterblitz?

gewitterblitz commented 4 years ago

@jbusecke Thank you so much for your response. Here's my take:

If you look at the schematic of Arakawa-C grid used in ARPS model (figure attached in the top comment), you will notice that the length of scalar points is nx-1 while vector points have a length of nx. Moreover, the last coordinate (with the highest value) should belong to the vector grid points. However, x and xs share the same highest value (453000). Maybe this is a standard technique adopted for scientific computations and I am just not aware of it?

Yes, the legacy code is doing backward difference but my confusion was about this:

I think you are correct that the main difference between the results is caused by the boundary condition. From a quick look at the legacy code it seems the values at the border are just 'overwritten' with points further away from the border. xgcm does follow a slightly different approach, when you supply boundary='extend', where the border values are repeated outwards. Have you tried to specify different boundary conditions? Either way I would assume that some of the boundary points have to be trimmed?

Agreed. I did try plotting with different boundary options and the only difference is due to the last two rows/ columns which are just for padding purposes and can be trimmed.

Many examples provided in the xgcm documentation using mitgcm or MOM6 output calculate vorticity etc. using finite volume approach. Since I am mostly using finite difference in ARPS, does the issue of conservation still occur there?

I think what I meant here is that the finite volume examples talk about conservation of derived variables like vorticity etc. and the FV formulae take care of the conservation properties (IIRC, mitgcm etc. use a curvilinear coordinate system). I was just curious if a simple finite difference formulation on a grid (rectangular,uniform in horizontal, sigma coordinates in vertical) would also ensure that.

We are very actively working on this in #205, and I will take this up as my next project (Ill open a separate issue). I would very much value input from you on this.

Thanks for the update. I guess this would be really helpful for calculation of horizontal components of vorticity vector.

Could you remove the 2 rows at the boundary for each of the results and do a difference @gewitterblitz?

You are right, removing the last two rows/columns would solve the issue.

If the actual read in for the files is non-trivial we highly encourage this sort of package (see xmitgcm for inspiration).

That's a great idea. My model output files have hdf4 format and having a utility for converting it into xarray data structure would definitely streamline the analysis.

Finally, do you think it would be a good idea to keep this issue open; just in case I have questions about any such calculations in future or is it a better idea to create a new issue since I can always refer to this issue in future questions?

jbusecke commented 4 years ago

I think what I meant here is that the finite volume examples talk about conservation of derived variables like vorticity etc. and the FV formulae take care of the conservation properties (IIRC, mitgcm etc. use a curvilinear coordinate system). I was just curious if a simple finite difference formulation on a grid (rectangular, uniform in horizontal, sigma coordinates in vertical) would also ensure that.

I am not really sure I know the answer to this question. It might be interesting to compute both versions and evaluate the relative difference for your specific case.

That's a great idea. My model output files have hdf4 format and having a utility for converting it into xarray data structure would definitely streamline the analysis.

Sounds great. In case you have a nice use case as an example using this + xgcm, it would be great if you would consider submitting it to pangeo gallery

Finally, do you think it would be a good idea to keep this issue open; just in case I have questions about any such calculations in future or is it a better idea to create a new issue since I can always refer to this issue in future questions?

You can always reopen or refer to this issue, even if its closed. So yes I would suggest to close this in this case.