powellb / seapy

State Estimation and Analysis in Python
MIT License
28 stars 21 forks source link

Seapy superobs not considering ROMS cell-edges!? #55

Closed ocehugo closed 6 years ago

ocehugo commented 6 years ago

Hi,

It appears that Seapy is not considering the C-staggering locations:

seapy_mur_sst_mapping

A bigger view: seapy_mur_sst_mapping_rho_geo

Clearly, the code is superobing where it shouldn't. Looks like the cell edges are not taking into account. An offset is missing by the above behaviour.

raw data source: JPLMUR pseudo code:

 file=mur_file
 otype = seapy.roms.obsgen.ostia_sst_map(grid_file, grid_dt)
 obs = otype.convert_file(file)
 obs.to_netcdf(outfile)
powellb commented 6 years ago

Thanks for the feedback; unfortunately, I don't quite understand what you are showing. Allow me to explain what it is attempting to do: The gridder works (using the numpy-groupies 'aggregate' method) by putting the obs onto the model grid in terms of i and j locations. So, any observations that fall in the same i, j location (fractional or absolute) are combined. The i, j values are not the grid center, they are the lower, left.

Does this explain what you are seeing?

ocehugo commented 6 years ago

Hi @powellb,

sorry my fault. I should have been more descriptive.

The two plots above are in geographical coordinates, the "grid" in light black lines is from lon_psi,lat_psi, while cell centres (black stars) are lon/lat rho points. The error I'm describing is about those loose red superobs that should not exist because the raw obs are within a single cell.

The superobs functional should locate all obs that lie within the grid cell and do avg/std within that cell. That is my understanding. The above figure shows some single observations that are lying within a grid cell being average with outside members of other cells. This should not occur. Do you agree !?

The i, j values are not the grid center, they are the lower, left.

Yes, I agree. They should be lower left cell edge locations. However, my understanding of the code is that it's using rho positions to estimate i, j. I don't know all the internals, but:

  1. in obs.py, line 644-650 you are using lon_rho,lat_rho to generate the convex hull/match grid bounds.

This will limit superobs at the boundary. The actual cell extends "outside" of the model boundary points in rho space. This is equivalent of considering the boundary cells as half/shaved cells.

  1. in grid.py you use lm,ln starting at 0 to link geographical to numerical space (LM/LN are xi/eta rho limits in numerical space conventions as far as my memory go).

This creates a "defacto" psi geographical space where the superobs are being averaged. Isn't the cell corner positions estimates from centre positions (rho) something like i_rho(n)-0.5, j_rho(n)-0.5 !? This way, when rounding the fractional indexes, all data within 0 and 1 are within the first lower/left boundary point.

  1. The interpolation scheme uses hindices.f: a fortran code that assumes indexes start at 1, instead of 0 (python land) -- there is no adjustment on hindices output.

Maybe this is another source of the problem above. Without adjustment, there is an upper/left shift in geo space given a one-off increase in an index.

Anyway, this appears to be the case in the figure above: Some wrong (red) superobs are a clear average of four points -- the "rho" points are being used as "defacto" cell corners at the end of the processing stream.

ocehugo commented 6 years ago

@powellb : Quick follow up... I made some plots using WC13 as a testbed... results below can be generated with the gist: https://gist.github.com/ocehugo/62322bdd2668a4140ebd905438c6e18a

Just put wc13_grd.nc in a folder with the above gist and run pytest should create the images below.

I selected 8 total observations, with some lying in the exact same locations and the others in different cells. In one cell, 2 different locations have 2 equal observations at the exact same geographical coordinates. In the other cell, the 4 locations are independent in geospace.

Blue is raw obs, red is superobs, black stars are rho points and the edges are drawn according to psi or rho points (see below).

Basecase (raw obs locations == rho locations): sobs_fake_sfc_wc13_basecase

All good... but this is a false positive.

standardcase: (8 obs locations slightly shifted). sobs_fake_sfc_wc13_standardcase

Not good. Expected 2 superobs, got 5. Two points were considered away from the current cell (lower-left ones), and only two were averaged (top-right ones).

Cornercase ( 8 obs locations shifted to lie in different cells): sobs_fake_sfc_wc13_cornercase

Not good again. Expected 6 superobs (2 in one cell, 4 in the other cells). Got 5. Two points were now averaged, even though they lie in different cells! The superobs now is lying close a "psi" point.

beyondcase (obs locations shifted to left/down elements and edges of cells are plotted as rho points): sobs_fake_sfc_wc13_beyondcase_asrho

Clearly, the code is using rho points as cells edges. THis shifts the position of the superobs and generate wrong averaging/aggregations procedure.

This can also be explored in the line/grid cases below:

LineCase (a high-resolution line of observations in a longitude): sobs_fake_sfc_wc13_linecase

Again, not good. Superobs should have been correctly averaged to lie close to centre superobs location generated.

Gridcase (same but with points above and below): sobs_fake_sfc_wc13_gridcase

Same as my examples in the previous reports. Wrong superobing.

Considering the rho as the corners of the cells for the case above:

sobs_fake_sfc_wc13_gridcase_rhoascorners

Clearly, the code is doing the right thing, by considering the "rho" as the cell edges points. It's doing the wrong thing, however, given that those points are not the edges, but actual centres.

powellb commented 6 years ago

Hugo,

Thank you again for the plots and comments. I apologize for the delay as I've been traveling and trying to catch up.

I understand what you are showing now, and I see where the issue is coming up. I mispoke when I replied to you before that we want the lower, left of the cell to be the start of cell, rather it is the rho-point center that serves as the lower, left.

If you read the header of ROMS's extract_obs.F, it is based on the rho-grid centers. When specifying the observations, they are specified on the rho-grid with the first rho-value as i=0 and j=0. This is why the hindices and obs_gridder routines in seapy work this way (because they are set up for what the ROMS observations are expected to be). I paste the relevant part of the header here (NOTE that (0,0) is the first (lower, left) rho-grid position. So, the "super-obs" should go from the centers):

! All the observations are assumed to in fractional coordinates with ! ! respect to RHO-points: ! ! ! ! ! ! M r..u..r..u..r..u..r..u..r..u..r..u..r..u..r..u..r..u..r ! ! : : ! ! Mm+.5 v p++v++p++v++p++v++p++v++p++v++p++v++p++v++p++v++p v ! ! : + | | | | | | | + : ! ! Mm r u r u r u r u r u r u r u r u r u r ! ! : + | | | | | | | + : ! ! Mm-.5 v p--v--p--v--p--v--p--v--p--v--p--v--p--v--p--v--p v ! ! : + | | | | | | | + : ! ! r u r u r u r u r u r u r u r u r u r ! ! : + | | | | | | | + : ! ! v p--v--p--v--p--v--p--v--p--v--p--v--p--v--p--v--p v ! ! : + | | | | | | | + : ! ! r u r u r u r u r u r u r u r u r u r ! ! : + | | | | | | | + : ! ! 2.5 v p--v--p--v--p--v--p--v--p--v--p--v--p--v--p--v--p v ! ! : + | | | | | | | + : ! ! 2.0 r u r u r u r u r u r u r u r u r u r ! ! : + | | | | | | | + : ! ! 1.5 v p--v--p--v--p--v--p--v--p--v--p--v--p--v--p--v--p v ! ! : + | | | | | | | + : ! ! 1.0 r u r u r u r u r u r u r u r u r u r ! ! : + | | | | | | | + : ! ! 0.5 v p++v++p++v++p++v++p++v++p++v++p++v++p++v++p++v++p v ! ! : : ! ! 0.0 r..u..r..u..r..u..r..u..r..u..r..u..r..u..r..u..r..u..r ! ! ! ! 0.5 1.5 2.5 Lm-.5 Lm+.5 ! ! ! ! 0.0 1.0 2.0 Lm L ! ! ! !======================================================================= !

On Jul 8, 2018, at 7:44 PM, Hugo Oliveira notifications@github.com wrote:

@powellb : Quick follow up... I made some plots using WC13 as a testbed... results below can be generated with the gist: https://gist.github.com/ocehugo/91a32b4f14b0c45f8b04220e1497473f

I selected 8 total observations, with some lying in the exact same locations and the others in different cells. In one cell, 2 different locations have 2 equal observations at the exact same geographical coordinates. In the other cell, the 4 locations are independent in geospace.

Blue is raw obs, red is superobs, black stars are rho points and the edges are drawn according to psi or rho points (see below).

Basecase (raw obs locations == rho locations):

All good... but this is a false positive.

standardcase: (8 obs locations slightly shifted).

Not good. Expected 2 superobs, got 5. Two points were considered away from the current cell (lower-left ones), and only two were averaged (top-right ones).

Cornercase ( 8 obs locations shifted to lie in different cells):

Not good again. Expected 6 superobs (2 in one cell, 4 in the other cells). Got 5. Two points were now averaged, even though they lie in different cells! The superobs now is lying close a "psi" point.

beyondcase (obs locations shifted to left/down elements and edges of cells are plotted as rho points):

Clearly, the code is using rho points as cells edges. THis shifts the position of the superobs and generate wrong averaging/aggregations procedure.

This can also be explored in the line/grid cases below:

LineCase (a high-resolution line of observations in a longitude):

Again, not good. Superobs should have been correctly averaged to lie close to centre superobs location generated.

Gridcase (same but with points above and below):

Same as my examples in the previous reports. Wrong superobing.

Considering the rho as the corners of the cells for the case above:

Clearly, the code is doing the right thing, by considering the "rho" as the cell edges points. It's doing the wrong thing, however, given that those points are not the edges, but actual centres.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.