Open rod-glover opened 7 years ago
A few brief responses:
I'm going to put the basic question up front, then some responses to your points.
Question: Given that the average cell area is not used to find Grid entries in the database and can't be used to compute area-weighted averages (see 1. below), shall we adopt the new computation? Or would you prefer to stick with the existing one?
Responses:
Context: Converting the R script `db/index_netcdf.r` to Python: `mm_cataloguer/index_netcdf.py`. Determining the mean area of a grid cell in a grid defined by vectors of latitude and longitude values (to fill database field `Grid.cell_avg_area_sq_km`).

Current situation: The formula used in the R script to approximate the area of a grid cell uses a conic approximation for the area of a patch of a sphere bounded by two lines of latitude and two lines of longitude. Its formula is:
where `lat1`, `lat2`, `lon1`, `lon2` are the bounding values in radians, and `R` is the radius of the earth in km.

To compute an approximation to the mean grid cell area, the following things are assumed:
- Longitude (`lon`) is equally spaced
- Latitude (`lat`) is stored in ascending order of value by index

So to compute the mean of the approximation above over all cells:
where `abs`, `diff`, and `cos` apply pointwise to a list, and `mean` takes the mean of a list (think `numpy`).

Note: The assumption that `lon` is equally spaced significantly simplifies the computation; it is `O(N)` where `N` is the number of latitude values.

Problem 1: The approximation is accurate only for small differences in latitude, and its accuracy decreases as the latitude approaches a pole. Some grids we encounter at PCIC are coarse (large latitude differences) and approach the poles. This formula will be inaccurate in those cases.
Problem 2: It is unnecessarily complex (in both the mathematical and the computational complexity senses) if you know the exact formula for the area of a patch of a sphere bounded by pairs of lat and lon lines.
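To get a feel for Problem 1, here is a minimal Python sketch comparing a flat-patch approximation with the exact area of a lat/lon-bounded patch. The approximate form used here (latitude difference times longitude difference times the cosine of one bounding latitude) is an assumption chosen purely for illustration; the R script's conic formula may differ in detail and behave differently. The function names and the value of `R` are likewise hypothetical.

```python
import math

R = 6371.0  # assumed mean earth radius in km


def approx_cell_area(lat1, lat2, lon1, lon2, r=R):
    """Illustrative flat-patch approximation (NOT the R script's formula).

    Inputs are in radians. The cosine is evaluated at lat1 only, which is
    what makes this form degrade for coarse cells near a pole.
    """
    return r**2 * abs(lat2 - lat1) * abs(lon2 - lon1) * math.cos(lat1)


def exact_cell_area(lat1, lat2, lon1, lon2, r=R):
    """Exact area of a patch of a sphere bounded by lat/lon lines (radians)."""
    return r**2 * abs(lon2 - lon1) * abs(math.sin(lat2) - math.sin(lat1))


def relative_error(lat1_deg, lat2_deg, dlon_deg=1.0):
    """Relative error of the illustrative approximation for one cell."""
    lat1, lat2 = math.radians(lat1_deg), math.radians(lat2_deg)
    lon2 = math.radians(dlon_deg)
    exact = exact_cell_area(lat1, lat2, 0.0, lon2)
    approx = approx_cell_area(lat1, lat2, 0.0, lon2)
    return abs(approx - exact) / exact


# A fine cell at the equator versus a coarse cell touching the pole.
print(relative_error(0.0, 1.0))
print(relative_error(80.0, 90.0))
```

Under this assumed form, the fine equatorial cell is reproduced almost exactly, while the coarse polar cell is off by a large factor.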
Potential response A: The exact formula, valid for all values, for the area of a patch bounded by `lat1`, `lat2`, `lon1`, `lon2` is:

`R^2 * abs(lon2 - lon1) * abs(sin(lat2) - sin(lat1))`

The mean area of a grid cell is just the sum of the areas of all grid cells divided by the number of grid cells. The sum of the areas of all grid cells is just the area of the patch bounded by the minimum and maximum lat and lon values. Therefore the mean area of a grid cell is:

`R^2 * abs(lon_max - lon_min) * abs(sin(lat_max) - sin(lat_min)) / (number of grid cells)`
plus some carefulness about wrapping of longitude values.
This is much simpler in form and much less computationally costly than the mean-of-approximations computation above. Nor does it depend on the assumption of equally spaced longitudes above.
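Response A can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `lat_deg` and `lon_deg` are taken to hold cell boundary values in degrees, so there are `(len(lat) - 1) * (len(lon) - 1)` cells; longitudes are assumed not to wrap; and the function names and radius are hypothetical rather than taken from `index_netcdf.py`. The second function sums the cells one by one purely to check that the shortcut gives the same mean.

```python
import math

R = 6371.0  # assumed mean earth radius in km


def patch_area(lat1, lat2, lon1, lon2, r=R):
    """Exact area (sq km) of a patch bounded by lat/lon lines, in radians."""
    return r**2 * abs(lon2 - lon1) * abs(math.sin(lat2) - math.sin(lat1))


def mean_cell_area(lat_deg, lon_deg, r=R):
    """Mean cell area computed from the bounding patch alone.

    Assumes the vectors hold cell boundaries and longitudes do not wrap.
    """
    lat = [math.radians(v) for v in lat_deg]
    lon = [math.radians(v) for v in lon_deg]
    n_cells = (len(lat) - 1) * (len(lon) - 1)
    return patch_area(min(lat), max(lat), min(lon), max(lon), r) / n_cells


def mean_cell_area_by_summing(lat_deg, lon_deg, r=R):
    """Same quantity computed cell by cell, for comparison only."""
    lat = sorted(math.radians(v) for v in lat_deg)
    lon = sorted(math.radians(v) for v in lon_deg)
    areas = [
        patch_area(la1, la2, lo1, lo2, r)
        for la1, la2 in zip(lat, lat[1:])
        for lo1, lo2 in zip(lon, lon[1:])
    ]
    return sum(areas) / len(areas)


# Deliberately uneven spacing in both directions.
lat_edges = [-90, -60, -10, 0, 45, 80, 90]
lon_edges = [0, 10, 50, 180, 300]
print(mean_cell_area(lat_edges, lon_edges))
print(mean_cell_area_by_summing(lat_edges, lon_edges))
```

The two results agree up to floating-point rounding even though neither vector is equally spaced, because the per-cell areas add up to the area of the bounding patch.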
Potential response B: What is the average cell area used for? Does it actually matter? Is it worth expending the computational effort?