rod-glover closed 6 years ago
Also, we will have to fix the existing Grid records in the ce_meta database. This could be done by re-indexing all files (ugh), or by some less direct but possibly better scheme. That would depend on what the result of the srid-setting code would have generated. Might or might not be easy to determine.
The old code:
a. generates a Proj4 string based on standard CRS metadata in the NetCDF file
b. converts that string to WKT
c. uses the WKT to find or insert an appropriate record in the Postgis spatial_ref_sys table
d. which gives the srid to use in Grid.
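Steps c and d can be sketched roughly as below. This is only an illustration, not the indexer's actual code: it uses an in-memory SQLite table as a stand-in for the Postgis spatial_ref_sys table (same column names), and the function name find_or_insert_srid and the "max srid + 1" allocation rule are hypothetical.

```python
import sqlite3


def make_demo_table(conn):
    """Create a stand-in for Postgis's spatial_ref_sys table (demo only)."""
    conn.execute(
        "CREATE TABLE spatial_ref_sys ("
        "srid INTEGER PRIMARY KEY, auth_name TEXT, auth_srid INTEGER, "
        "srtext TEXT, proj4text TEXT)"
    )


def find_or_insert_srid(conn, wkt, proj4):
    """Return the srid of the row whose srtext equals wkt, inserting a row if none exists."""
    row = conn.execute(
        "SELECT srid FROM spatial_ref_sys WHERE srtext = ?", (wkt,)
    ).fetchone()
    if row is not None:
        # An existing record already represents this CRS; re-use it.
        return row[0]

    # Hypothetical allocation rule: next srid after the current maximum.
    (max_srid,) = conn.execute("SELECT MAX(srid) FROM spatial_ref_sys").fetchone()
    srid = (max_srid or 0) + 1
    conn.execute(
        "INSERT INTO spatial_ref_sys (srid, srtext, proj4text) VALUES (?, ?, ?)",
        (srid, wkt, proj4),
    )
    return srid
```

Because the lookup is an exact match on srtext, the WKT effectively acts as the key: two files yield the same srid only if their CRS metadata produces identical WKT.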
Task: Generate WKT for spatial_ref_sys.srtext
Context: The indexer needs to find or create a spatial_ref_sys record to represent the CRS defined in the file being indexed. Original R code generated a Proj4 string and used a GDAL-based utility to convert it to WKT.
(Note: There is at least one erroneous spatial_ref_sys record that was created by us in the pcic_meta database.)
Problem: WKT is verbose and relatively complicated.
Option A: The easiest way to get WKT is to generate a Proj4 string and convert it to WKT. However, there are few utilities for converting from Proj4 to WKT and each has its disadvantages. The known utilities available in Python are:
osgeo.osr
PyCRS
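For reference, the osgeo.osr route might look something like the sketch below. It assumes the GDAL Python bindings are installed; the wrapper name proj4_to_wkt is hypothetical, and the import is deferred into the function so that GDAL is needed only if this path is actually taken.

```python
def proj4_to_wkt(proj4):
    """Convert a PROJ.4 string to WKT via GDAL's osgeo.osr bindings (sketch)."""
    # Imported lazily so GDAL is only required when this function is called.
    from osgeo import osr

    srs = osr.SpatialReference()
    # ImportFromProj4 returns an OGR error code (0 means success) unless
    # GDAL exceptions are enabled, in which case it raises instead.
    if srs.ImportFromProj4(proj4) != 0:
        raise ValueError("GDAL could not parse PROJ.4 string: {!r}".format(proj4))
    return srs.ExportToWkt()
```

A call such as proj4_to_wkt("+proj=longlat +ellps=WGS84 +no_defs") should return a WKT string; the exact text depends on the GDAL version, which matters if the WKT is used as a lookup key.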
Option B: Reinvent the wheel; manually generate our own WKT.
Analysis:
What is the WKT used for?
It is used by the indexer as a unique identifier to find or create and re-use the same spatial_ref_sys record for same/equivalent CRSs that occur in different indexed files. In this case, a correct WKT is less important than a unique WKT that depends appropriately on the CRS parameters (as expressed in the Proj4 string).
It may more generally be used in much more complex ways by Postgis in order to transform coordinates between CRSs. In this case, correctness is very important. Such transforms do appear to be used by ncWMS, as evidenced by the problem with projections that was resolved by adding the correct srid to the offending grid records. However, whether the ncWMS transform depends on the WKT is unknown. (It could depend only on the Proj4 string.)
Therefore correctness may be important too.
Outstanding questions:
@jameshiebert , any thoughts? Sorry if the analysis above is a little long. This is a bit of a rabbit hole.
I'm leery of introducing GDAL into this package, but perhaps you differ and think that would be a good idea here. It would make the Python code more similar to the R code it replaces. It also dodges some questions, like "what about the ellipsoid parameter?". That question in particular I find irritating, and I'm not sure that the original R code got exercised enough to even discover whether GDAL throws an exception when +ellps is missing. I will do some experiments with it, but that means installing GDAL. Ick.
We can test question 1 by altering (in a test database) the WKT column for one of the srids that our applications use. We should set it to something erroneous. If ncWMS breaks, then we know it is using it. If not, then we are in the clear, so long as ncWMS is the only potential client of this information.
This warning message thrown by the database-enabled ncWMS client seems like pretty good evidence that ncWMS-PCIC is using the WKT:
Dec 06, 2017 9:42:18 AM uk.ac.rdg.resc.ncwms.config.DatabaseCollection getProjectionImpl
WARNING: Couldn't parse WKT string into a Projection.
I talked to Matthew about whether it was possible to somehow see exactly what queries ncWMS-PCIC is running on pcic-meta as a way to tell what information ncWMS-PCIC is using. He said it was possible to turn on logging for every query, but that 1) we wouldn't want it on long, since it rapidly fills up the disk, and 2) changing logging settings might require restarting the database, which is nonideal for a production database.
He's going to look into the details and see whether it's a reasonable way to get an answer to whether ncWMS-PCIC is using the WKT string or if it would be too much trouble.
EDIT: Matthew reports that logging all queries can be turned on without restarting the database, but not for a single table, so the net result might be a slowing of the database for everyone while logging is turned on. If we decide it's worth it, we'd presumably want to turn on logging, force an ncWMS reload via the ncWMS admin console, then turn logging off again.
Update on research on PROJ.4 strings:
The Earth figure is specified by +a (required), and optionally one of +b, +f, +rf, +e, +es. Alternatively, the +ellps parameter provides a named alias for specific sets of those parameters (e.g., WGS84).
The +no_defs parameter says not to use defaults, and we (rightly) use +no_defs in all proj strings.
We therefore need +a or +ellps in every proj string we generate.
Neither +a nor +ellps is set in the following projections: polar stereographic, Lambert conformal conic, transverse Mercator. (This is not entirely unreasonable, since the CF Convention spec does not specify Earth figure parameters for these projections. Puzzling.)
Since the CF Convention spec does not specify Earth figure parameters for some projections, we can't reasonably expect them in the metadata. So we will have to provide a default. I propose defaulting +ellps in those cases. I propose making the default a parameter of the proj4_string method, with value 'WGS84'.
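A minimal sketch of that proposal, for the transverse Mercator case only. The dict-of-attributes input and the parameter name default_ellps are assumptions made for illustration; the real proj4_string method's signature may differ. The keys are CF grid-mapping attribute names, and the ellipsoid attributes (semi_major_axis and so on) are used when the file happens to carry them.

```python
def proj4_string(grid_mapping, default_ellps="WGS84"):
    """Build a PROJ.4 string from a dict of CF grid-mapping attributes (sketch)."""
    if grid_mapping["grid_mapping_name"] != "transverse_mercator":
        raise NotImplementedError("this sketch handles transverse_mercator only")

    parts = [
        "+proj=tmerc",
        "+lat_0={}".format(grid_mapping["latitude_of_projection_origin"]),
        "+lon_0={}".format(grid_mapping["longitude_of_central_meridian"]),
        "+k_0={}".format(grid_mapping["scale_factor_at_central_meridian"]),
        "+x_0={}".format(grid_mapping.get("false_easting", 0)),
        "+y_0={}".format(grid_mapping.get("false_northing", 0)),
    ]

    if "semi_major_axis" in grid_mapping:
        # The file carries explicit Earth figure parameters; use them.
        parts.append("+a={}".format(grid_mapping["semi_major_axis"]))
        if "semi_minor_axis" in grid_mapping:
            parts.append("+b={}".format(grid_mapping["semi_minor_axis"]))
        elif "inverse_flattening" in grid_mapping:
            parts.append("+rf={}".format(grid_mapping["inverse_flattening"]))
    else:
        # No Earth figure in the metadata: fall back to the named default.
        parts.append("+ellps={}".format(default_ellps))

    # Always end with +no_defs, as in all the proj strings we generate.
    parts.append("+no_defs")
    return " ".join(parts)
```

Making the default a keyword argument keeps the fallback visible at the call site and lets a caller override it for datasets known to use a different ellipsoid.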
It's not worth turning on the logging. The PROJ.4 strings have been corrected, and should result in valid WKT strings.
When a new Grid is created, the srid is not set. This old code shows what is supposed to happen.