pacificclimate / modelmeta

An ORM representation of the model metadata database
GNU General Public License v3.0
1 stars 0 forks source link

New(ish) indexer does not set Grid.srid #53

Closed rod-glover closed 6 years ago

rod-glover commented 6 years ago

When a new Grid is created, the sridis not set. This old code shows what is supposed to happen.

rod-glover commented 6 years ago

Also, we will have to fix the existing Grid records in the ce_metadatabase. This could be done by re-indexing all files (ugh), or by some less direct but possibly better scheme. That would depend on what the result of the srid-setting code would have generated. Might or might not be easy to determine.

rod-glover commented 6 years ago

The old code:

a. generates a Proj4 string based on standard CRS metadata in the NetCDF file b. converts that string to WKT c. uses the WKT to find or insert an appropriate record in the Postgis spatial_ref_sys table d. which gives the srid to use in Grid.

rod-glover commented 6 years ago

Task: Generate WKT for spatial_ref_sys.srtext

Context: The indexer needs to find or create a spatial_ref_sys record to represent the CRS defined in the file being indexed. Original R code generated a Proj4 string and used a GDAL-based utility to convert it to WKT.

(Note: There is at least one erroneous spatial_ref_sys record that was created by us in the pcic_meta database.)

Problem: WKT is verbose and relatively complicated.

Option A: The easiest way to get WKT is to generate a Proj4 string and convert it to WKT. However, there are few utilities for converting from Proj4 to WKT and each has its disadvantages. The known utilties available in Python are:

Option B: Reinvent the wheel; manually generate our own WKT.

Analysis:

What is the WKT used for?

rod-glover commented 6 years ago

Outstanding questions:

  1. Is WKT used by anything but the indexer? If so, by what and how?
  2. Can we use PyCRS (desirable because much smaller and simpler)? This devolves into the question of what we can legitimately use for values of the +ellps Proj4 parameter.
rod-glover commented 6 years ago

@jameshiebert , any thoughts? Sorry if the analysis above is a little long. This is a bit of a rabbit hole.

I'm leery of introducing GDAL into this package, but perhaps you differ and think that would be a good idea here. It would make the Python code more similar to the R code it replaces. It also dodges some questions, like "what about the ellipsoid parameter?". That question in particular I find irritating, and I'm not sure that the original R code got exercised enough to even discover whether GDAL throws an exception when +ellps is missing. I will do some experiments with it, but that means installing GDAL. Ick.

rod-glover commented 6 years ago

We can test question 1 by altering (in a test database) the WKT column for one of the srids that our applications use. We should set it to something erroneous. If ncWMS breaks, then we know it is using it. If not, then we are in the clear, so long as ncWMS is the only potential client of this information.

corviday commented 6 years ago

This warning message thrown by the database-enabled ncWMS client seems like pretty good evidence that ncWMS-PCIC is using the WKT:

Dec 06, 2017 9:42:18 AM uk.ac.rdg.resc.ncwms.config.DatabaseCollection getProjectionImpl
WARNING: Couldn't parse WKT string into a Projection.
corviday commented 6 years ago

I talked to Matthew about whether it was possible to somehow see exactly what queries ncWMS-PCIC is running on pcic-meta as a way to tell what information ncWMS-PCIC is using. He said it was possible to turn on logging for every query, but that 1) we wouldn't want it on long, since it rapidly fills up the disk, and 2) changing logging settings might require restarting the database, which is nonideal for a production database.

He's going to look into the details and see whether it's a reasonable way to get an answer to whether ncWMS-PCIC is using the WKT string or if it would be too much trouble.

EDIT: Matthew reports that logging all queries can be turned on without restarting the database, but not for a single table, so the net result might be a slowing of the database for everyone while logging is turned on. If we decide it's worth it, we'd presumably want to turn on logging, force a ncWMS reload via the ncWMS admin console, then turn logging off again.

rod-glover commented 6 years ago

Update on research on PROJ.4 strings:

  1. Most helpful but out of date proj reference doc.
  2. The Earth's figure must be specified. This is done with either the parameters +a (required), and optionally one of +b, +f, +rf, +e, +es. Alternatively, the +ellps parameter provides a named alias for specific sets of those parameters (e.g., WGS84).
  3. Parameter defaults can be specified in a local file. However, the +no_defs parameter says not to use defaults, and we (rightly) use +no_defs in all proj strings.
  4. Therefore we must specify either +a or +ellps in every proj string we generate.
  5. The original R code doesn't specify either +a or +ellps in the following projections: polar stereographic, Lambert conformal conic, transverse Mercator. (This is not entirely unreasonable, since the CF Convention spec does not specify Earth figure parameters for these projections. Puzzling.)
  6. Absence of Earth figure parameters will cause picky CRS converters (e.g., PyCRS) to object.
  7. It looks like the GDAL converter is not picky, i.e, it must supply a default Earth figure when none is specified.

Since the CF Convention spec does not specify Earth figure parameters for some projections, we can't reasonably expect them in the metadata. So we will have to provide a default. I propose defaulting +ellps in those cases. I propose making the default a parameter of the proj4_string method, with value 'WGS84'.

rod-glover commented 6 years ago

It's not worth turning on the logging. The PROJ.4 strings have been corrected, and should result in valid WKT strings.