terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Standardize geotiff / image metadata to be consistent w/ netcdf CF approach #268

Open dlebauer opened 7 years ago

dlebauer commented 7 years ago

GeoTIFF files should have useful metadata that is consistent with the CF approach used for met and hyperspectral data; they should also comply with existing OGC standards.

Completion Criteria

dlebauer commented 7 years ago

@yanliu-chn could you please work on defining the geotiff standard format?

max-zilla commented 7 years ago

The code used in terrautils will enforce a standard method for generating geotiffs: https://github.com/terraref/computing-pipeline/issues/308

Help from others will be needed to enforce CF standards, however.

dlebauer commented 7 years ago

Who takes the lead on this, and when can it be finished? (Please add a milestone for May or June or ...)

max-zilla commented 7 years ago

Based on other discussions I think it would make sense for @craig-willis to take the lead on this, but I will talk more about this/terrautils at the meeting today.

craig-willis commented 7 years ago

@dlebauer Is there anything specific you're looking for in terms of metadata? Looking at the EnvironmentLogger and hyperspectral nc files, aside from variables I see primarily sensor information.

dlebauer commented 7 years ago

See also the related issue for the point cloud data: https://github.com/terraref/computing-pipeline/issues/257. My comment there was "Goal is for (raster, point cloud) files to differ where it is useful, but have similar interfaces where applicable."

Here are some examples:

craig-willis commented 7 years ago

Thanks, @dlebauer.

I've been looking at the CF conventions for time (and I believe you had feedback on the time_utc variable we're currently using). CF conventions define a "time coordinate" (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#time-coordinate), but not a timestamp in the way we've defined. Is it sufficient to use the UTC timestamp with offset ISO-8601 subset? Is the field name "time_utc" problematic?

$ gdalinfo file.tif:
...
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
...

dlebauer commented 7 years ago

time_utc

I've always found the CF convention for time (units of <interval> since <reference date>) to be cumbersome, so I have no issue with using a timestamp. The only issue I see with time_utc is that it is more difficult for users to interpret than the local time, which can be represented in ISO-8601 format as YYYY-MM-DDTHH:MM-HH:MM, for example 2007-04-05T12:30-02:00. My understanding is that this is how we are storing data in the start_time and end_time fields in geostreams.
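To make the three representations in this discussion concrete, here is a minimal Python sketch that converts a UTC timestamp to local time with an explicit offset, and expresses an ISO-8601 timestamp as a CF-style numeric time. The fixed UTC-07:00 offset, the 1970-01-01 reference date, and the function names are assumptions for illustration only; they are not taken from the pipeline code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-07:00 offset for the field site (illustrative only).
LOCAL_TZ = timezone(timedelta(hours=-7))

# Assumption: an arbitrary CF reference date ("seconds since 1970-01-01T00:00:00Z").
CF_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_to_local_iso(time_utc: str) -> str:
    """Convert an ISO-8601 UTC timestamp to local time with an explicit offset."""
    # Older Python versions do not accept a trailing "Z", so normalize it first.
    utc = datetime.fromisoformat(time_utc.replace("Z", "+00:00"))
    return utc.astimezone(LOCAL_TZ).isoformat(timespec="minutes")


def iso_to_cf_seconds(timestamp: str) -> float:
    """Express an ISO-8601 timestamp with offset as seconds since CF_EPOCH."""
    moment = datetime.fromisoformat(timestamp)
    return (moment - CF_EPOCH).total_seconds()


local_stamp = utc_to_local_iso("2017-08-23T17:04:58Z")
cf_value = iso_to_cf_seconds("2007-04-05T12:30-02:00")
```

Because the offset is part of the string, a local timestamp and its UTC equivalent map to the same CF numeric value, so neither form loses information relative to the other.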

My original vision was that using gdal_translate from .tiff to .nc or .nc to .tiff would generate files with similar structure. So if this were from the FLIR camera, there would be a field with information about the variable represented by the raster layer in the image - name = temperature, units = C, dimensions = lat,lon etc.

But for now, the key will be to have an OGC-compliant file with the required information in external metadata.
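As a rough illustration of the per-layer information described above, kept as external metadata, the following Python sketch builds a record for one raster layer and serializes it to JSON. The key names, the required-key list, and the helper functions are hypothetical; they are not an agreed schema and do not correspond to any existing terrautils, Clowder, or GDAL API.

```python
import json

# Hypothetical required keys, loosely modeled on the fields named in the comment.
REQUIRED_KEYS = ("name", "units", "dimensions")


def describe_layer(name, units, dimensions, **extra):
    """Build a metadata record for one raster layer."""
    record = {"name": name, "units": units, "dimensions": list(dimensions)}
    record.update(extra)  # optional fields such as the sensor name
    return record


def missing_keys(record):
    """Return the required keys that are absent or empty in a record."""
    return [key for key in REQUIRED_KEYS if not record.get(key)]


# Values taken from the FLIR example in the comment above.
flir_layer = describe_layer("temperature", "C", ("lat", "lon"), sensor="FLIR")
sidecar_json = json.dumps({"layers": [flir_layer]}, indent=2, sort_keys=True)
```

A check such as `missing_keys(flir_layer)` could run before upload, so that files without the agreed fields are flagged early.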

What I had in mind for standards compliance was something like what is described in Annex A ("Annex A lists the conformance tests which shall be exercised on any software artifact claiming to implement GMLCOV for GeoTIFF") of the OGC GeoTIFF standards document 12-100r1_OGC_GML_ApplicationSchema-Coverages-_GeoTIFF_Coverage_Encoding_Profile.pdf .

But we should also focus on what is useful / necessary to meet the end-user needs (which I think can be met with well structured file-associated metadata in Clowder and geostreams).


For reference, here is an overview of the information in a MODIS hdf5 dataset. Like the netcdf, it also contains information about each layer in the file, the bounding box, the processing provenance, quality control &c. https://ladsweb.modaps.eosdis.nasa.gov/api/v1/filespec/collection=6&product=MOD13Q1.

When I ask MODIS for geotiff data, these fields do not appear to propagate into metadata that a program like ArcGIS can read (or exif for that matter), so I am not sure whether they are dropped. e.g. GTiff.tar.gz from https://modis.ornl.gov/subsetdata/23Aug2017_17:04:58_019465197L35.958767L-84.287433S25L25_MOD13Q1/

(here is a dataset that covers the field scanner: https://modis.ornl.gov/subsetdata/23Aug2017_17:34:58_983339455L33.07558L-111.97489S9L9_MYD13Q1/)

craig-willis commented 7 years ago

@dlebauer Thanks for the details. A few comments/questions: