Requested changes to hyperspectral metadata

dlebauer commented 7 years ago

The goal is to have easy-to-interpret metadata that is as consistent as possible across file types, e.g. metadata that can be stored and imported / exported among geoTIFF, las, json, and other data types (exif?).

I am open to feedback - I don't necessarily understand all of the rationale behind the previous decisions.

Some questions

Instead of pushing all of the Lemnatec metadata to the nc file metadata, could you just link back to the record in Clowder that contains the data?
Does it make sense to have the standard_name field as the variable name? Or, is it possible to do nco and other netcdf operations by standard_name (or how is standard_name used to simplify interoperability in practice)?
Can you add lat / lon as a dimension or variable? (e.g. each pixel has a lat/lon as well as x/y)?
Have metadata list key dimensions and variables first, for ease of use, e.g.
- how is wvl_clb different than wavelength?

netcdf f8330b00-1b5a-4f22-840a-c8c8d0d561c5_ {
dimensions:
    radiation_wavelength = 955 ;
    x = 1600 ;
    y = 5191 ;
    time = 5191 ; // is time of each scan line needed?

variables:

    double time(time) ;
        time:units = "days since 1970-01-01 00:00:00" ;
        time:calendar = "gregorian" ;
        time:notes = "Time at which each scanline along y was taken" ;
    double radiation_wavelength(radiation_wavelength) ;
        radiation_wavelength:long_name = "Hyperspectral Wavelength" ;
        radiation_wavelength:units = "meter" ;
        radiation_wavelength:standard_name = "radiation_wavelength" ;
    double longitude(x) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    double latitude(y) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    double x(x) ;
        x:units = "meter" ;
        x:reference_point = "Southeast corner of field" ;
        x:long_name = "North distance from southeast corner of field" ;
    double y(y) ;
        y:units = "meter" ;
        y:reference_point = "Southeast corner of field" ;
        y:long_name = "West distance from southeast corner of field" ;
    float surface_bidirectional_reflectance(radiation_wavelength, x, y) ;
        surface_bidirectional_reflectance:long_name = "Reflectance of image" ;
        surface_bidirectional_reflectance:standard_name = "surface_bidirectional_reflectance" ;
        surface_bidirectional_reflectance:units = "1" ;

    // ... any additional metadata below this, preferably in order of importance

// global attributes:
        :geometry = "geojson string" ;
}

Add the bounding box in WKT or geojson

## geojson    http://geojson.org/geojson-spec.html
"features": [
      { "type": "Feature",
         "geometry": {
           "type": "Polygon",
           "coordinates": [
             [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
               [100.0, 1.0], [100.0, 0.0] ]
             ]
          },
          "properties": {
             "name": "bounding box",
             "prop1": {"this": "that"}
          }
        }
     ]
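
As a rough illustration of the bounding-box idea, here is a minimal Python sketch that builds a GeoJSON Feature from the extent of an image and serializes it to a string that could be stored in a global attribute such as `:geometry`. The function name `bounding_box_feature` and the example extent (taken from the GeoJSON sample above) are assumptions for illustration, not part of the pipeline.

```python
import json


def bounding_box_feature(min_lon, min_lat, max_lon, max_lat, name="bounding box"):
    """Return a GeoJSON Feature (as a dict) for an axis-aligned bounding box."""
    # GeoJSON positions are [longitude, latitude]. A polygon ring must be
    # closed, so the first position is repeated at the end.
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"name": name},
    }


# Serialize to a string; the extent values here are placeholders only.
geometry_attr = json.dumps(bounding_box_feature(100.0, 0.0, 101.0, 1.0))
```

Storing the serialized string keeps the attribute readable by any JSON parser, whichever file format carries it.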

Completion Criteria

[x] discuss proposed changes & modify
[x] Identify changes in hyperspectral metadata
[ ] implement in pipeline
[ ] update existing metadata

ghost commented 7 years ago

What will be most used? lat, lon or x, y in meters? @remotesensinglab?

mdietze commented 7 years ago

I think it's fine to have local coordinates in x,y, but you definitely need to add variables for the lat/lon of the reference point for that x,y grid. That said, I'm not sure how things like ncview and panoply will load the image in that case -- probably worth making example files for the different options and trying to load them; that might provide your best answer to whether to go with x,y vs lat,lon vs both.

Also, using the southeast as the reference and having positive x numbers go west is unintuitive and counter to any other standard I've seen (which use southwest as the reference and have positive x go east).

In addition, the designation of the reference point as being the corner of the 'field' is ambiguous. Are you referencing some specific real-world field or the field of view (i.e. the corner of the farm or the corner of the image)? Referencing relative to the image corner makes the most sense to me, because not all hyperspectral data is for fields.

Finally, it's ambiguous whether the x,y coordinates are the center of a pixel or the corner of the pixel.
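
To make the center-versus-corner distinction concrete, here is a small Python sketch of the two conventions. The helper names are hypothetical, and `origin` is assumed to be the coordinate of the outer edge of the first pixel; under the two conventions the same grid differs by half a pixel.

```python
def pixel_corners(origin, pixel_size, n_pixels):
    """Coordinate of the leading edge (corner) of each pixel along one axis."""
    return [origin + i * pixel_size for i in range(n_pixels)]


def pixel_centers(origin, pixel_size, n_pixels):
    """Coordinate of the center of each pixel along one axis."""
    # Centers are offset from the corners by half a pixel.
    return [origin + (i + 0.5) * pixel_size for i in range(n_pixels)]


# Example with a 1 m pixel so the half-pixel shift is easy to see.
corners = pixel_corners(0.0, 1.0, 3)
centers = pixel_centers(0.0, 1.0, 3)
```

Whichever convention is chosen, stating it in the coordinate variable's attributes would remove the ambiguity.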

dlebauer commented 7 years ago

Hi Mike, Thanks for your feedback.

It does make sense to have latitude, longitude as dimensions for consistency, and to retain x,y as vectors. Henry, does that make sense? For this 200x20m field and in the context of raster data products we can assume the grid mapping is square (but see also https://terraref.gitbooks.io/terraref-documentation/content/user/geospatial-information.html).

This is a special case where x,y are useful. The x heading west is indeed non-intuitive unless you are an engineer designing a field scanner. The pixels are 1mm x 1mm and the coordinates in this particular example are in x,y and are defined by the equipment manufacturer.

The current full metadata (that we plan to update in this issue) is here: https://gist.github.com/dlebauer/4ca36eeae00586bcde36f97579d6fcdf#file-hyperspectral_metadata-c. There is a lot of extra metadata below the key dimensions and variables, which is a good idea, but I want to focus on the key dimensions and variables here.

mdietze commented 7 years ago

@dlebauer If you're trying to store info for one specific instrument that's fine, but the email you sent to the PEcAn team suggested you were trying to generate a standard for storing hyperspectral data, which to me implies a more general standard. For the general case, it would make sense to flip the data into an order that makes sense to end users and can be loaded using existing tools.

dlebauer commented 7 years ago

@mdietze

I think we agree, though that wasn't clear in the examples (I'm in transit, apologies!). My statement was:

"It does make sense to have latitude, longitude as dimensions for consistency, and to retain x,y as vectors."

So: latitude and longitude will be dimensions, as with the PEcAn met and model output. I'll update this above.

In addition to the required PEcAn structure, other variables can be added. In this case, we can add x and y as vectors with the same length as the lat and lon dimensions.
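
A minimal Python sketch of how latitude and longitude vectors could be derived from local offsets so that they have the same length as x and y. It assumes a spherical Earth and a small area (reasonable for a field of a few hundred meters), and it assumes the offsets are given as meters north and meters west of a reference point whose latitude and longitude are known. The reference coordinates in the example are placeholders, not the real field location.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def offsets_to_lat_lon(ref_lat, ref_lon, north_m, west_m):
    """Convert meter offsets from a reference point into lat/lon vectors.

    north_m and west_m are sequences of offsets in meters; the returned
    latitude and longitude lists have the same lengths as the inputs.
    """
    m_per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    # A degree of longitude shrinks with the cosine of the latitude.
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(ref_lat))
    latitude = [ref_lat + n / m_per_deg_lat for n in north_m]
    # Moving west decreases longitude.
    longitude = [ref_lon - w / m_per_deg_lon for w in west_m]
    return latitude, longitude


# Placeholder reference point and offsets, for illustration only.
lat, lon = offsets_to_lat_lon(33.0, -112.0, [0.0, 100.0, 200.0], [0.0, 10.0, 20.0])
```

A production version would more likely use a proper projection library; this only shows the shape of the calculation.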

dlebauer commented 7 years ago

@hmb1 I am closing this b/c I think it's implemented in both level 1 and indices products

dlebauer commented 7 years ago

@FlyingWithJerome, @hmb1 @czender it does not look like the level 1 hyperspectral metadata has been updated to this format. e.g. ncdump -h /data/terraref/sites/ua-mac/Level_1/hyperspectral/2017-05-01/2017-05-01_16-47-24-400/e9673701-4a6b-4b6d-b334-f5401dc98213.nc produces level_1_ncdump_json.txt

Is this still in progress?

czender commented 7 years ago

@dlebauer the current level 1 metadata contains x(x), latitude(x), y(y), longitude(y) etc. All the variables have long_name, units, and most have standard_name. What would you like to see changed? Do you want fewer/no corner coordinates listed? A flat hierarchy? Full-length names instead of short names, e.g., reflectance_image instead of rfl_img? We will add the GeoJSON bounding box as suggested above but are not sure what other formatting changes to make.

dlebauer commented 7 years ago

@czender in general, I'd like to clean up the metadata so that the most important data (reflectance + dimensions) are easy to find. This could be done by both organizing the information and removing redundant / extraneous metadata.

So, if I do ncdump -h the reflectance + dimensions should be easy to find. Currently the first variables are xps_img and Google_Map_View, while reflectance (surface_albedo) is listed way at the end.
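
One way to think about the ordering is sketched below in Python. It assumes that `ncdump -h` lists variables in the order they were defined, so defining the key variables first would put them at the top. The function name `order_variables` and the priority list are hypothetical; the thread does not specify how the pipeline writes variables.

```python
def order_variables(names, priority):
    """Return names with priority variables first, the rest in original order."""
    rank = {name: i for i, name in enumerate(priority)}
    # sorted() is stable, so names not in the priority list keep their
    # original relative order after the prioritized ones.
    return sorted(names, key=lambda n: rank.get(n, len(priority)))


current = ["xps_img", "Google_Map_View", "x", "y", "wavelength", "rfl_img"]
ordered = order_variables(current, ["rfl_img", "wavelength", "x", "y"])
# ordered == ["rfl_img", "wavelength", "x", "y", "xps_img", "Google_Map_View"]
```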

I didn't realize how you were using the groups to store the Lemnatec metadata. But now that I see how groups can be used, would it make sense to have additional groups for the geospatial information (corners and reference points) and the calibration information (exposures, calibration data, etc)? I'll leave it to you to decide what is worth keeping around. I think if it is organized - even if it is just ordered correctly - it will not be as distracting.

Although there are issues with the Lemnatec metadata, let's wait until @max-zilla and @craig-willis are done with the metadata cleaner / standardizer tool before touching this.

ghost commented 7 years ago

@czender will move forward with removing all unimportant metadata info

dlebauer commented 7 years ago

@hmb1 and @czender what is the status on this?

hmb1 commented 7 years ago

wait for charlie on this one

czender commented 7 years ago

@FlyingWithJerome what is the status of this? We agreed that you would move all non-essential root group variables to a new and separate group in output.

ghost commented 7 years ago

@hmb1 to follow up

dlebauer commented 7 years ago

Can xps_img be recovered from rfl_img and some other field(s) (like rfl_rfr_fct)? If so, can this variable also be dropped from the Level 1 data product to cut the file size in half?

hmb1 commented 7 years ago

@dlebauer that makes sense. will do these changes first thing tomorrow

czender commented 7 years ago

xps_img is identical to the raw counts recorded in the Level 0 .bil file, and is recoverable from the raw imagery (which, unlike the Level 1 data, is not in netCDF format). We have asked whether to retain it on a few occasions previously, and the answer has been yes. Would you like to eliminate it from Level 1 always, sometimes (e.g., a switch) or never?

dlebauer commented 7 years ago

I believe that I have at times confused the raw exposure counts with calibrated radiances. If I recall, the idea behind keeping the xps_img was to facilitate re-calibration by our team or outside users. I think as a group we have shifted from a packrat to a more carefully curated approach to developing data and metadata products.

I am not sure what would be feasible for the Nov. release - but if we were to follow the NASA / MODIS data levels, I would expect to sort data products in the following way.

level 0: raw exposure counts
- dropping the xps_img netcdf here would be nice to have but not essential
level 1: calibrated radiances
- MODIS ex: https://modis.gsfc.nasa.gov/data/dataprod/mod02.php
- https://modis.gsfc.nasa.gov/data/dataprod/mod01.php
- might be overkill to produce this product if these are easily recoverable from reflectances
level 2: reflectances
- MODIS ex: https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod09gq
level 3: indices and soil mask
- MODIS ex: https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod13q1

The most important derived data products are the reflectances and indices. I am assuming that it won't be too difficult to separate out xps_img into a separate data product; that might be the best approach. This may take a few edits to the terrautils package https://github.com/terraref/terrautils/blob/master/terrautils/sensors.py#L141

hmb1 commented 7 years ago

@dlebauer @czender I have removed the excessive intermediate calibration data for default operation.

1) The calibrated radiance (default) consists of
A) rfl_img
B) the coordinate vars "x", "y", "wavelength"
C) the converted JSON metadata in root and associated groups

2) The indices file (default) now contains
A) each index in its regular form and with the '_pxl' postfix
B) the coordinate vars "x", "y", and "wavelength"
C) if the specific reflectances are required, they can be obtained from 1)

I will add a command line switch so that, if required, the intermediate variables will be available in 1).

I have pushed the changes to hmb1-patch13, and will do some more debugging and push to master tomorrow, my time.

dlebauer commented 7 years ago

TODO:

split out xps_img and add to level 0 (does this mean write to raw_data or create new level 0 directory?)
- can be compressed asynchronously so it doesn't slow down workflow
reflectances --> level 2
indices --> level 3
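
The proposed sorting of products into levels can be written down as a simple lookup, sketched here in Python. The `Level_N` directory naming follows the existing `Level_1` path mentioned earlier in the thread; whether level 0 gets its own directory or goes to raw_data is still an open question, so `Level_0` is an assumption, as are the names `PRODUCT_LEVELS` and `level_directory`.

```python
# Mapping of data products to the NASA / MODIS-style levels proposed above.
PRODUCT_LEVELS = {
    "raw exposure counts": 0,   # xps_img
    "calibrated radiances": 1,
    "reflectances": 2,
    "indices": 3,
    "soil mask": 3,
}


def level_directory(product):
    """Return the (assumed) directory name for a data product."""
    return f"Level_{PRODUCT_LEVELS[product]}"


reflectance_dir = level_directory("reflectances")
```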

terraref / reference-data

Requested changes to hyperspectral metadata #83
