terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Format and method for publishing authoritative sensor metadata #54

Closed craig-willis closed 7 years ago

craig-willis commented 7 years ago

The JSON data produced by Lemnatec includes a "sensor_fixed_metadata" section with the sensor name, description, manufacturer, and serial number. We also have copies of the sensor manuals and data sheets with calibration data, sometimes as PDF or CSV.

For example:

    "sensor_fixed_metadata": {
      "sensor manufacturer": "Headwall Scientific",
      "sensor product name": "VNIR",
      "sensor serial number": "PaS6-114",
      "sensor description": "hyperspectral camera to measure visible near infrared (VNIR) radiation",
      "sensor purpose": "measures spectral reflectance from 380nm to 1000nm"
    },

The goal is to publish this authoritative sensor metadata in a way that can be easily consumed programmatically. Ideally, the sensor itself would be described using some standard format and linked to the datasheet/manual/calibration data.

SensorML seems to be targeted more at the sensor manufacturer and much more complex than needed. The OGC SensorThings API defines a model that includes a simple Sensor object (Name, Description) that can be linked to a SensorML or PDF data sheet. There is also the newer Sensor Network Ontology.
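
A minimal sketch of the SensorThings option, assuming the fixed metadata shown above is mapped onto the Sensor entity's `name`, `description`, `encodingType`, and `metadata` properties. The helper `to_sensorthings_sensor` and the datasheet URL are hypothetical and only illustrate the shape of the record:

```python
import json

# Fixed metadata as it appears in the LemnaTec JSON example above.
sensor_fixed_metadata = {
    "sensor manufacturer": "Headwall Scientific",
    "sensor product name": "VNIR",
    "sensor serial number": "PaS6-114",
    "sensor description": "hyperspectral camera to measure visible near infrared (VNIR) radiation",
    "sensor purpose": "measures spectral reflectance from 380nm to 1000nm",
}


def to_sensorthings_sensor(fixed, datasheet_url):
    """Map LemnaTec fixed metadata onto a SensorThings-style Sensor entity.

    `datasheet_url` is a hypothetical link to the PDF data sheet.
    """
    return {
        "name": f'{fixed["sensor manufacturer"]} {fixed["sensor product name"]}',
        "description": fixed["sensor description"],
        # Signals that "metadata" points at a PDF rather than a SensorML document.
        "encodingType": "application/pdf",
        "metadata": datasheet_url,
    }


sensor = to_sensorthings_sensor(
    sensor_fixed_metadata, "https://example.org/datasheets/VNIR.pdf"
)
print(json.dumps(sensor, indent=2))
```

The serial number and purpose have no dedicated slot in this simple model, so they would have to live in the linked document or in an extension, which is one reason to compare the options before committing.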

Initial options appear to be to:

JeffWhiteAZ commented 7 years ago

The JSON example from LemnaTec would be enough information to allow a researcher to write a sentence in a methods section for a paper, but it wouldn't help someone troubleshooting data problems. Would we have information on sensor maintenance and routine calibration linked via the serial number and contained in a separate sensor log?

craig-willis commented 7 years ago

@JeffWhiteAZ Are we already collecting this information? I hadn't considered it, but linking from the sensor description to a maintenance log shouldn't be too difficult.

JeffWhiteAZ commented 7 years ago

A more-or-less binary flag is set for each sensor every time the operator logs the system status. You can see this at: https://docs.google.com/spreadsheets/d/1eQSeVMPfrWS9Li4XlJf3qs2F8txmddbwZhjOfMGAvt8/edit#gid=1394972273 I don't believe that more detailed information on calibrations or maintenance is being captured, in part because we are still learning from LemnaTec what these involve. Note also that the current log references generic sensors and not specific serial numbers. If an instrument were swapped (e.g., for an upgrade or to replace a defective system), we would rely on the operator to provide additional notes.

dlebauer commented 7 years ago

Notes on scope of content:

from @ashiklom and @serbinsh developing a mini relational database (SQLite) with tables:

craig-willis commented 7 years ago

@dlebauer Please clarify the above comment -- when we spoke yesterday we discussed the fixed sensor metadata from Lemnatec, PDF manuals, and PDF/CSV calibration data.

From your comment, it sounds like you want this stored in a database? From my understanding, most of the information was in a PDF file, so are you expecting that we'd extract data from the PDF to put in the relational database?

dlebauer commented 7 years ago

@craig-willis Those are quick notes from a conversation with colleagues on what is useful, and much of it is already within scope (e.g. RSR, camera info), plus calibration details. Once we have a format for storing the relevant sensor metadata, we should extract the relevant information from the PDFs and make it accessible. This can all be in JSON; I only mentioned SQLite because that is what @ashiklom is working with. It doesn't matter so much how the data is stored, but if there is already a database model that we can use, it might be worth adopting it or helping build it.

craig-willis commented 7 years ago

Thanks, @dlebauer -- I understand now. If available, pointers to the project, database schema, and any associated documentation would be helpful.

Can you remind me where I can find the PDFs for the various sensors?

dlebauer commented 7 years ago

Calibration docs

on google drive

Vaisala.pdf Ocean Optics.pdf FLIR.PDF G4-384 VNIR Wavelength_Calibration.pdf G4-383 SWIR Wavelength_Calibration.pdf Skye_NDVI_PRI_Protokolle.pdf 09001-SensorDocumentation.pdf

Generic sensor data sheets (manufacturer sales sheets)

on google drive

Stereocamera_Specification_V06.pdf SWIR.pdf VNIR 2015.pdf FLIR-SC655-Datasheet.pdf Vaisala-GMP343-CO-Probe-Data-Sheet.pdf Prosilica_GT_DataSheet_3300_v3.0.0_en (1) (1).pdf Thies CLIMA SENSOR US.pdf datasheet2012.pdf

dlebauer commented 7 years ago

@craig-willis there is already a spreadsheet on google drive that summarizes a lot of the general information about these sensors: https://docs.google.com/spreadsheets/d/1xkEHQd_5x3Yzv3f61bfns3ZFoP7xXq8AUNK4fmS3kXg/edit#gid=194274492

(note that on google drive these are in the folder terraref/terraref-danforth-team/field scanner operations)

craig-willis commented 7 years ago

@dlebauer I've been comparing the "sensor_fixed_metadata" in the latest *metadata.json files to the calibration certificates in Google drive and noticed the following:

I expect this is known, but there are no calibration certificates for the Stereo-VIS, PSII camera, 3-D scanner. Is there another way to verify the model/serial number?

dlebauer commented 7 years ago

@craig-willis I've put the issues with inconsistent and missing sensor data sheets in #56 and have assigned this to @TinoDornbusch

craig-willis commented 7 years ago

@dlebauer

Discussed two options for storing sensor data with @max-zilla and @robkooper today:

  1. Store in Clowder. Create a Collection "Sensor Information" with per-sensor dataset. The dataset would contain the sensor metadata, PDF, calibration matrices, photos, etc. Each sensor would therefore have a linkable unique ID in Clowder. With the postgres plugin enabled, we can also add the sensor as a geostream/stream, linking the data from the sensor to the sensor dataset.
  2. Store outside of Clowder. Host the sensor description information (metadata, manuals, etc) in external service (could be a simple webserver). This would serve up both human-readable and machine-readable (json) content. The URI for each sensor would serve as an ID (e.g., https://terraref.org/sensors/VNIR). This could be added to the geostream/stream metadata.
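
A rough sketch of what option 2 could look like, assuming each sensor's URI is derived from its product name as in the example URI above. The record keys (`id`, `fixed_metadata`, `documents`) and both helper functions are hypothetical, not an agreed schema:

```python
import json

BASE_URI = "https://terraref.org/sensors"  # base taken from the example URI above


def sensor_uri(product_name):
    """Build the identifier URI for a sensor from its product name."""
    return f"{BASE_URI}/{product_name}"


def sensor_record(fixed, documents):
    """Combine fixed metadata and document names into one JSON-ready record.

    The "id", "fixed_metadata" and "documents" keys are hypothetical.
    """
    return {
        "id": sensor_uri(fixed["sensor product name"]),
        "fixed_metadata": fixed,
        "documents": documents,
    }


record = sensor_record(
    {"sensor product name": "VNIR", "sensor serial number": "PaS6-114"},
    ["VNIR 2015.pdf"],
)
print(json.dumps(record, indent=2))
```

The same `id` string could then be stored in the geostream/stream metadata so that data points resolve back to the sensor description.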

The plan is to try option 1 on a single sensor, with a first pass by the end of this week. Let me know what you think.

dlebauer commented 7 years ago

Both approaches sound sensible, and it makes sense to start with the first.

craig-willis commented 7 years ago

Thanks, @dlebauer

I've started a collection on the dev instance:

https://terraref.ncsa.illinois.edu/clowder-dev/collection/57ea83a4e4b0581365b856c0

Also -- do we have more LemnaTec specifications similar to the one for the stereo camera? This is a good source of contextual information that isn't necessarily in the individual sensor datasheets. If there's an overall specification for the Scanalyzer, that might be helpful too.

dlebauer commented 7 years ago

@craig-willis you can feel free to curate the collection on the production instance if it would be easier.

@TinoDornbusch will know if there are specification docs from Lemnatec similar to the one from the stereo camera.

There is a specification for the entire gantry: 7100019_System_Specification_V10_General_Version.pdf.

craig-willis commented 7 years ago

@dlebauer I'm using the dev instance because it has the postgres plugin enabled, which allows me to create the associated geostreams/sensor records, and I'm learning a few lessons along the way. I will certainly move to the production instance as soon as it makes sense.

craig-willis commented 7 years ago

@dlebauer I'm looking for the sensor_fixed_metadata for the Quantum PAR and Ocean Optics spectrometer, but don't see folders for either in Globus. Is this expected? I can create the fixed metadata from the LemnaTec sensor documentation.

dlebauer commented 7 years ago

These are both in the EnvironmentalLogger metadata. The PAR sensor:

  "environment_sensor_fixed_infos": {
    "par_sensor": {
      "manufacturer": "www.apogeeinstruments.com",
      "model": "SQ214",
      "location in gantry system": "top of gantry"
    },

The Ocean Optics spectrometer:

    "spectrometer": {
      "manufacturer": "www.oceanoptics.com",
      "model": "STS-VIS",
      "location in gantry system": "top of gantry"
    }
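
A short sketch of reading these entries programmatically. It assumes both sensors sit under the same `environment_sensor_fixed_infos` object, reassembled here from the two fragments above; the real file may contain additional keys:

```python
import json

# Assumed layout: both entries under "environment_sensor_fixed_infos",
# reassembled from the two fragments above.
raw = """
{
  "environment_sensor_fixed_infos": {
    "par_sensor": {
      "manufacturer": "www.apogeeinstruments.com",
      "model": "SQ214",
      "location in gantry system": "top of gantry"
    },
    "spectrometer": {
      "manufacturer": "www.oceanoptics.com",
      "model": "STS-VIS",
      "location in gantry system": "top of gantry"
    }
  }
}
"""

fixed_infos = json.loads(raw)["environment_sensor_fixed_infos"]

# Collect the model of each environmental sensor, keyed by its entry name.
models = {name: info["model"] for name, info in fixed_infos.items()}
print(models)
```

Note that these entries use `manufacturer` and `model`, while the camera metadata uses `sensor manufacturer` and `sensor product name`, so a consumer has to handle both naming schemes.
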

craig-willis commented 7 years ago

@dlebauer it looks like the SensorML group has created a set of vocabularies (sensorml.com/orr/) that will be useful here without requiring us to conform to the SensorML spec.

Question: The PDF 09001-SensorDocumentation_V4_oK.PDF from Lemnatec has a ton of device/sensor information, but it would be cumbersome to extract (copy/paste) values from PDF. Who would I contact to find out whether the information is available in another format?

dlebauer commented 7 years ago

@TinoDornbusch @LTBen @markus-radermacher-lemnatec : do you have the sensordocumentation in a different format, e.g. word with tables etc?

@craig-willis I would consider the sensor metadata that we already have in spreadsheets and text files (like metadata.json and Google Sheets) to be the first priority. If it is onerous to extract data from the PDF, additional parameters can be added based on need. You could try importing the PDF into Word; I've had good luck with a program called 'table extractor' (http://pdf.zanran.com/index.php#section-06) and with https://pdftables.com/, which I used to generate 09001-SensorDocumentation_V4_oK.xlsx. Does that help?

craig-willis commented 7 years ago

Thanks @dlebauer -- yes, this is helpful and I do understand the priorities. However, some of this information is duplicated across all three sources (JSON, Sheets, PDF). For the calibration, certification and test-specimen data, which should I consider to be the more reliable source of information: JSON, Google sheets or PDF? In some ways it seems that the PDF is actually more up-to-date.

dlebauer commented 7 years ago

I'd consider reliability json > PDF > google sheets. If you see any differences between json and PDF we can use the json value but ask Lemnatec to resolve them.

craig-willis commented 7 years ago

@dlebauer Focusing just on the calibration information, the JSON data is both incomplete and inconsistent. The PDF reflects what's actually in the calibration certificates. For example, the sensor_fixed_metadata only has calibration dates for 3 sensors, but we have certificates for 7 (same in PDF); the sensor_fixed_metadata has three different fields, "calibration available", "Calibration available", and "is calibrated" -- set to false for 3 sensors and not present for the rest.
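
One way to audit the three field spellings is sketched below. It assumes the three fields are meant to carry the same meaning and that values may be either JSON booleans or strings such as "False"; neither assumption has been confirmed with LemnaTec, and `calibration_flag` is a hypothetical helper:

```python
# The three spellings reported above for what appears to be the same flag.
CALIBRATION_KEYS = ("calibration available", "Calibration available", "is calibrated")


def calibration_flag(fixed):
    """Return the calibration flag as a bool, or None if no variant is present."""
    for key in CALIBRATION_KEYS:
        if key in fixed:
            value = fixed[key]
            # Assumption: values may be JSON booleans or strings like "False".
            if isinstance(value, str):
                return value.strip().lower() == "true"
            return bool(value)
    return None


print(calibration_flag({"Calibration available": "False"}))
print(calibration_flag({"sensor product name": "VNIR"}))
```

Returning `None` for a missing field keeps "not calibrated" distinct from "not recorded", which matters here because most sensors have no flag at all.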

My inclination would be to use the PDF as the source of truth. If the goal is to use the sensor_fixed_metadata as the source of truth, then I guess I'll need to audit and compare to the PDF for accuracy and file a ticket with corrections. Who should I contact for more information about the sensor_fixed_metadata? I'd like to get documentation from Lemnatec about each of the fields.

dlebauer commented 7 years ago

@craig-willis ok, makes sense to use PDF as source of truth but to keep track of inconsistencies and let Lemnatec know
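
A minimal sketch of that approach, assuming both sources have already been transcribed into flat dictionaries with matching field names. The function `reconcile` and the sample values are hypothetical and for illustration only:

```python
def reconcile(pdf_values, json_values):
    """Merge two flat dicts, preferring PDF values, and list disagreements.

    Returns (merged, inconsistencies); each inconsistency is a
    (field, pdf_value, json_value) tuple for fields present in both sources.
    """
    merged = dict(json_values)
    merged.update(pdf_values)  # the PDF wins wherever both define a field
    inconsistencies = [
        (field, pdf_values[field], json_values[field])
        for field in sorted(pdf_values.keys() & json_values.keys())
        if pdf_values[field] != json_values[field]
    ]
    return merged, inconsistencies


# Hypothetical values, not taken from any real certificate.
pdf = {"serial number": "A-001", "calibration date": "2016-01-15"}
js = {"serial number": "A-002", "model": "X1"}

merged, diffs = reconcile(pdf, js)
print(merged)
print(diffs)
```

The `diffs` list is the part that would go into a ticket for LemnaTec, while `merged` is what gets published.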

craig-willis commented 7 years ago

@dlebauer Looking back at https://github.com/terraref/reference-data/issues/42, this seems to be a duplicate. Feel free to assign to me if you agree.

I've created and populated a collection on the production Clowder instance with the various PDFs (datasheets, calibration certs) and with the dataset metadata set to the original raw LemnaTec fixed sensor data.

https://terraref.ncsa.illinois.edu/clowder/collection/58035fa34f0c4a438cbb53dc

Because of all of the work already done on the LemnaTec fixed sensor data (particularly 151, 170), I'm concerned about arbitrarily changing the structure of the JSON data for this ticket.

I've created a separate Github repository with the fixed sensor metadata to allow us to use the PR review/approval process for changes to the JSON structure and content.

https://github.com/terraref/sensor-metadata/

I will tag this before adding new data (e.g., RSR data, updated calibration information, etc.).

Let me know if you have any concerns.

craig-willis commented 7 years ago

I think this issue can be closed after we have the sensor metadata review. Additional issues can be opened to address feedback or recommended changes.

craig-willis commented 7 years ago

Closing this issue following the Sensor Metadata review meeting on 1/9. Will open separate tickets for specific tasks going forward.