hotosm / OpenAerialMap

OpenAerialMap is an open service to provide access to a commons of openly licensed imagery and map layer services.
https://openaerialmap.org/

Develop additional metadata elements for OAM Catalog #55

Open smit1678 opened 7 years ago

smit1678 commented 7 years ago

From other OAM improvements, we're going to look to implement some additional metadata elements for PacDID.

Next action: itemize the elements we need to add.

Recap of current:

Current metadata tracked

| element | type | name | description |
| --- | --- | --- | --- |
| uuid | string | File | unique URI to file |
| projection | string | Projection | CRS of the datasource in full WKT format |
| bbox | array | Bounding Box | Pair of min and max coordinates in CRS units, (min_x, min_y, max_x, max_y) |
| footprint | string | Datasource footprint | WKT format, describing the actual footprint of the imagery |
| gsd | number | Ground Spatial Distance | Average ground spatial distance (resolution) of the datasource imagery, expressed in meters |
| file_size | number | File Size | File size on disk in bytes |
| acquisition_start | string | Acquisition Date Start | First date of acquisition in UTC (Combined date and time representation) |
| acquisition_end | string | Acquisition Date End | Last date of acquisition in UTC (Combined date and time representation) (optional) |
| title | string | Title | Human friendly title of the image |
| platform | string | Type of imagery | List of possible platform sources limited to satellite, aircraft, UAV, balloon, kite |
| provider | string | Imagery Provider | Provider/owner of the OIN bucket |
| contact | string | Contact | Name and email address of the data provider |
| properties | object | Properties | Additional properties about the image (optional) |

smit1678 commented 7 years ago

Below is a recap of the future additions tracked in previous threads (https://github.com/hotosm/OpenAerialMap/tree/master/metadata, #31, #17) plus input received from the SPC and the PacDID project.

Future additions from our last review

| element | type | name | description |
| --- | --- | --- | --- |
| license | string | License | Type of license for imagery |
| tags | string | Tags | User provided tags |

Input from SPC/PacDID

| element | type | name | description |
| --- | --- | --- | --- |
| bands | string | Bands available | Bands available or imaging bands |
| uav_type | string | Type of UAV | Type of UAV used for collection |
| operator | string | Operator of the UAV | Person or company that operated the UAV (could differ from provider) |
| gps | string | GPS Method Used | GPS method used -- autonomous or differential |
| abstract | string | Abstract | Short descriptor of the project or purpose of imagery |

@cgiovando Additional metadata items for input before we select and finalize? Input on how we want to finalize? License and tag are candidates that we'll definitely want to include.

nbumbarger commented 7 years ago

Just wanted to update that we're planning to start work on this enhancement. Some aspects can't be developed until the list is finalized (form validation, for example), but generically scaffolding support for more metadata properties is something we can begin now.

smit1678 commented 7 years ago

From our thinking, we're leaning towards not having UAV-specific metadata items in the spec. bands could be an interesting addition. license and tags should be good to include. The biggest question we're thinking about is how (or if) we can improve the core OIN spec as well as have an extended OAM version.

@mojodna Based on the mosaic work, is there anything we want to be thinking about for helping group items or connect between the Uploader and Catalog better?

nbumbarger commented 7 years ago

How would contributors input bands in the form? Would it be a set of checkboxes with some common ones (RE, R, G, B, NIR, SWIR, PAN)? Or maybe Color, Multispectral, Panchromatic?

nbumbarger commented 7 years ago

@smit1678 - I'm going to need staging credentials and URLs for the catalog API in order to work on this feature next week. If anyone has access to these, could you get ahold of me on the Slack channel? As per the docs, the needed variables are:

OAM_DEBUG - Debug mode true or false (default)
AWS_SECRET_KEY_ID - AWS secret key id for reading OIN buckets
AWS_SECRET_ACCESS_KEY - AWS secret access key for reading OIN buckets
DBURI - MongoDB connection url
SECRET_TOKEN - The token used for post requests to /tms endpoint

cc @mojodna @danielfdsilva

smit1678 commented 7 years ago

To run this locally, you just need to set these yourself. Use your own AWS keys and then set the secret token. For our staging environment, you should have access through Heroku.
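As a quick local sanity check before starting the catalog API, something like the following could confirm the variables are set. This is only a sketch: the variable names come from the list quoted above, while the function names and the choice to treat OAM_DEBUG as optional (since it has a default) are assumptions, not part of the OAM codebase.

```python
import os

# Variable names are taken from the list quoted in the thread. Treating
# OAM_DEBUG as optional (it has a documented default) is an assumption.
REQUIRED_VARS = ["AWS_SECRET_KEY_ID", "AWS_SECRET_ACCESS_KEY", "DBURI", "SECRET_TOKEN"]


def missing_settings(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def debug_enabled(env):
    """OAM_DEBUG falls back to false when it is not set."""
    return env.get("OAM_DEBUG", "false").strip().lower() == "true"


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present; debug =", debug_enabled(os.environ))
```

Passing the environment in as a mapping keeps the check easy to try against a plain dictionary before touching real credentials.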

nbumbarger commented 7 years ago

Thanks, @smit1678. I have it running.

mojodna commented 7 years ago

For the small-scale mosaicking (multiple images that are part of the same scene), we may want to add an optional "scene id" UUID (to group them together). That would also help people who've subdivided imagery and want to be able to associate it back together.

For the large-scale mosaicking (everything), gsd and acquisition dates are probably the main inputs. Having "quality" or "priority" might be helpful, but that's incredibly subjective.
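One way to picture gsd and acquisition dates as mosaicking inputs is a simple priority ordering. The sketch below is an illustration only, not the mosaic tooling's actual logic: it assumes finer resolution should win first and newer imagery should break ties, and it assumes dates are ISO 8601 UTC strings in one consistent format so that string comparison orders them correctly. The function name is hypothetical.

```python
def mosaic_order(images):
    """Order images so finer resolution comes first, newest first within a tie.

    Assumes each image has a numeric "gsd" in meters and ISO 8601 UTC date
    strings. "acquisition_end" is optional in the spec, so fall back to
    "acquisition_start" when it is missing.
    """
    def acquired(image):
        return image.get("acquisition_end") or image["acquisition_start"]

    # Python's sort is stable: sort by date (newest first), then by gsd,
    # so images with equal gsd keep their newest-first order.
    by_date = sorted(images, key=acquired, reverse=True)
    return sorted(by_date, key=lambda image: image["gsd"])
```

A subjective "quality" or "priority" value, if it were ever added, could be slotted in as one more sort key ahead of gsd.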

nbumbarger commented 7 years ago

@mojodna I think the scene concept is implicit in the upload form, because it allows the user to add a list of images within a dataset but also allows for adding multiple datasets. It is true, however, that images submitted in a group within a dataset are not distinguished in the database. Are you proposing that the images in each submitted dataset be automatically assigned a unique group ID?

nbumbarger commented 7 years ago

@smit1678 @mojodna I added support for tags (optional) and license (required) to the form and catalog system (not yet merged); those seemed to be the highest priorities. Beyond the UUID question above, is there anything else we want to include at this time (quality, bands)? GSD is already generated, and acquisition dates are required with submission.
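The required/optional split described here could be checked along these lines. This is a sketch, not the form's actual validation code: the function names are hypothetical, and treating tags as a single comma-separated string is an assumption based on the "string" type listed for tags earlier in the thread.

```python
def validate_submission(metadata):
    """Return a list of error messages; an empty list means the input passed."""
    errors = []

    # license is required and must be a non-empty string.
    license_value = metadata.get("license")
    if not isinstance(license_value, str) or not license_value.strip():
        errors.append("license is required")

    # tags is optional, but when present it is assumed to be a string.
    tags = metadata.get("tags")
    if tags is not None and not isinstance(tags, str):
        errors.append("tags must be a string when provided")

    return errors


def split_tags(tags):
    """Split a comma-separated tag string into a clean list of tags."""
    if not tags:
        return []
    return [tag.strip() for tag in tags.split(",") if tag.strip()]
```

For example, `validate_submission({"tags": "flood"})` reports the missing license, while a record with only a license passes.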

nbumbarger commented 7 years ago

In addition to adding a description of the new metadata properties to the API documentation, we also need to discuss where they should be exposed in the browser. For example, the license should probably be included in the image preview panel, but maybe we don't want to allow tags, considering that the user is currently able to attach an arbitrary number of them.

mojodna commented 7 years ago

Are you proposing that the images in each submitted dataset be automatically assigned a unique group ID?

Yes. The uploader is aware of relationships between images, but the rest of the toolchain isn't.

I'm tempted to advocate for the inclusion of bands, but we can table that until/if we decide that RGB(A) imagery is overly limiting.

We can also follow a philosophy of allowing common properties to emerge using well-known tags. This would require that metadata is mutable (in order to change it in response to consensus), which I'm not sure is the case right now.

@cgiovando recommended looking through OGC Earth Observation Metadata profile of Observations & Measurements to see if there's anything we're missing and to keep an eye out for future harmonization.

mojodna commented 7 years ago

Are you proposing that the images in each submitted dataset be automatically assigned a unique group ID?

Yes. The uploader is aware of relationships between images, but the rest of the toolchain isn't.

Further elaborating on this, the uploader should assign unique ids to each scene/dataset. Right now, the entire upload gets an id, as do the individual images, but there's no id that ties images to scenes within an upload.

E.g.: https://upload-api.openaerialmap.org/uploads/58655b07f91c99bd00e9c7ab
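A minimal sketch of the idea, assuming an upload is a dictionary with a list of scenes that each hold a list of images. That shape, the `scene_id` key name, and the function name are all hypothetical and are not taken from the uploader API.

```python
import uuid


def assign_scene_ids(upload):
    """Return a copy of the upload where each scene carries its own id.

    Assumed input shape (hypothetical):
        {"scenes": [{"images": [{...}, {...}]}, ...]}
    Every image is stamped with the id of the scene it belongs to, so the
    grouping survives once images are stored individually.
    """
    tagged_scenes = []
    for scene in upload.get("scenes", []):
        scene_id = str(uuid.uuid4())
        images = [dict(image, scene_id=scene_id) for image in scene.get("images", [])]
        tagged_scenes.append(dict(scene, scene_id=scene_id, images=images))
    return dict(upload, scenes=tagged_scenes)
```

Copying rather than mutating keeps the original upload intact, which makes the step safe to re-run.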

smit1678 commented 7 years ago

I started looking at the OGC EO profile, http://docs.opengeospatial.org/is/10-157r4/10-157r4.html, to get a sense of what/if we're missing. The spec covers much more than just optical and can be very mission/equipment specific.

I've tried to capture the Optical earth observation product information needed here: https://docs.google.com/spreadsheets/d/1yhX1cTfpa75wSKCDtRJTJ3rViw6exenTMXWsb1gpnnc/edit?usp=sharing.

It is not yet mapped to OIN/OAM's current metadata spec, but I think there will be ways to harmonize in the future. We seem to capture a subset of the information.

Couple items that we are missing that are specific to Optical products:

smit1678 commented 7 years ago

Related to https://github.com/hotosm/oam-uploader-api/issues/52 and looking at this further, it looks like one useful upgrade to the core OIN spec is creation_date and potentially modification_date. According to OGC, creation_date is:

creation date for the metadata item. When retrieved from a metadata catalogue, the creationDate is the date when the metadata item was ingested for the first time (i.e. inserted) in the catalogue.

This differs from the acquisition time (acquisition_start and acquisition_end), which OGC calls phenomenonTime.

for upload IDs

Perhaps we can look at how OGC does it by using something like:

mojodna commented 7 years ago

While working on re-indexing the HOT OIN bucket, I realized that the OIN metadata JSON should include something that identifies it as such (along with a version number), similar to how TileJSON does it:

{
  "...": "...",
  "tilejson": "2.1.0"
}

(The indexer currently attempts to treat all JSON files present in a bucket as OIN metadata, which is no longer true for the HOT bucket (there's footprint GeoJSON + TileJSON for the tiler); I have a temporary workaround that'll show up in a PR shortly, but it checks for uuid, which isn't a fully reliable check.)
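To illustrate the difference between a marker key and the uuid heuristic, here is a rough sketch of how an indexer might classify the JSON files it finds. It assumes the marker key is named `oin`, as in the proposal at the end of this thread. The function name and return values are hypothetical, and this is not the indexer's actual code.

```python
import json


def looks_like_oin_metadata(text):
    """Classify a JSON document found in a bucket.

    Returns "versioned" when the assumed "oin" marker key is present,
    "legacy" when only the uuid heuristic matches, and None otherwise.
    """
    try:
        doc = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(doc, dict):
        return None
    if "tilejson" in doc:
        # TileJSON identifies itself, so it is never OIN metadata.
        return None
    if "oin" in doc:
        return "versioned"
    if "uuid" in doc:
        # Fallback heuristic only; as noted above, it is not fully reliable.
        return "legacy"
    return None
```

With a marker key in place, the "legacy" branch could eventually be dropped once older metadata has been re-indexed.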

mojodna commented 7 years ago

I'm using uploaded_at (for now) to signify when the imagery was updated (which is distinct from when it was ingested into the catalog).

smit1678 commented 7 years ago

Ok, let's bring this to a close. I started this in OIN: https://github.com/openimagerynetwork/oin-metadata-spec/issues/14.

@mojodna @nbumbarger @cgiovando Do these changes look right, and are they agreed upon?

Current OAM through Uploader:

{
    "uuid": "",
    "title": "",
    "projection": "",
    "bbox": [],
    "footprint": "",
    "gsd": null,
    "file_size": null,
    "acquisition_start": "",
    "acquisition_end": "",
    "platform": "",
    "provider": "",
    "contact": "",
    "properties": {
        "sensor": "",
        "thumbnail": "",
        "tms": ""
    }
}

New additions proposed to OAM through Uploader:

{
    "oin": "",              # OIN version number. An update to OIN spec
    "uploaded_at": "",      # date metadata uploaded into OIN. An update to OIN spec
    "uuid": "",
    "title": "",
    "projection": "",
    "bbox": [],
    "footprint": "",
    "gsd": null,
    "file_size": null,
    "acquisition_start": "",
    "acquisition_end": "",
    "platform": "",
    "provider": "",
    "contact": "",
    "creation_date": "",
    "properties": {
        "sensor": "",
        "thumbnail": "",
        "license": "",          # new addition, doesn't affect OIN spec
        "tags": "",             # new addition, doesn't affect OIN spec
        "tms": "",              
        "wmts": ""              # new addition, doesn't affect OIN spec
    }
}

mojodna commented 7 years ago

👍

1.1 for the OIN version?
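For reference, moving a record from the current shape to the proposed one could look roughly like the sketch below. The function name, its parameters, and the timestamp format are assumptions. The version string is passed in by the caller because "1.1" is still only a suggestion at this point, and creation_date is left untouched because the thread does not say how it would be populated.

```python
import copy
from datetime import datetime, timezone


def upgrade_record(record, oin_version, license_name, tags="", wmts="", now=None):
    """Return a copy of a current-style record with the proposed fields added.

    now should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    upgraded = copy.deepcopy(record)

    # Proposed updates to the core OIN spec.
    upgraded["oin"] = oin_version
    upgraded["uploaded_at"] = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # OAM additions live under "properties" and leave the OIN spec unchanged.
    properties = upgraded.setdefault("properties", {})
    properties["license"] = license_name
    properties["tags"] = tags
    properties["wmts"] = wmts
    return upgraded
```

Existing keys such as sensor, thumbnail and tms are carried over unchanged, so the upgrade is purely additive.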