internetofwater / geoconnex.us

URI registry for https://geoconnex.us based URIs

pygeoapi URI refactor #93

Closed ksonda closed 3 years ago

ksonda commented 3 years ago

@webb-ben has been working on bringing our desired pygeoapi up to date with master here

I need to set up a demo server that deploys his emerging solution with some id/uri specification scenarios to assist with PR review (and hopefully merge)

ksonda commented 3 years ago

@webb-ben, can you please move your modifications to a fork of geopython/pygeoapi rather than of internetofwater/geoconnex.us ? This will make the PR easier

DONE

ksonda commented 3 years ago

Below is a table of id/uri specification scenarios by implementation. It compares the pygeoapi master branch, the geoconnex branch, and @webb-ben's updated geoconnex branch, linking for each scenario an example item (HTML and JSON-LD representations) and the flattened result of that JSON-LD in the JSON-LD Playground.

Some outstanding issues:

| row | pygeoapi branch | scenario | example item link | jsonld playground flattened |
| --- | --- | --- | --- | --- |
| 1 | master | @id is api, id is 'id' | https://pygeoapi-master.internetofwater.dev/collections/obs_id_id/items/371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-master.internetofwater.dev%2Fcollections%2Fobs_id_id%2Fitems%2F371%3Ff%3Djsonld&context=%7B%7D |
| 2 | master | @id is api, id is not 'id' | https://pygeoapi-master.internetofwater.dev/collections/obs_id_name/items/obs%20371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-master.internetofwater.dev%2Fcollections%2Fobs_id_name%2Fitems%2Fobs%2520371%3Ff%3Djsonld&context=%7B%7D |
| 3 | current geoconnex | @id is api, id is 'id' | https://pygeoapi-geoconnex.internetofwater.dev/collections/obs_id_id/items/371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-geoconnex.internetofwater.dev%2Fcollections%2Fobs_id_id%2Fitems%2F371%3Ff%3Djsonld&context=%7B%7D |
| 4 | current geoconnex | @id is api, id is not 'id' | https://pygeoapi-geoconnex.internetofwater.dev/collections/obs_id_name/items/obs%20371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-geoconnex.internetofwater.dev%2Fcollections%2Fobs_id_name%2Fitems%2Fobs%2520371%3Ff%3Djsonld&context=%7B%7D |
| 5 | current geoconnex | @id is PID, id is 'id' | https://pygeoapi-geoconnex.internetofwater.dev/collections/obs_id_id_uri/items/371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-geoconnex.internetofwater.dev%2Fcollections%2Fobs_id_id_uri%2Fitems%2F371%3Ff%3Djsonld&context=%7B%7D |
| 6 | current geoconnex | @id is PID, id is not 'id' | https://pygeoapi-geoconnex.internetofwater.dev/collections/obs_id_name_uri/items/obs%20371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-geoconnex.internetofwater.dev%2Fcollections%2Fobs_id_name_uri%2Fitems%2Fobs%2520371%3Ff%3Djsonld&context=%7B%7 |
| 7 | @webb-ben | @id is api, id is 'id' | https://pygeoapi-uri-pr.internetofwater.dev/collections/obs_id_id/items/371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-uri-pr.internetofwater.dev%2Fcollections%2Fobs_id_id%2Fitems%2F371%3Ff%3Djsonld&context=%7B%7D |
| 8 | @webb-ben | @id is api, id is not 'id' | https://pygeoapi-uri-pr.internetofwater.dev/collections/obs_id_name/items/obs%20371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-uri-pr.internetofwater.dev%2Fcollections%2Fobs_id_name%2Fitems%2Fobs%2520371%3Ff%3Djsonld&context=%7B%7D |
| 9 | @webb-ben | @id is PID, id is 'id' | https://pygeoapi-uri-pr.internetofwater.dev/collections/obs_id_id_uri/items/371 | https://json-ld.org/playground/#startTab=tab-flattened&json-ld=https%3A%2F%2Fpygeoapi-uri-pr.internetofwater.dev%2Fcollections%2Fobs_id_id_uri%2Fitems%2F371%3Ff%3Djsonld&context=%7B%7D |
| 10 | @webb-ben | @id is PID, id is not 'id' | https://pygeoapi-uri-pr.internetofwater.dev/collections/obs_id_name_uri/items/obs%20371 | https://json-ld.org/playground/#startTab=tab-expanded&json-ld=https%3A%2F%2Fpygeoapi-uri-pr.internetofwater.dev%2Fcollections%2Fobs_id_name_uri%2Fitems%2Fobs%2520371%3Ff%3Djsonld |

ksonda commented 3 years ago

@dblodgett-usgs and I should write up the behavior we want, justify it with our use case, and submit it as a pygeoapi issue.

I think what we want is a boolean configuration option under resource: called geojsonld:, whose default value is true. When it is false, we want the JSON-LD representation to behave in the following manner:

  1. No GeoJSON-LD context, only what is manually specified in the context: configuration. Or perhaps the default context in this situation is {"id":"@id"}, since creating this object id, which is passed downstream to all the HTML templates, does seem to be core pygeoapi behavior.
  2. No properties block. All attributes should be at the same level as @id.
  3. uri_field, when specified, becomes id.
  4. NOTE: If uri_field is specified, but there happens to be some attribute named id which is not the intended URI, what happens?
     4.a Even if uri_field is specified, id_field is still specified, and it currently becomes id. What if, say, we want id_field to be the attribute station_id, but there is also some attribute named id?
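
To make rules 1 to 3 concrete, here is a rough Python sketch of the flattening step. It is not pygeoapi code: `flatten_feature`, `DEFAULT_CONTEXT`, and `extra_context` are hypothetical names, geometry and `type` are simply left out, and the open question in rule 4 is not handled (an existing `id` attribute is silently overwritten).

```python
from copy import deepcopy

# Assumed default context when the GeoJSON-LD context is switched off.
DEFAULT_CONTEXT = {"id": "@id"}


def flatten_feature(feature, uri_field=None, extra_context=None):
    """Flatten a GeoJSON feature into a plain JSON-LD object (hypothetical helper)."""
    properties = deepcopy(feature.get("properties") or {})

    # Rule 1: no GeoJSON-LD context, only the default plus configured terms.
    context = dict(DEFAULT_CONTEXT)
    context.update(extra_context or {})

    # Rule 2: no "properties" block; attributes sit at the same level as the id.
    doc = {"@context": [context]}
    doc.update(properties)

    # Rule 3: when uri_field is set, its value becomes the id;
    # otherwise fall back to the feature's own id.
    if uri_field is not None:
        doc["id"] = properties[uri_field]
    else:
        doc["id"] = feature.get("id")
    return doc
```

Calling `flatten_feature(feature, uri_field="uri", extra_context={"name": "schema:name"})` on a gage feature would give an object whose `id` is the geoconnex URI and whose attributes are all top-level.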

Any other changes we want?

What do we think of auto generating schema:geo or geosparql/WKT representations of the geometry instead of geojson geometry?

dblodgett-usgs commented 3 years ago

Yep -- this is the desired behavior.

No geometry unless it's a point, since other geometry types break JSON-LD rules.

I think it's an error condition if there are conflicts with the id attribute.
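
A minimal sketch of what treating a conflict as an error could look like, assuming a "conflict" means an attribute named `id` whose value differs from the configured URI attribute. `check_id_conflict` is a hypothetical helper, not an existing pygeoapi function.

```python
def check_id_conflict(properties, uri_field):
    """Raise ValueError if an 'id' attribute disagrees with the configured URI field."""
    # Nothing to check when no URI field is configured or it is 'id' itself.
    if uri_field is None or uri_field == "id":
        return
    if uri_field not in properties:
        raise ValueError(f"uri_field {uri_field!r} is missing from the feature")
    # An 'id' attribute is tolerated only if it carries the same value as the URI.
    if "id" in properties and properties["id"] != properties[uri_field]:
        raise ValueError(
            f"attribute 'id' ({properties['id']!r}) conflicts with "
            f"{uri_field!r} ({properties[uri_field]!r})"
        )
```

Under this definition the gage example passes, because its `id` and `uri` attributes hold the same URI.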

Current:

{
    "@context": [
        {
            "schema": "https://schema.org/",
            "geojson": "https://purl.org/geojson/vocab#",
            "Feature": "geojson:Feature",
            "FeatureCollection": "geojson:FeatureCollection",
            "Point": "geojson:Point",
            "bbox": {
                "@container": "@list",
                "@id": "geojson:bbox"
            },
            "coordinates": {
                "@container": "@list",
                "@id": "geojson:coordinates"
            },
            "features": {
                "@container": "@set",
                "@id": "geojson:features"
            },
            "geometry": "geojson:geometry",
            "id": "@id",
            "properties": "geojson:properties",
            "type": "@type"
        },
        {
            "schema": "https://schema.org/name",
            "name": "schema:name",
            "description": "schema:description",
            "subjectOf": {
                "@id": "schema:subjectOf",
                "@type": "@id"
            }
        }
    ],
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [
            -85.3199473,
            33.22511878
        ]
    },
    "properties": {
        "fid": 1,
        "id": "https://geoconnex.us/ref/gages/1029567",
        "uri": "https://geoconnex.us/ref/gages/1029567",
        "name": "WEHADKEE CREEK NEAR PITTMAN AL",
        "description": "USGS NWIS Stream/River/Lake Site 02339210: WEHADKEE CREEK NEAR PITTMAN AL",
        "subjectOf": "https://waterdata.usgs.gov/monitoring-location/02339210",
        "provider": "https://waterdata.usgs.gov",
        "provider_id": "02339210",
        "nhdpv2_REACHCODE": "03130002000353",
        "nhdpv2_REACH_measure": 9.594026996679105,
        "nhdpv2_COMID": 3291414
    },
    "id": "https://geoconnex.us/ref/gages/1029567"
}

Desired:

{
    "@context": [
        {
            "schema": "https://schema.org/",
            "id": "@id",
            "type": "@type",
            "name": "schema:name",
            "description": "schema:description",
            "subjectOf": "schema:subjectOf"
        }
    ],
    "type": "Feature",
    "fid": 1,
    "id": "https://geoconnex.us/ref/gages/1029567",
    "uri": "https://geoconnex.us/ref/gages/1029567",
    "name": "WEHADKEE CREEK NEAR PITTMAN AL",
    "description": "USGS NWIS Stream/River/Lake Site 02339210: WEHADKEE CREEK NEAR PITTMAN AL",
    "subjectOf": "https://waterdata.usgs.gov/monitoring-location/02339210",
    "provider": "https://waterdata.usgs.gov",
    "provider_id": "02339210",
    "nhdpv2_REACHCODE": "03130002000353",
    "nhdpv2_REACH_measure": 9.594026996679105,
    "nhdpv2_COMID": 3291414
}
ksonda commented 3 years ago

In your "desired" you left out geometry altogether, even though this is a point. Intentional?

dblodgett-usgs commented 3 years ago

I dropped it since it uses the geojson context. I'm fine leaving it in, but I'm not really sure it adds THAT much. If we were to add them back, it would be more useful to encode as a schema:geo object?

ksonda commented 3 years ago

"it would be more useful to encode as a schema:geo object?"

Agreed. Or geosparql/WKT? I guess for pygeoapi purposes, probably schema:geo.
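
For comparison, a small sketch of both encodings for a GeoJSON Point. The function names are made up for illustration; the only facts relied on are that GeoJSON stores coordinates as [longitude, latitude] and that schema.org's GeoCoordinates type has latitude and longitude properties.

```python
def _point_lon_lat(geometry):
    """Return (lon, lat) from a GeoJSON Point, rejecting other geometry types."""
    if geometry.get("type") != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    lon, lat = geometry["coordinates"][:2]
    return lon, lat


def point_to_schema_geo(geometry):
    """Encode a GeoJSON Point as a schema:geo value (schema:GeoCoordinates)."""
    lon, lat = _point_lon_lat(geometry)
    return {
        "@type": "schema:GeoCoordinates",
        "schema:latitude": lat,
        "schema:longitude": lon,
    }


def point_to_wkt(geometry):
    """Encode a GeoJSON Point as a WKT string, longitude first."""
    lon, lat = _point_lon_lat(geometry)
    return f"POINT ({lon} {lat})"
```

The schema:geo form only covers points cleanly, while WKT would extend to lines and polygons, which is the trade-off being weighed here.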

webb-ben commented 3 years ago

@dblodgett-usgs I have a working version that creates your desired JSON-LD. The issue with using "id": "@id" is that pygeoapi uses the 'id' field as its internal reference to the item. Changing the 'id' to make the JSON-LD work results in a wonky HTML page (either two id fields in the properties block, or the URI as the name of the item).

I'm happy to share more details but it might be a bit too verbose for this thread.

dblodgett-usgs commented 3 years ago

Great! What's the next step then?

ksonda commented 3 years ago

@webb-ben , you showed me two scenarios.

(1) where you are routing the id that is uri_field to the HTML templates:

[image]

(2) where you are routing the id that is id_field to the HTML templates:

[image]

At first glance, I think (2) is closer to what we want.

Questions:

webb-ben commented 3 years ago

The canonical URL in both situations is https://geoconnex.us/ref/gages/1029567. If uri_field is not specified it would become http://[HOSTNAME]/collections/gages/items/1029567.
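
As a sketch of that fallback rule (not the actual implementation): `canonical_url` and its parameters are hypothetical, and the item id is percent-encoded so that ids such as "obs 371" become "obs%20371" as in the demo links.

```python
from urllib.parse import quote


def canonical_url(feature, base_url, collection, uri_field=None):
    """Pick the canonical URL for an item (hypothetical helper)."""
    # With uri_field configured, the attribute value is the canonical URL.
    if uri_field is not None:
        return feature["properties"][uri_field]
    # Otherwise fall back to the API's own item URL.
    item_id = quote(str(feature["id"]))
    return f"{base_url.rstrip('/')}/collections/{collection}/items/{item_id}"
```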

I will look into the order of the entries... I think we would have to use ordered dictionaries. Should be easy enough!
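
On ordering: plain dicts keep insertion order in Python 3.7+, so a dedicated ordered-dictionary type may not be needed. A possible sketch, where `reorder` and the choice of leading keys are assumptions for illustration:

```python
def reorder(doc, first=("@context", "type", "id")):
    """Return a copy of doc with the given keys moved to the front."""
    # Start with the preferred keys, in the requested order, when present.
    ordered = {key: doc[key] for key in first if key in doc}
    # Append everything else in its original order.
    for key, value in doc.items():
        ordered.setdefault(key, value)
    return ordered
```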

webb-ben commented 3 years ago

Is there ever a case where geojsonld will be enabled but uri_field is not specified? If they won't always be used together, what should the difference in their behaviors be? If they always will be used together do we need to declare both?

ksonda commented 3 years ago

If it's not a huge issue I'd say to let them vary independently.

webb-ben commented 3 years ago

uri_field affects what will end up as the canonical URL, and what is set as id in the JSON-LD. geojsonld affects the format of the JSON-LD and HTML.

uri_field=uri, geojsonld=True
{
  "@context": [
    {
      "schema": "https://schema.org/",
      "id": "@id",
      "type": "@type",
      "name": "schema:name",
      "description": "schema:description",
      "subjectOf": "schema:subjectOf"
    }
  ],
  "type": "Feature",
  "pygeoapi_id": 1029567,
  "fid": 1,
  "uri": "https://geoconnex.us/ref/gages/1029567",
  "name": "WEHADKEE CREEK NEAR PITTMAN AL",
  "description": "USGS NWIS Stream/River/Lake Site 02339210: WEHADKEE CREEK NEAR PITTMAN AL",
  "subjectOf": "https://waterdata.usgs.gov/monitoring-location/02339210",
  "provider": "https://waterdata.usgs.gov",
  "provider_id": "02339210",
  "nhdpv2_REACHCODE": "03130002000353",
  "nhdpv2_REACH_measure": 9.594026996679105,
  "nhdpv2_COMID": 3291414,
  "id": "https://geoconnex.us/ref/gages/1029567"
}
uri_field=None, geojsonld=True
{
  "@context": [
    {
      "schema": "https://schema.org/",
      "id": "@id",
      "type": "@type",
      "name": "schema:name",
      "description": "schema:description",
      "subjectOf": "schema:subjectOf"
    }
  ],
  "type": "Feature",
  "pygeoapi_id": 1029567,
  "fid": 1,
  "uri": "https://geoconnex.us/ref/gages/1029567",
  "name": "WEHADKEE CREEK NEAR PITTMAN AL",
  "description": "USGS NWIS Stream/River/Lake Site 02339210: WEHADKEE CREEK NEAR PITTMAN AL",
  "subjectOf": "https://waterdata.usgs.gov/monitoring-location/02339210",
  "provider": "https://waterdata.usgs.gov",
  "provider_id": "02339210",
  "nhdpv2_REACHCODE": "03130002000353",
  "nhdpv2_REACH_measure": 9.594026996679105,
  "nhdpv2_COMID": 3291414,
  "id": "http://localhost:5000/collections/gages/items/1029567"
}
uri_field=uri, geojsonld=False
{
  "@context": [
    "https://geojson.org/geojson-ld/geojson-context.jsonld",
    {
      "schema": "https://schema.org/",
      "id": "@id",
      "type": "@type",
      "name": "schema:name",
      "description": "schema:description",
      "subjectOf": "schema:subjectOf"
    }
  ],
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [
      -85.3199473,
      33.22511878
    ]
  },
  "properties": {
    "fid": 1,
    "uri": "https://geoconnex.us/ref/gages/1029567",
    "name": "WEHADKEE CREEK NEAR PITTMAN AL",
    "description": "USGS NWIS Stream/River/Lake Site 02339210: WEHADKEE CREEK NEAR PITTMAN AL",
    "subjectOf": "https://waterdata.usgs.gov/monitoring-location/02339210",
    "provider": "https://waterdata.usgs.gov",
    "provider_id": "02339210",
    "nhdpv2_REACHCODE": "03130002000353",
    "nhdpv2_REACH_measure": 9.594026996679105,
    "nhdpv2_COMID": 3291414
  },
  "id": "https://geoconnex.us/ref/gages/1029567"
}

The links are removed for readability but are included in each JSON-LD response (this is easy to toggle).

ksonda commented 3 years ago

Right, so like this, I think. (The context also includes whatever was declared in config.yml.)

[image]

"schema:geo": {
  "@type": "schema:GeoCoordinates",
  "schema:latitude": "33.14734284",
  "schema:longitude": "-85.2818902"
}

webb-ben commented 3 years ago

Use this command to explore my working solution!

docker run -p 5000:80 -d --rm webbben/pygeoapi

The URIs only work for the /items/ page to allow easier navigation to /items/[item] from the set.
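
Once the container is running, something like the following could pull an item's JSON-LD for inspection. This is a sketch: the helper names are made up, the `gages` collection in the usage note is assumed from the localhost example earlier in the thread, and `f=jsonld` is the format parameter used in the demo links above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def item_url(collection, item_id, base_url="http://localhost:5000", fmt="jsonld"):
    """Build the URL of one item in the requested representation."""
    return f"{base_url}/collections/{collection}/items/{quote(str(item_id))}?f={fmt}"


def fetch_item(collection, item_id, **kwargs):
    """Download and parse one item; requires the server to be reachable."""
    with urlopen(item_url(collection, item_id, **kwargs)) as response:
        return json.load(response)
```

For example, `fetch_item("gages", 1029567)` would request http://localhost:5000/collections/gages/items/1029567?f=jsonld.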

ksonda commented 3 years ago

This gets us the functional JSON-LD we want for all four scenarios. I'm not sure the way schema:geo works here is pretty, with the GeoJSON coordinates array still hanging out alongside the "lat" and "lon" properties. It does lint the way it needs to, though.

@dblodgett-usgs, due to the way pygeoapi has been built since Greg's PR, it looks like it's necessary for the ancillary properties "id_" and "pygeoapi_id" to be present in the JSON-LD when geojsonld: false, in order to correctly route the desired uri/url to the HTML templates and the canonical URL. This is due to over-reliance throughout the rest of pygeoapi on the GeoJSON id_field:, which becomes hard-coded as "id". I don't know if the geopython community will think this is a big deal. I don't think it's a big deal, because those properties won't be parsed since they have no context, and the GeoJSON representation is unaffected.
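
To illustrate the "won't be parsed" point: a JSON-LD processor drops keys that the context does not map to an IRI (unless a default vocabulary is set). The sketch below is a simplified approximation of that rule, not a real JSON-LD processor; it ignores remote contexts referenced by URL, and `unmapped_terms` is a hypothetical name.

```python
def unmapped_terms(doc):
    """List top-level keys that an inline context leaves undefined."""
    context = doc.get("@context", [])
    if isinstance(context, dict):
        context = [context]
    # Merge all inline context objects; string entries (remote contexts) are skipped.
    terms = {}
    for entry in context:
        if isinstance(entry, dict):
            terms.update(entry)
    # With a default vocabulary every plain key would be mapped.
    if "@vocab" in terms:
        return []
    return [
        key for key in doc
        if not key.startswith("@")   # JSON-LD keywords
        and ":" not in key           # compact or absolute IRIs
        and key not in terms         # terms defined by the context
    ]
```

Run against the flattened examples above, this would report properties such as "pygeoapi_id" and "fid" as unmapped, meaning they would not appear in the expanded or flattened output.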

dblodgett-usgs commented 3 years ago

We'll see what others think. I'd say let's go ahead and open a PR with this?

ksonda commented 3 years ago

mkay. Do you think a better reviewer to request is Tom K or Richard Law?

dblodgett-usgs commented 3 years ago

Probably Tom -- I'm not sure Richard has merge rights?

ksonda commented 3 years ago

draft PR https://github.com/ksonda/pygeoapi/pull/1#issue-616904932

ksonda commented 3 years ago

addressed by https://github.com/geopython/pygeoapi/pull/676