opencdms-dev / pyopencdms-old

⭐🐍 pyopencdms aims to build a common Python API on top of multiple Climate Data Management Systems (CDMS) that use different underlying database engines
MIT License
4 stars 6 forks source link

Add proper request and response schema in pygeoapi provider #59

Closed faysal-ishtiaq closed 1 year ago

faysal-ishtiaq commented 1 year ago

There is no request schema and the response schema is not correct for pygeoapi climsoft provider. Attaching a screenshot for reference:

Screenshot from 2022-09-29 21-28-20

isedwards commented 1 year ago

@tomkralidis - could you give us a quick pointer on what we're likely to be missing. We're trying to implement transactions as well as read (#47)

tomkralidis commented 1 year ago

@isedwards the response is a 201 with an empty payload.

For the request schema, this needs to be implemented in pygeoapi core. Proposal:

  1. pygeoapi core updates:
    • have pygeoapi base provider add a stub for a get_schema function, which would return a dict of a JSON schema of the given provider resource model
    • if implemented, pygeoapi OpenAPI uses as follows:
    • request schema for .../items HTTP POST (transaction POST, not CQL)
    • request schema for .../item/{itemId} HTTP PUT
    • ~response schema for .../items HTTP GET, HTTP CQL POST~
    • response schema for .../items/{itemId} HTTP GET
    • if not implemented, then current functionality
    • it is the responsibility of the provider to ensure a valid JSON schema (as a Python dict) is returned via get_schema
  2. pyopencdms plugin implements get_schema accordingly

If this makes sense I'll implement the updates in pygeoapi core next week.

isedwards commented 1 year ago

Yes please @tomkralidis sounds great 👍🏼

tomkralidis commented 1 year ago

@isedwards branch which you can test against: https://github.com/geopython/pygeoapi/tree/collection-item-schema

Once your provider plugin implements the new get_schema method (returning a dict of a JSON schema [inline or reference]), then the OpenAPI definition should add the schema to the applicable OpenAPI endpoint responses.

Can you test and report back? Once we have something working here, I'll issue the PR in pygeoapi core.

Note that there is also discussing on additionally supporting schema output in OGC API - Features proper as an endpoint, but this appears to be a TODO (was thinking /collections/{collectionId}/schema, which I may just implement depending how discussions go in OGC on a probable approach).

faysal-ishtiaq commented 1 year ago

I have implemented get_schema() here

Now it's showing request schema in the swagger doc

image

However, when I create a post request, it returns an error like this image

On the terminal, it shows these logs

[2022-10-04T21:01:43Z] {/home/faysal/PycharmProjects/pyopencdms/venv/lib/python3.8/site-packages/pygeoapi/api.py:3394} ERROR - Invalid filter language
[2022-10-04T21:01:43Z] {/home/faysal/PycharmProjects/pyopencdms/venv/lib/python3.8/site-packages/werkzeug/_internal.py:225} INFO - 127.0.0.1 - - [04/Oct/2022 21:01:43] "POST /collections/climsoft/items HTTP/1.1" 400 -

which are not much helpful. Then I tried to print and log the data I am getting in create method here

But they don't show up in the terminal.

And I am using this logger https://github.com/opencdms/pyopencdms/blob/pygeoapi-provider/opencdms/pygeoapi/climsoft_provider.py#L22

tomkralidis commented 1 year ago

Can you try changing the content type in your request to application/geo+json and report back? We are locking on this request header currently, which may need to be updated for better detection.

faysal-ishtiaq commented 1 year ago

Thank you @tomkralidis. Changing to application/geo+json resolved the issue.

However, probably the curl command generated in the swagger doc should probably generate the header as -H 'Content-Type: application/geo+json' instead of -H 'Content-Type: application/json' \.

Is it possible to set different schema for GET /items, GET /items/{item_id}, POST /items, PUT /items/{item_id} and DELETE /items/{item_id} endpoints at the moment?

isedwards commented 1 year ago

@tomkralidis is it possible to add an optional argument to get_schema() to allow a different schema to be returned depending on the endpoint being called?

tomkralidis commented 1 year ago

Is the use case that schemas will differ between data ingest and output?

We could extend with the caller passing something like 'get', create', 'replace', 'delete', and then the return is a dict of the media type and associated schema dict. Does that work?

Note that there is also an ongoing related discussion at https://github.com/opengeospatial/ogcapi-features/issues/771 on how to properly qualify POST on items. Still, I think we would be relatively safe for now with the above while the overall discussion at OGC evolves / is resolved.

isedwards commented 1 year ago

Thank you @tomkralidis. When we POST we don't send the id. Also some of the fields are not required.

Passing 'get', 'create', 'replace' or 'delete' should be sufficient (would you use these strings directly or define an enum?)

tomkralidis commented 1 year ago

Ok. I also assume you should be able to derive a GeoJSON from an item(s) call?

Yes we would likely define an enum in the base class.

tomkralidis commented 1 year ago

For reference, I am loosely following the collection/schemas endpoint approach from ldproxy (https://demo.ldproxy.net/daraa/api/?f=html) (cc @cportele)

tomkralidis commented 1 year ago

@faysal-ishtiaq @isedwards see updates in branch https://github.com/geopython/pygeoapi/tree/collection-item-schema with updated signature:

    def get_schema(self, schema_type: SchemaType = SchemaType.item):
        """ 
        Get provider schema model

        :param schema_type: `SchemaType` of schema (default is 'item')

        :returns: tuple pair of `str` of media type and `dict` of schema
                  (i.e. JSON Schema)
        """ 

To be clear, the above/branch implements schema definition in the OpenAPI document, not as an HTTP endpoint per se (this is TBD with how https://github.com/opengeospatial/ogcapi-features/issues/771 will evolve).

Please test and report back when you have a chance?

faysal-ishtiaq commented 1 year ago

@tomkralidis I have tried out the updated pygeoapi from this branch https://github.com/geopython/pygeoapi/tree/collection-item-schema and I also updated our provider like this https://github.com/opencdms/pyopencdms/commit/bf24cf964b6d0a74aa686ec75c0ca698bb801ad3

But now, there is no schema in swagger docs

Screenshot-get Screenshot-update Screenshot-create

tomkralidis commented 1 year ago

@faysal-ishtiaq where are your schema() functions defined? They should returning a tuple of a media type and a dict. Following this, the pygeoapi machinery should create the OpenAPI document accordingly.

Happy to jump on a quick call to discuss as well.

faysal-ishtiaq commented 1 year ago

@tomkralidis I am attaching the json schemas for CreateObservationfinal and UpdateObservationfinal

CreateObservationfinal.schema()
{
    "title": "CreateObservationfinal",
    "type": "object",
    "properties": {
        "recorded_from": {
            "title": "Recorded From",
            "type": "string"
        },
        "described_by": {
            "title": "Described By",
            "type": "integer"
        },
        "obs_datetime": {
            "title": "Obs Datetime",
            "type": "string",
            "format": "date-time"
        },
        "obs_level": {
            "title": "Obs Level",
            "type": "string"
        },
        "obs_value": {
            "title": "Obs Value",
            "type": "number"
        },
        "flag": {
            "title": "Flag",
            "type": "string"
        },
        "period": {
            "title": "Period",
            "type": "integer"
        },
        "qc_status": {
            "title": "Qc Status",
            "type": "integer"
        },
        "qc_type_log": {
            "title": "Qc Type Log",
            "type": "string"
        },
        "acquisition_type": {
            "title": "Acquisition Type",
            "type": "integer"
        },
        "data_form": {
            "title": "Data Form",
            "type": "string"
        },
        "captured_by": {
            "title": "Captured By",
            "type": "string"
        },
        "mark": {
            "title": "Mark",
            "type": "integer"
        },
        "temperature_units": {
            "title": "Temperature Units",
            "type": "string"
        },
        "precipitation_units": {
            "title": "Precipitation Units",
            "type": "string"
        },
        "cloud_height_units": {
            "title": "Cloud Height Units",
            "type": "string"
        },
        "vis_units": {
            "title": "Vis Units",
            "type": "string"
        },
        "data_source_timezone": {
            "title": "Data Source Timezone",
            "type": "integer"
        }
    },
    "required": [
        "recorded_from",
        "described_by",
        "obs_datetime"
    ]
}
UpdateObservationfinal.schema()
{
    "title": "UpdateObservationfinal",
    "type": "object",
    "properties": {
        "obs_level": {
            "title": "Obs Level",
            "type": "string"
        },
        "obs_value": {
            "title": "Obs Value",
            "type": "number"
        },
        "flag": {
            "title": "Flag",
            "type": "string"
        },
        "period": {
            "title": "Period",
            "type": "integer"
        },
        "qc_status": {
            "title": "Qc Status",
            "type": "integer"
        },
        "qc_type_log": {
            "title": "Qc Type Log",
            "type": "string"
        },
        "acquisition_type": {
            "title": "Acquisition Type",
            "type": "integer"
        },
        "data_form": {
            "title": "Data Form",
            "type": "string"
        },
        "captured_by": {
            "title": "Captured By",
            "type": "string"
        },
        "mark": {
            "title": "Mark",
            "type": "integer"
        },
        "temperature_units": {
            "title": "Temperature Units",
            "type": "string"
        },
        "precipitation_units": {
            "title": "Precipitation Units",
            "type": "string"
        },
        "cloud_height_units": {
            "title": "Cloud Height Units",
            "type": "string"
        },
        "vis_units": {
            "title": "Vis Units",
            "type": "string"
        },
        "data_source_timezone": {
            "title": "Data Source Timezone",
            "type": "integer"
        }
    }
}
tomkralidis commented 1 year ago

Thanks. The method return signature has changed. So you need to return a tuple of the media type, and the schema as a dict:

return ('application/json', { ...dict of JSON schema... })
faysal-ishtiaq commented 1 year ago

Thanks. The method return signature has changed. So you need to return a tuple of the media type, and the schema as a dict:

return ('application/json', { ...dict of JSON schema... })

I just changed the return type as well... and it's working.. thank you :)

faysal-ishtiaq commented 1 year ago

It looks more like an OpenAPI issue than a pygeoapi issue. But I think it's a good place to discuss and get some clarification.

In our climsoft pygeoapi provider (that we are currently working on), we have some schemas within schemas, such as -


class Station(BaseModel):
    stationId: Optional[str]
    stationName: Optional[str]
    country: Optional[str]
    adminRegion: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]
    elevation: Optional[float]
    openingDatetime: Optional[str]
    closingDatetime: Optional[str]

    class Config:
        orm_mode = True
        allow_population_by_field_name = True
        fields = field_mapping

class Observationfinal(BaseModel):
    recordedFrom: str
    describedBy: Optional[int]
    obsDatetime: Optional[datetime.datetime]
    obsLevel: Optional[str]
    obsValue: Optional[float]
    flag: Optional[str]
    period: Optional[int]
    qcStatus: Optional[int]
    qcTypeLog: Optional[str]
    acquisitionType: Optional[int]
    dataForm: Optional[str]
    capturedBy: Optional[str]
    mark: Optional[int]
    temperatureUnits: Optional[str]
    precipitationUnits: Optional[str]
    cloudHeightUnits: Optional[str]
    visUnits: Optional[str]
    dataSourceTimeZone: Optional[int]

    station: Optional[Station]   # REFERS TO Station SCHEMA DEFINED ABOVE

    class Config:
        orm_mode = True
        allow_population_by_field_name = True
        fields = field_mapping

class ObservationfinalPygeoapiSchema(BaseModel):
    type: str = "Feature"
    geometry: GeometrySchema
    properties: Observationfinal
    id: str
    links: List[Dict[str, str]]

when we generate JSON schema from this, it looks like

{
    "title": "ObservationfinalPygeoapiSchema",
    "type": "object",
    "properties": {
        "type": {
            "title": "Type",
            "default": "Feature",
            "type": "string"
        },
        "geometry": {
            "$ref": "#/definitions/GeometrySchema"
        },
        "properties": {
            "$ref": "#/definitions/Observationfinal"
        },
        "id": {
            "title": "Id",
            "type": "string"
        },
        "links": {
            "title": "Links",
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": {
                    "type": "string"
                }
            }
        }
    },
    "required": [
        "geometry",
        "properties",
        "id",
        "links"
    ],
    "definitions": {
        "GeometrySchema": {
            "title": "GeometrySchema",
            "type": "object",
            "properties": {
                "type": {
                    "title": "Type",
                    "default": "Point",
                    "type": "string"
                },
                "coordinates": {
                    "title": "Coordinates",
                    "type": "array",
                    "items": {
                        "type": "number"
                    }
                }
            },
            "required": [
                "coordinates"
            ]
        },
        "Station": {
            "title": "Station",
            "type": "object",
            "properties": {
                "station_id": {
                    "title": "Station Id",
                    "type": "string"
                },
                "name": {
                    "title": "Name",
                    "type": "string"
                },
                "country": {
                    "title": "Country",
                    "type": "string"
                },
                "region": {
                    "title": "Region",
                    "type": "string"
                },
                "latitude": {
                    "title": "Latitude",
                    "type": "number"
                },
                "longitude": {
                    "title": "Longitude",
                    "type": "number"
                },
                "elevation": {
                    "title": "Elevation",
                    "type": "number"
                },
                "start_datetime": {
                    "title": "Start Datetime",
                    "type": "string"
                },
                "end_datetime": {
                    "title": "End Datetime",
                    "type": "string"
                }
            }
        },
        "Observationfinal": {
            "title": "Observationfinal",
            "type": "object",
            "properties": {
                "recorded_from": {
                    "title": "Recorded From",
                    "type": "string"
                },
                "described_by": {
                    "title": "Described By",
                    "type": "integer"
                },
                "obs_datetime": {
                    "title": "Obs Datetime",
                    "type": "string",
                    "format": "date-time"
                },
                "obs_level": {
                    "title": "Obs Level",
                    "type": "string"
                },
                "obs_value": {
                    "title": "Obs Value",
                    "type": "number"
                },
                "flag": {
                    "title": "Flag",
                    "type": "string"
                },
                "period": {
                    "title": "Period",
                    "type": "integer"
                },
                "qc_status": {
                    "title": "Qc Status",
                    "type": "integer"
                },
                "qc_type_log": {
                    "title": "Qc Type Log",
                    "type": "string"
                },
                "acquisition_type": {
                    "title": "Acquisition Type",
                    "type": "integer"
                },
                "data_form": {
                    "title": "Data Form",
                    "type": "string"
                },
                "captured_by": {
                    "title": "Captured By",
                    "type": "string"
                },
                "mark": {
                    "title": "Mark",
                    "type": "integer"
                },
                "temperature_units": {
                    "title": "Temperature Units",
                    "type": "string"
                },
                "precipitation_units": {
                    "title": "Precipitation Units",
                    "type": "string"
                },
                "cloud_height_units": {
                    "title": "Cloud Height Units",
                    "type": "string"
                },
                "vis_units": {
                    "title": "Vis Units",
                    "type": "string"
                },
                "data_source_timezone": {
                    "title": "Data Source Timezone",
                    "type": "integer"
                },
                "station": {
                    "$ref": "#/definitions/Station"
                }
            },
            "required": [
                "recorded_from"
            ]
        }
    }
}

which is absolutely fine. And when we generate pygeoapi OpenAPI config file, I am not sure if it works as expected. I am pasting an example here for reference:

Click here to see the whole config file

    get:
      description: Climsoft observations
      operationId: getClimsoftFeature
      parameters:
      - $ref: https://schemas.opengis.net/ogcapi/features/part1/1.0/openapi/ogcapi-features-1.yaml#/components/parameters/featureId
      - $ref: '#/components/parameters/f'
      - $ref: '#/components/parameters/lang'
      responses:
        '200':
          content:
            application/json:
              schema: &id001
                definitions:
                  GeometrySchema:
                    properties:
                      coordinates:
                        items:
                          type: number
                        title: Coordinates
                        type: array
                      type:
                        default: Point
                        title: Type
                        type: string
                    required:
                    - coordinates
                    title: GeometrySchema
                    type: object
                  Observationfinal:
                    properties:
                      acquisition_type:
                        title: Acquisition Type
                        type: integer
                      captured_by:
                        title: Captured By
                        type: string
                      cloud_height_units:
                        title: Cloud Height Units
                        type: string
                      data_form:
                        title: Data Form
                        type: string
                      data_source_timezone:
                        title: Data Source Timezone
                        type: integer
                      described_by:
                        title: Described By
                        type: integer
                      flag:
                        title: Flag
                        type: string
                      mark:
                        title: Mark
                        type: integer
                      obs_datetime:
                        format: date-time
                        title: Obs Datetime
                        type: string
                      obs_level:
                        title: Obs Level
                        type: string
                      obs_value:
                        title: Obs Value
                        type: number
                      period:
                        title: Period
                        type: integer
                      precipitation_units:
                        title: Precipitation Units
                        type: string
                      qc_status:
                        title: Qc Status
                        type: integer
                      qc_type_log:
                        title: Qc Type Log
                        type: string
                      recorded_from:
                        title: Recorded From
                        type: string
                      station:
                        $ref: '#/definitions/Station'
                      temperature_units:
                        title: Temperature Units
                        type: string
                      vis_units:
                        title: Vis Units
                        type: string
                    required:
                    - recorded_from
                    title: Observationfinal
                    type: object
                  Station:
                    properties:
                      country:
                        title: Country
                        type: string
                      elevation:
                        title: Elevation
                        type: number
                      end_datetime:
                        title: End Datetime
                        type: string
                      latitude:
                        title: Latitude
                        type: number
                      longitude:
                        title: Longitude
                        type: number
                      name:
                        title: Name
                        type: string
                      region:
                        title: Region
                        type: string
                      start_datetime:
                        title: Start Datetime
                        type: string
                      station_id:
                        title: Station Id
                        type: string
                    title: Station
                    type: object
                properties:
                  geometry:
                    $ref: '#/definitions/GeometrySchema'
                  id:
                    title: Id
                    type: string
                  links:
                    items:
                      additionalProperties:
                        type: string
                      type: object
                    title: Links
                    type: array
                  properties:
                    $ref: '#/definitions/Observationfinal'
                  type:
                    default: Feature
                    title: Type
                    type: string
                required:
                - geometry
                - properties
                - id
                - links
                title: ObservationfinalPygeoapiSchema
                type: object

This results in some issue in the Swagger UI. I am attaching a few screenshots

Please notice the error messages image

notice the JSON schema image

To me, it looks like pygeoapi cannot handle the references properly at the moment.

tomkralidis commented 1 year ago

Correct, the custom #/definitions won't be recognized by pygeoapi.

Here, I would suggest you add a post-step (i.e. AFTER pygeoapi openapi generate $PYGEOAPI_CONFIG -of $PYGEOAPI_OPENAPI) that opens $PYGEOAPI_OPENAPI (i.e.yaml.load), injects the relevant snippets into definitions and save the file back (i.e. yaml.dump).

tomkralidis commented 1 year ago

@faysal-ishtiaq @iedwards any update? Is the previous approach something feasible from an opencdms viewpoint?

faysal-ishtiaq commented 1 year ago

@faysal-ishtiaq @iedwards any update? Is the previous approach something feasible from an opencdms viewpoint?

I wrote a script to post process open API config file. It's working so far.

tomkralidis commented 1 year ago

Thanks for the feedback @faysal-ishtiaq. As a result I've issued at PR at https://github.com/geopython/pygeoapi/pull/1022 for adding to pygeoapi proper.

tomkralidis commented 1 year ago

@faysal-ishtiaq @isedwards this is now merged in pygeoapi master branch.

isedwards commented 1 year ago

Perfect, thank you @tomkralidis.

The Climsoft CDMS Provider Plugin and OpenCDMS CLI update are also now merged on our side (resolved by#47). A second plugin is on the way soon for SURFACE CDMS.

See this section of README for details.