hapi-server / data-specification

HAPI Data Access Specification
https://hapi-server.org

how to extend the capabilities of a server? #59

Open jvandegriff opened 6 years ago

jvandegriff commented 6 years ago

As people build implementations, there is renewed interest in how to add non-standard features to the server.

Telecon notes from 2017-05-16 have a discussion about this: https://github.com/hapi-server/data-specification/wiki/telecon-notes

Everyone pretty much agrees that the HAPI endpoints should be kept strict (no extensions - any extra keyword is an error).

We would need a way for people to advertise extensions, presumably in the capabilities endpoint.

{
  "HAPI": "1.1",
  "status": {
    "code": 1200,  "message": "OK"
  },
  "outputFormats": [
    "csv", "binary", "json"
  ],
  "extensions": [
    {
      "endpoint": "averaged_data",
      "description": "basic explanation of extension"
    },
    {
      "endpoint": "despiked_data",
      "description": "basic explanation of extension"
    }
  ]
}

Each "endpoint" value is the name of the extension endpoint.

The extension endpoint with no parameters is a human-readable landing page describing the parameters taken and how to use the endpoint.
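As a rough sketch of how a client might discover such extensions, the following Python reads a capabilities response and lists the advertised endpoints. The "extensions" key and its layout follow the proposal in this comment and are not part of the HAPI specification; the function name list_extensions is hypothetical.

```python
import json

# Capabilities response in the shape proposed in this comment
# (an assumption; "extensions" is not part of the HAPI spec).
CAPABILITIES = json.loads("""
{
  "HAPI": "1.1",
  "status": {"code": 1200, "message": "OK"},
  "outputFormats": ["csv", "binary", "json"],
  "extensions": [
    {"endpoint": "averaged_data", "description": "basic explanation of extension"},
    {"endpoint": "despiked_data", "description": "basic explanation of extension"}
  ]
}
""")


def list_extensions(capabilities):
    """Return the names of advertised extension endpoints (empty if none)."""
    # A strict server with no extensions simply omits the key.
    return [ext["endpoint"] for ext in capabilities.get("extensions", [])]


if __name__ == "__main__":
    for name in list_extensions(CAPABILITIES):
        print(name)
```

A client that does not know about extensions can ignore the key entirely, so the strict endpoints are unaffected.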

rweigel commented 5 years ago

There is an old wiki page that has some discussion of this: https://github.com/hapi-server/data-specification/wiki/extension-notes

rweigel commented 5 years ago

Here is an alternative that would not require the creation of new endpoints.

Example: A server provides time averaging. It can return the average, min, or max of parameters in non-overlapping windows with a given time width that can be 1 second, 1 minute, or 1 hour.

The request would have the form:

dataset=ID&parameters=IDs&start=START&stop=STOP&extension.window.width=PT1S&extension.window.operation=average

where the extension prefix communicates to someone who receives this link that they should not expect the URL parameter to work on every HAPI server.
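A minimal sketch of how a client could assemble such a request, assuming the dotted "extension." naming shown in the example request. The function build_query and its arguments are hypothetical, not part of any HAPI client.

```python
from urllib.parse import urlencode


def build_query(dataset, parameters, start, stop, extension=None, prefix="extension"):
    """Build a HAPI-style data query string with optional extension parameters."""
    query = {"dataset": dataset, "parameters": parameters,
             "start": start, "stop": stop}
    # Nested extension settings are flattened to dotted names,
    # e.g. {"window": {"width": "PT1S"}} -> extension.window.width=PT1S
    for group, settings in (extension or {}).items():
        for key, value in settings.items():
            query[f"{prefix}.{group}.{key}"] = value
    return urlencode(query)


if __name__ == "__main__":
    print(build_query("ID", "IDs", "START", "STOP",
                      {"window": {"width": "PT1S", "operation": "average"}}))
```

Passing no extension argument yields a plain request, which is what a server without the extension would expect.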

Ideally, the server would also provide a schema so that a client can determine what the restrictions are on the extension parameters:

Capabilities response:

{
    "HAPI": "2.0",
    "status": { "code": 1200, "message": "OK"},
    "outputFormats": [ "csv", "binary", "json" ],
    "x_extensionSchema": {
        "window": {
            "type": "object",
            "items": {
                "required": ["width","operation"],
                "properties": {
                    "width": {
                        "description": "The width of the window to apply an operation.",
                        "type": "string",
                        "enum": ["PT1S","PT1M","PT1H"]
                    },
                    "operation": {
                        "description": "The calculation to be performed in each time window.",
                        "type": "string",
                        "enum": ["average","max","min"]
                    }
                }
            }
        }
    }
}

This JSON schema can be used by a client who wishes to create a menu of options that can be selected for a request to the server.
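For illustration, here is one way a client might turn that schema into menu choices. This is only a sketch: it assumes the "items"/"properties"/"enum" nesting shown in the capabilities response in this comment, and menu_options is a hypothetical helper name.

```python
# The x_extensionSchema content from the capabilities example,
# with descriptions omitted for brevity.
EXTENSION_SCHEMA = {
    "window": {
        "type": "object",
        "items": {
            "required": ["width", "operation"],
            "properties": {
                "width": {"type": "string", "enum": ["PT1S", "PT1M", "PT1H"]},
                "operation": {"type": "string", "enum": ["average", "max", "min"]},
            },
        },
    }
}


def menu_options(extension_schema):
    """Map each dotted extension parameter name to its list of allowed values."""
    menu = {}
    for group, spec in extension_schema.items():
        properties = spec.get("items", {}).get("properties", {})
        for name, prop in properties.items():
            # Parameters without an enum get an empty list (free-form input).
            menu[f"{group}.{name}"] = list(prop.get("enum", []))
    return menu


if __name__ == "__main__":
    for name, choices in menu_options(EXTENSION_SCHEMA).items():
        print(name, choices)
```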

jvandegriff commented 5 years ago

Many datasets have options, and current HAPI servers can only express these as a combinatorial explosion of datasets of slightly different names, each name representing one set of options. For basic differences like different time averages, it would be nice for clients to be able to recognize these relationships.

For the time averaging case, Autoplot can use this to request the correct resolution based on the duration of the time range requested.

The point here is that this linking of datasets of the same data at different time resolutions could be done by offering a setting on the dataset, or by having different datasets that are linked via metadata (dataset X is the 5 minute resolution version of dataset Y, which has 1 hour resolution).
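A sketch of the resolution choice described for the time averaging case, assuming the three window widths from the earlier schema example. The max_points budget and the function choose_width are hypothetical; this is not how Autoplot actually implements it.

```python
from datetime import datetime

# Window widths from the earlier schema example, with their length in seconds.
WIDTH_SECONDS = {"PT1S": 1, "PT1M": 60, "PT1H": 3600}


def choose_width(start, stop, max_points=10_000):
    """Pick the finest window width that yields at most max_points samples."""
    duration = (stop - start).total_seconds()
    # Try widths from finest to coarsest.
    for width, seconds in sorted(WIDTH_SECONDS.items(), key=lambda kv: kv[1]):
        if duration / seconds <= max_points:
            return width
    # Fall back to the coarsest width for very long ranges.
    return max(WIDTH_SECONDS, key=WIDTH_SECONDS.get)


if __name__ == "__main__":
    print(choose_width(datetime(2020, 1, 1), datetime(2020, 1, 2)))
```

The same selection logic would work whether the resolutions are exposed as a request option or as separate, metadata-linked datasets.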

jvandegriff commented 5 years ago

At the 2019-07-15 telecon, it was pointed out that:

  a. extensions should be added carefully and only after significant discussion, since extensions can change or undercut the original goal of a simple, lowest-level, easy-to-implement server
  b. extensions would always be optional
  c. people may end up extending it anyway, so some guidelines on how to do it could be useful
  d. extensions could help by allowing instrument teams or specific sub-communities to develop servers with lots of extra features that are, underneath, fundamentally HAPI compliant (with good defaults automatically chosen for all the extra features)

As already discussed above in this issue, there are at least two kinds of (optional) extensions that could be added:

  1. new endpoints
  2. extra options to existing endpoints.

One approach (which involves not adding options) is to let any more complex processing (averaging, etc) be done by higher level services.

Having many options when retrieving a dataset will make caching more difficult.
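To make the caching concern concrete, here is a small sketch. It assumes a cache keyed on the full query string, which is an illustrative assumption rather than a description of any existing HAPI cache; both function names are hypothetical.

```python
from math import prod
from urllib.parse import urlencode


def cache_key(params):
    """Canonical cache key: parameter order must not affect the key."""
    return urlencode(sorted(params.items()))


def count_variants(options):
    """Number of distinct cached responses one base request can fan out into."""
    return prod(len(values) for values in options.values())


if __name__ == "__main__":
    options = {"width": ["PT1S", "PT1M", "PT1H"],
               "operation": ["average", "max", "min"]}
    # Each added option multiplies the number of responses to cache.
    print(count_variants(options))
```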

jvandegriff commented 3 years ago

related to issues #59 #70 #79 #78 #98 #100 #101 #102

All of these talk about extensions, or different REST-ful approaches that could advertise options or extensions.

jbfaden commented 3 years ago

"where" is another extension that comes up now and again. It would be nice to retrieve the Juno measurements only when RJ<10, for example, since without such a filter the client needs the entire set available to extract the subset. (This is a good rule: if an operator needs all the data to be sent over either way, it shouldn't be a filter. For example, FFT power shouldn't be done on the server side because we don't get any reduction anyway.)

sandyfreelance commented 2 years ago

After some back and forth with Jeremy, I'm going with his/Bob's approach for SuperMAG of using the capabilities to handle the optional baseline and delta subtractions that the SuperMAG API provides (rather than having a proliferation of endpoints).

    "x_extensionSchema": {
        "baseline": {
            "description": "Changing the baseline subtraction from the default",
            "type": "string",
            "enum": ["default","yearly","none"],
            "default": "default"
        },
        "delta": {
            "description": "Changing the delta subtraction from the default",
            "type": "string",
            "enum": ["default","none","start"],
            "default": "default"
        }
    }

jbfaden commented 2 years ago

Sandy, I think this is a great example for proving extensions. I'll plan on supporting these in the Autoplot client.

rweigel commented 2 years ago

@sandyfreelance This looks a bit like a window operation. See above https://github.com/hapi-server/data-specification/issues/59#issuecomment-473437992

What about

{
    "HAPI": "2.0",
    "status": { "code": 1200, "message": "OK"},
    "outputFormats": [ "csv", "binary", "json" ],
    "x_extensionSchema": {
        "window": {
            "type": "object",
            "items": {
                "required": ["width","operation"],
                "properties": {
                    "width": {
                        "description": "The width of the window to apply an operation.",
                        "type": "string",
                        "enum": ["P1D","P1Y"]
                    },
                    "operation": {
                        "description": "The calculation to be performed in each time window.",
                        "type": "string",
                        "enum": ["subtractaverage"]
                    }
                }
            }
        }
    }
}

I don't think default and none are needed, as they are implied when not given in the query. In the above, I've restricted the enums to the ones that you want to implement. If we decide to generalize this, servers can add to the enum list.

The calls would be

dataset=ID&parameters=IDs&start=START&stop=STOP&x_extension.window.width=P1D&x_extension.window.operation=subtractaverage
dataset=ID&parameters=IDs&start=START&stop=STOP&x_extension.window.width=P1Y&x_extension.window.operation=subtractaverage

sandyfreelance commented 2 years ago

I'm definitely using window as a template, but I don't think it falls under the 'window' designation, because it bulk-applies to all the data. Giving it its own 'x_extensionSchema' name is less likely to confuse and less likely to overwrite an existing 'window' usage.

Following the window logic, I could roll both into a single item instead of the current 2 items:

"x_extensionSchema": {
    "subtraction": {
        "type": "object",
        "items": {
            "required": ["baseline","delta"],
            "properties": {
[insert my JSON code inside this 'subtraction' object]
            }
        }
    }
}

rweigel commented 2 years ago

I see your point about "bulk applies" (if one of the parameters is a string, removing its daily average does not make sense). This is also something that we will need to address for all of the window filtering options I listed. Given the many complications that this will introduce, I suppose it is better to just implement what you suggest for now, and then in the future we'll deal with how to incorporate baseline subtraction into the windowing options I listed.

jbfaden commented 2 years ago

I think we have to be careful when picking 'best' names for x... stuff. The whole point of the x... stuff is that it doesn't mess up the spec, so I think we should not care about potential name clashes. (Or are you guys working together on this feature?)

rweigel commented 2 years ago

I guess there are two types of x_ things.

  1. Ones we expect to eventually be a part of the spec and will probably implement in clients. We should get things right the first time so when x_ is removed, the clients require minimal updating.
  2. Ones we never expect to become a part of the spec.

This is case 1 because it is a common window/filter operation.

I do see a new issue here. If we want access to SuperMAG data through all of the clients, they'll all have to be updated to interpret this extension.

sandyfreelance commented 2 years ago

For SuperMAG these 2 are truly optional; a default client can serve the default data. Like any HAPI flags, power users have additional powers they can tap into, but the typical user has 100% access to the base data set. So I see client updates as a low priority, which gives us time to think through how extensible we want capabilities to be. That said, I'd be happy to join discussions on how to deal with this at the client level (starting with the Python one).

jvandegriff commented 2 years ago

I think we need to tweak the JSON a little bit for the extensions. If we want to make a validator to see if the capabilities JSON response is valid, then the names of the additional parameters should not be used as the keys of JSON objects.

so instead of this:

{
  "x_extensionSchema": {
        "baseline": {
            "description": "Changing the baseline subtraction from the default",
            "type": "string",

put names of any new parameters as the value of a name field, like this:

{
  "x_extensionSchema": [
       {
           "name": "baseline",
           "description": "Changing the baseline subtraction from the default",
           "type": "string"
       }
  ]
}

Note that x_extensionSchema is now a list of objects, each of which has a specific JSON syntax that you can much more easily validate.
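As a sketch of why the list form is easier to check, the following validates each entry against a fixed set of fields. Treating "name" and "type" as required strings is an assumption made for illustration, and check_parameter_list is a hypothetical helper, not part of the HAPI validator.

```python
def check_parameter_list(entries):
    """Return a list of problems found in a list-of-objects parameter description."""
    if not isinstance(entries, list):
        return ["expected a list of objects"]
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        # Every entry has the same fixed keys, so one rule covers all parameters.
        for field in ("name", "type"):
            if not isinstance(entry.get(field), str):
                problems.append(f"entry {i}: missing or non-string '{field}'")
        name = entry.get("name")
        if isinstance(name, str):
            if name in seen:
                problems.append(f"entry {i}: duplicate name '{name}'")
            seen.add(name)
    return problems


if __name__ == "__main__":
    print(check_parameter_list([
        {"name": "baseline",
         "description": "Changing the baseline subtraction from the default",
         "type": "string"},
    ]))
```

With parameter names as object keys, the set of valid keys would differ for every server, so no single fixed rule like this could be written.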

I also suggest having two lists, one for standard, but optional extensions, and one for custom extensions. I also think calling them extensions is too vague - they are additional request parameters, so if we just give them a long name, that is more self-documenting.

{
  "x_optionalRequestParameters": [
    {
      "name": "average",
      "type": "double",
      "description": "averaging window in seconds; windowing begins at start time; windows are inclusive on start time, exclusive on stop times",
      "min": 0
    },
    {
      "name" :"spikeFilter",
      "description":  "removes items more than N standard deviations from a running mean",
      "type" : "object",
      "objectSettings": {
       } 
    }
  ],
  "x_customRequestParameters": [
    {
      "name": "baseline",
      "type": "string",
      "enum": [ "default","yearly","none"]
    },
    {
       "name": "delta",
       "description": "Changing the delta subtraction from the default",
       "type": "string",
       "enum": ["default","none","start"],
       "default": "default"
     }
  ]
}

Note that some request parameters have constraints (averaging interval, or enumerated strings, or spike removal options, etc), so we need to figure out a way to capture those in a way that can also be easily validated against a JSON schema.

jbfaden commented 2 years ago

Here's the JSON schema: https://json-schema.org/understanding-json-schema/reference/numeric.html

sandyfreelance commented 2 years ago

    "x_customRequestParameters": [
        { "name": "supermag_baseline",
          "description": "Changing the baseline subtraction from the default",
          "type": "string",
          "enum": ["monthly","yearly","none"],
          "default": "monthly"
        },
        {
          "name": "whatever",
          "type": "number",
          "minimum": 1,
          "exclusiveMaximum": 100,
          "default": 1
        },
        { "name": "quality",
          "description": "How good the data is",
          "type": "integer",
          "enum": [1,2,3],
          "default": 1
        },
        { "name": "supermag_delta",
          "description": "Changing the delta subtraction from the default",
          "type": "string",
          "enum": ["default","none","start"],
          "default": "default"
        }
    ],

Note on supermag_baseline: avoid 'default=default' as a general rule.

rweigel commented 2 years ago

Misc links for future reference when we come up with a specification for custom parameters:

https://stackoverflow.com/questions/34735343/should-a-restful-api-have-a-schema

https://swagger.io/specification/

https://developer.wordpress.org/rest-api/extending-the-rest-api/schema/

jbfaden commented 2 years ago

This is specific to Sandy's server which allows additional options for each data set. We need to be careful with the name of this because I see we are already calling these "parameters." This is going to cause confusion with the "parameters" and "request parameters" terms. I'd suggest these be called "options" or something different.

sandyfreelance commented 2 years ago

x_customRequestOptions sounds better, I agree, and hammers home the "they're optional" aspect.

sandyfreelance commented 2 years ago

We'll put the constraints into their own wrapper. That said, a question on the implementation of constraints: which format is better, default inside constraint or default outside constraint?

"default" inside the constraint spec:

{
    "x_customRequestParameters": {
        "name": "whatever",
        "type": "number",
        "constraint": {
            "minimum": 1,
            "exclusiveMaximum": 100,
            "default": 1
        }
    }
}

Or "default" outside the constraint spec:

{
    "x_customRequestParameters": {
        "name": "whatever",
        "type": "number",
        "constraint": {
            "minimum": 1,
            "exclusiveMaximum": 100
        },
        "default": 1
    }
}

Here's the same choice, but for a string enum:

     "type": "string",
     "constraint": {
           "enum": ["monthly","yearly","none"],
           "default": "monthly"
     }

vs

    "type": "string",
    "constraint": {
          "enum": ["monthly","yearly","none"]
    },
    "default": "monthly"

jbfaden commented 2 years ago

Outside, definitely. I'd say this because the purpose of the constraints is to provide a GUI or guidance about the parameter, and to demonstrate that a parameter is acceptable. The default is used when the parameter is missing. I can imagine that constraints might be missing, but the default is required.

jvandegriff commented 2 years ago

I also think outside - you will always need a default but the constraint is optional.

I like having all the constraint related items in one object, especially if we plan to require that the constraints be expressed as valid JSON Schema - then all the elements in that constraint object are what you interpret with the JSON Schema parser.
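A short sketch of the "default outside" layout in use, assuming each entry carries a required top-level "default" and an optional "constraint" object as discussed here. The function resolve_options is hypothetical.

```python
def resolve_options(specs, query):
    """Fill in the default for every custom option missing from the query."""
    resolved = {}
    for spec in specs:
        # "default" is read from the top level of each entry, so it is
        # available even when the optional "constraint" object is absent.
        resolved[spec["name"]] = query.get(spec["name"], spec["default"])
    return resolved


if __name__ == "__main__":
    specs = [
        {"name": "supermag_baseline", "type": "string",
         "constraint": {"enum": ["monthly", "yearly", "none"]},
         "default": "monthly"},
        # No constraint given: the default alone is still enough.
        {"name": "whatever", "type": "number", "default": 1},
    ]
    print(resolve_options(specs, {"supermag_baseline": "yearly"}))
```

The code never looks inside "constraint" to find the default, which is the practical benefit of keeping the two apart.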

jvandegriff commented 2 years ago

Just an FYI - here is what Bob posted for JSON Schema info:

https://json-schema.org/understanding-json-schema/reference/numeric.html

rweigel commented 2 years ago

If we have a constraint object, then expressing the contents of "x_customRequestParameters" in JSON Schema form will require deleting constraint and moving its contents up one level. (Also, if we did not have constraint, there would be no debate on where default goes.)
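The transformation described here is small. A sketch, using the "whatever" example from earlier in the thread; to_json_schema is a hypothetical helper name.

```python
def to_json_schema(spec):
    """Move the contents of "constraint" up one level and drop the wrapper."""
    # Copy everything except the wrapper so the input is left untouched.
    flat = {key: value for key, value in spec.items() if key != "constraint"}
    flat.update(spec.get("constraint", {}))
    return flat


if __name__ == "__main__":
    spec = {
        "name": "whatever",
        "type": "number",
        "constraint": {"minimum": 1, "exclusiveMaximum": 100},
        "default": 1,
    }
    print(to_json_schema(spec))
```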

jbfaden commented 2 years ago

I'm confused about what this would do for us. How much code does using an existing scheme actually save us? I would see its value in that we can use it as a guide for designing our scheme, as in knowing it should be "minimum" and not "minValue."

rweigel commented 2 years ago

  1. Someone can copy their x_customRequestParameters into an online JSON validator without modification to see if they made any errors. The HAPI validator can do the same.
  2. We would not need to write "We use JSON schema syntax except that we put stuff under constraint" and then answer the question of "why didn't you just follow JSON Schema; how much benefit is there to creating a small variation? How much was clarity improved?".
  3. We would not need to debate where default goes.

jbfaden commented 2 years ago

I think there's something I'm not familiar with here. What's weird to me is that a JSON schema describes a JSON document, but here the JSON schema is describing a set of parameters. Would you have the options string in the URL be a JSON document?

jvandegriff commented 2 years ago

I think I am also confused. Here's what I thought we were doing: The validation aspect applies to the value of the new parameter. This is not something that you can evaluate when looking at the hapi/capabilities response. I thought the JSON Schema parsing ability was going to be primed with info from the capabilities response, not used to determine if the capabilities content is valid. What you (the HAPI client) need is to initialize a JSON Schema-based mechanism that can determine if a user's value for a custom request parameter is valid (you make this determination after the user has entered their custom value and just before submitting the URL to the server). To initialize that object, you need only the contents of the constraint object. The other items (name, description, default) are not part of JSON Schema validation, so they would presumably cause the JSON Schema mechanism to choke.
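A sketch of the client-side check described here, primed only with the contents of the constraint object. It hand-rolls three JSON Schema keywords (enum, minimum, exclusiveMaximum) rather than using a real JSON Schema library, so it is an illustration of the idea and not a substitute for one; is_valid is a hypothetical name.

```python
def is_valid(value, constraint):
    """Check a user's value against a small subset of JSON Schema keywords."""
    if "enum" in constraint and value not in constraint["enum"]:
        return False
    if "minimum" in constraint and value < constraint["minimum"]:
        return False
    # exclusiveMaximum: the value must be strictly below the bound.
    if "exclusiveMaximum" in constraint and value >= constraint["exclusiveMaximum"]:
        return False
    return True


if __name__ == "__main__":
    constraint = {"minimum": 1, "exclusiveMaximum": 100}
    # Run just before submitting the URL to the server.
    for candidate in (0, 1, 99.5, 100):
        print(candidate, is_valid(candidate, constraint))
```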

rweigel commented 2 years ago

There are two validations that must be performed.

  1. Is what the server sends out in x_customRequestParameters valid?
  2. Is what the client sent to the server in the request URL valid? The server uses the contents of x_customRequestParameters to answer this.
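The second validation could look roughly like this on the server side. This is a sketch under stated assumptions: entries use the flat form with "enum" at the top level, the standard names are just the four used in the example requests in this thread, and check_request is a hypothetical function.

```python
from urllib.parse import parse_qs

# Request parameter names used in the example requests in this thread.
STANDARD_NAMES = {"dataset", "parameters", "start", "stop"}


def check_request(query_string, custom_specs):
    """Return a list of problems with the custom options in a request query."""
    allowed = {spec["name"]: spec for spec in custom_specs}
    problems = []
    for name, values in parse_qs(query_string).items():
        if name in STANDARD_NAMES:
            continue
        if name not in allowed:
            # Strict endpoints: anything not advertised is an error.
            problems.append(f"unknown parameter: {name}")
            continue
        enum = allowed[name].get("enum")
        # Query values arrive as strings, so compare against str(enum value).
        if enum is not None and values[-1] not in [str(v) for v in enum]:
            problems.append(f"invalid value for {name}: {values[-1]}")
    return problems


if __name__ == "__main__":
    specs = [{"name": "supermag_baseline", "type": "string",
              "enum": ["monthly", "yearly", "none"], "default": "monthly"}]
    print(check_request("dataset=ID&start=START&stop=STOP&supermag_baseline=weekly", specs))
```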

I now realize an additional benefit. There is a lot of code out there (https://json-schema.org/implementations.html#web-ui-generation) that will take a JSON object that follows JSON Schema and automatically generate a UI menu. So one can simply pass the contents of x_customRequestParameters to a library and it will create HTML menus.

jbfaden commented 2 years ago

That sounds really useful. That's the kind of stuff I've been trying to do all along, and it's not easy. If someone else has gone through the work and has a well-documented and working system, that's worth a lot.

jbfaden commented 2 years ago

Also, I don't know if the validator should look at any x_ parts. Is this what you mean by your first point?

sandyfreelance commented 2 years ago

So what are we converging on? What should our x_customRequestOptions JSON look like?

jbfaden commented 2 years ago

Bob's plan sounds good to me. I don't think it's much work to create a GUI from a spec, but if there's a system that is built already, we should use it.

jbfaden commented 2 years ago

This is curious, on https://json-editor.github.io/json-editor/: "favorite_color": { "type": "string", "format": "color", "title": "favorite color", "default": "#ffa500" },

So format="color" introduces a scheme for the string. Bob, do you know where these are enumerated?

sandyfreelance commented 2 years ago

The usual hex color table (#RRGGBB).

jbfaden commented 2 years ago

Besides format="color" I wonder what other formats there are.

jbfaden commented 2 years ago

Whoops, didn't mean to close.

sandyfreelance commented 2 years ago

I think he's getting color tables from the color picker widget and not enforcing them. He's not enforcing two-digit state names either. Likely from one of the external libraries being called (unfortunately 'choices' looks like a likely candidate, but it is too vague a name to hunt down).

jbfaden commented 2 years ago

One issue to consider is if customRequestOptions changes the units of the data, we're going to have problems. In Sandy's case each option returns data in nT, but I can imagine an option to calibrate raw data, which might be flux or counts.

jbfaden commented 2 years ago

I ran across the project https://github.com/hapi-server/extensions, which I added (presumably after talking with others, hope so) to give a place where extensions would be described. We should move the discussion about the customRequestOptions to there. I've started a ticket: https://github.com/hapi-server/extensions/issues/1