jvandegriff opened 6 years ago
There is an old wiki page that has some discussion of this: https://github.com/hapi-server/data-specification/wiki/extension-notes
Here is an alternative that would not require the creation of new endpoints.
Example: A server provides time averaging. It can return the average, min, or max of parameters in non-overlapping windows with a given time width, which can be 1 second, 1 minute, or 1 hour.
The request would have the form:
dataset=ID&parameters=IDs&start=START&stop=STOP&extension.window.width=PT1S&extension.window.operation=average
where the extension
prefix communicates to someone who receives this link that they should not expect the URL parameter to work on any HAPI server.
Ideally, the server would also provide a schema so that a client can determine what the restrictions are on the extension parameters:
Capabilities response:
{
"HAPI": "2.0",
"status": { "code": 1200, "message": "OK"},
"outputFormats": [ "csv", "binary", "json" ],
"x_extensionSchema": {
"window": {
"type": "object",
"items": {
"required": ["width","operation"],
"properties": {
"width": {
"description": "The width of the window to apply an operation.",
"type": "string",
"enum": ["PT1S","PT1M","PT1H"]
},
"operation": {
"description": "The calculation to be performed in each time window.",
"type": "string",
"enum": ["average","max","min"]
}
}
}
}
}
}
This JSON schema can be used by a client who wishes to create a menu of options that can be selected for a request to the server.
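As a concrete illustration (a sketch, not part of the spec), here is how a client might read `x_extensionSchema` from the capabilities response above and collect the enum values into menu choices. The `menu_options` function name is made up; the JSON mirrors the example above with its syntax corrected:

```python
import json

# Capabilities response from the example above (with JSON syntax corrected).
capabilities = json.loads("""
{
  "HAPI": "2.0",
  "status": {"code": 1200, "message": "OK"},
  "outputFormats": ["csv", "binary", "json"],
  "x_extensionSchema": {
    "window": {
      "type": "object",
      "items": {
        "required": ["width", "operation"],
        "properties": {
          "width": {"type": "string", "enum": ["PT1S", "PT1M", "PT1H"]},
          "operation": {"type": "string", "enum": ["average", "max", "min"]}
        }
      }
    }
  }
}
""")

def menu_options(caps):
    """Map 'extension.property' to its allowed values, for building a menu."""
    menus = {}
    for ext_name, ext in caps.get("x_extensionSchema", {}).items():
        props = ext.get("items", {}).get("properties", {})
        for prop_name, prop in props.items():
            if "enum" in prop:
                menus[ext_name + "." + prop_name] = prop["enum"]
    return menus

print(menu_options(capabilities))
# {'window.width': ['PT1S', 'PT1M', 'PT1H'], 'window.operation': ['average', 'max', 'min']}
```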
Many datasets have options, and current HAPI servers can only express these as a combinatorial explosion of datasets of slightly different names, each name representing one set of options. For basic differences like different time averages, it would be nice for clients to be able to recognize these relationships.
For the time averaging case, Autoplot can use this to request the correct resolution based on the duration of the time range requested.
The point here is that this linking of datasets of the same data at different time resolutions could be done by offering a setting on the dataset, or by having different datasets that are linked via metadata (dataset X is the 5 minute resolution version of dataset Y, which has 1 hour resolution).
At the 2019-07-15 telecon, it was pointed out that: a. extensions should be added carefully and only after significant discussion, since extensions can change or undercut the original goal of a simple, lowest-level, easy-to-implement server; b. extensions would always be optional; c. people may end up extending it anyway, so some guidelines on how to do it could be useful; d. extensions could help by allowing instrument teams or specific sub-communities to develop servers with lots of extra features that underneath were fundamentally HAPI compliant (with good defaults automatically chosen for all the extra features).
As already discussed above in this issue, there are at least two kinds of (optional) extensions that could be added:
One approach (which involves not adding options) is to let any more complex processing (averaging, etc.) be done by higher-level services.
Having many options when retrieving a dataset will also make caching more difficult.
Related to issues #59, #70, #79, #78, #98, #100, #101, #102.
All of these talk about extensions, or different REST-ful approaches that could advertise options or extensions.
"where" is another extension that comes up now and again. It would be nice to retrieve the Juno measurements only when RJ<10, for example; without a server-side filter, this requires having the entire set available to extract the subset. (A good rule: if an operation needs all the data sent over either way, it shouldn't be a filter. For example, FFT power shouldn't be done on the server side because we don't have reduction anyway.)
After some back and forth with Jeremy, I'm going with his/Bob's approach for SuperMAG of using the capabilities to handle the optional baseline and delta subtractions that the SuperMAG API provides (rather than having a proliferation of endpoints).
"x_extensionSchema": {
"baseline": {
"description": "Changing the baseline subtraction from the default",
"type": "string",
"enum": ["default","yearly","none"],
"default": "default"
},
"delta": {
"description": "Changing the delta subtraction from the default",
"type": "string",
"enum": ["default","none","start"],
"default": "default"
}
}
Sandy, I think this is a great example for proving extensions. I'll plan on supporting these in the Autoplot client.
@sandyfreelance This looks a bit like a window operation. See above https://github.com/hapi-server/data-specification/issues/59#issuecomment-473437992
What about
{
"HAPI": "2.0",
"status": { "code": 1200, "message": "OK"},
"outputFormats": [ "csv", "binary", "json" ],
"x_extensionSchema": {
"window": {
"type": "object",
"items": {
"required": ["width","operation"],
"properties": {
"width": {
"description": "The width of the window to apply an operation.",
"type": "string",
"enum": ["P1D","P1Y"]
},
"operation": {
"description": "The calculation to be performed in each time window.",
"type": "string",
"enum": ["subtractaverage"]
}
}
}
}
}
}
I don't think default and none are needed, as they are implied when not given in the query. In the above, I've restricted the enums to the ones that you want to implement. If we decide to generalize this, servers can add to the enum list.
The calls would be
dataset=ID&parameters=IDs&start=START&stop=STOP&x_extension.window.width=P1D&x_extension.window.operation=subtractaverage
dataset=ID&parameters=IDs&start=START&stop=STOP&x_extension.window.width=P1Y&x_extension.window.operation=subtractaverage
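As a sketch (the base URL and `data_url` helper are made up for illustration), these requests could be assembled in Python as:

```python
from urllib.parse import urlencode

def data_url(base, dataset, parameters, start, stop, **window):
    """Build a HAPI data request; extra keywords become x_extension.window.* options."""
    query = {"dataset": dataset, "parameters": parameters,
             "start": start, "stop": stop}
    # The x_extension prefix signals that these options are non-standard.
    query.update({"x_extension.window." + k: v for k, v in window.items()})
    return base + "?" + urlencode(query)

print(data_url("https://example.org/hapi/data", "ID", "IDs",
               "START", "STOP", width="P1D", operation="subtractaverage"))
# https://example.org/hapi/data?dataset=ID&parameters=IDs&start=START&stop=STOP&x_extension.window.width=P1D&x_extension.window.operation=subtractaverage
```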
I'm definitely using window as a template, but I don't think it falls under the 'window' designation, because it bulk-applies to all the data. Giving it its own 'x_extensionSchema' name is less likely to confuse and less likely to overwrite an existing 'window' usage.
Following the window logic, I could roll both into a single item instead of the current 2 items:
"x_extensionSchema": {
"subtraction": {
"type": "object",
"items": {
"required": ["baseline","delta"],
"properties": {
[insert my JSON code inside this 'subtraction' object]
}
}
}
}
I see your point about "bulk applies" (if one of the parameters is a string, removing its daily average does not make sense). This is also something that we will need to address for all of the window filtering options I listed. Given the many complications this will introduce, I suppose it is better to just implement what you suggest for now, and then in the future we'll deal with how to incorporate baseline subtraction into the windowing options I listed.
I think we have to be careful when picking 'best' names for x... stuff. The whole point of the x... stuff is that it doesn't mess up the spec, so I think we should not care about potential name clashes. (Or are you guys working together on this feature?)
I guess there are two types of x_ things.
This is case 1 because it is a common window/filter operation.
I do see a new issue here. If we want access to SuperMag data through all of the clients, they'll all have to be updated to interpret this extension.
On Thu, Oct 14, 2021 at 6:10 PM Jeremy Faden @.***> wrote:
I think we have to be careful when picking 'best' names for x... stuff. The whole point of the x... stuff is that it doesn't mess up the spec, so I think we should not care about potential name clashes. (Or are you guys working together on this feature?)
For SuperMAG these 2 are truly optional; a default client can serve the default data. Like any HAPI flags, power users have additional powers they can tap into, but the typical user has 100% access to the base data set. So I would see client update as a low priority, in such a way that we can think out how extensible we want capabilities to be. That said, I'd be happy to join discussions on how to deal with this at the client level (starting with the Python one).
I think we need to tweak the JSON a little bit for the extensions. If we want to make a validator to see if the capabilities
JSON response is valid, then the names of the additional parameters should not be the names of any JSON objects.
so instead of this:
{
"x_extensionSchema": {
"baseline": {
"description": "Changing the baseline subtraction from the default",
"type": "string",
put names of any new parameters as the value of a name
field, like this:
{
"x_extensionSchema": [
{
"name": "baseline",
"description": "Changing the baseline subtraction from the default",
"type": "string"
}
]
}
Note that x_extensionSchema
is now a list of objects, each of which has a specific JSON syntax that you can much more easily validate.
I also suggest having two lists, one for standard, but optional extensions, and one for custom extensions. I also think calling them extensions is too vague - they are additional request parameters, so if we just give them a long name, that is more self-documenting.
{
"x_optionalRequestParameters": [
{
"name": "average",
"type": "double",
"description": "averaging window in seconds; windowing begins at start time; windows are inclusive on start time, exclusive on stop time",
"min": 0
},
{
"name" :"spikeFilter",
"description": "removes items more than N standard deviations from a running mean",
"type" : "object",
"objectSettings": {
}
}
],
"x_customRequestParameters": [
{
"name": "baseline",
"type": "string",
"enum": [ "default","yearly","none"]
},
{
"name": "delta",
"description": "Changing the delta subtraction from the default",
"type": "string",
"enum": ["default","none","start"],
"default": "default"
}
]
}
Note that some request parameters have constraints (averaging interval, or enumerated strings, or spike removal options, etc), so we need to figure out a way to capture those in a way that can also be easily validated against a JSON schema.
Here's the JSON schema: https://json-schema.org/understanding-json-schema/reference/numeric.html
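To make the validation idea concrete, here is a rough stdlib-only sketch that interprets the constraint keywords used in this thread (enum, minimum, exclusiveMaximum) against a user-supplied value. The parameter entries mirror the examples in this discussion; a real implementation would likely hand this off to a JSON Schema library instead:

```python
def check_value(param, value):
    """Return True if value satisfies the constraint keywords on this parameter."""
    if "enum" in param and value not in param["enum"]:
        return False
    if "minimum" in param and value < param["minimum"]:
        return False
    if "exclusiveMaximum" in param and value >= param["exclusiveMaximum"]:
        return False
    return True

# Example parameter entries, taken from the fragments in this thread.
baseline = {"name": "supermag_baseline", "type": "string",
            "enum": ["monthly", "yearly", "none"], "default": "monthly"}
whatever = {"name": "whatever", "type": "number",
            "minimum": 1, "exclusiveMaximum": 100, "default": 1}

print(check_value(baseline, "yearly"))  # True
print(check_value(baseline, "weekly"))  # False
print(check_value(whatever, 100))       # False: violates exclusiveMaximum
```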
"x_customRequestParameters": [
{ "name": "supermag_baseline",
"description": "Changing the baseline subtraction from the default",
"type": "string",
"enum": ["monthly","yearly","none"],
"default": "monthly"
},
{
"name": "whatever",
"type": "number",
"minimum": 1,
"exclusiveMaximum": 100,
"default": 1
},
{ "name": "quality",
"description": "How good the data is",
"type": "integer",
"enum": [1,2,3],
"default": 1
},
{ "name": "supermag_delta",
"description": "Changing the delta subtraction from the default",
"type": "string",
"enum": ["default","none","start"],
"default": "default"
}
],
(As a general rule, avoid 'default=default'.)
Misc links for future reference when we come up with a specification for custom parameters:
https://stackoverflow.com/questions/34735343/should-a-restful-api-have-a-schema
https://swagger.io/specification/
https://developer.wordpress.org/rest-api/extending-the-rest-api/schema/
This is specific to Sandy's server which allows additional options for each data set. We need to be careful with the name of this because I see we are already calling these "parameters." This is going to cause confusion with the "parameters" and "request parameters" terms. I'd suggest these be called "options" or something different.
x_CustomRequestOptions sounds better, I agree, and hammers home the "they're optional" aspect.
We'll put the constraints into their own wrapper. That said, question on implementation of constraints-- which format is better, default inside constraint or default outside constraint?
"default" inside the constraint spec:
{
"x_customRequestParameters": {
"name": "whatever",
"type": "number",
"constraint": {
"minimum": 1,
"exclusiveMaximum": 100,
"default": 1
}
}
}
Or "default" outside the constraint spec:
{
"x_customRequestParameters": {
"name": "whatever",
"type": "number",
"constraint": {
"minimum": 1,
"exclusiveMaximum": 100
},
"default": 1
}
}
Here's the same choice, but for a string enum:
"type": "string",
"constraint": {
"enum": ["monthly","yearly","none"],
"default": "monthly"
}
vs
"type": "string",
"constraint": {
"enum": ["monthly","yearly","none"]
},
"default": "monthly"
Outside, definitely. I'd say this because the purpose of the constraints is to provide a GUI or guidance about the parameter, and to determine that a parameter value is acceptable. The default is used when the parameter is missing. I can imagine that constraints might be missing, but the default is required.
I also think outside - you will always need a default
but the constraint is optional.
I like having all the constraint related items in one object, especially if we plan to require that the constraints be expressed as valid JSON Schema - then all the elements in that constraint
object are what you interpret with the JSON Schema parser.
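A sketch of the "default outside" layout in Python (helper names are made up): the constraint sub-object is the piece that would be handed to a JSON Schema validator, while default is applied by the client when the user supplies no value:

```python
# Parameter entry in the "default outside the constraint" form discussed above.
param = {
    "name": "whatever",
    "type": "number",
    "constraint": {"minimum": 1, "exclusiveMaximum": 100},
    "default": 1,
}

def schema_for(param):
    """Combine type with the constraint contents; this is what a JSON Schema
    validator would consume (name/description/default stay outside)."""
    return dict(param.get("constraint", {}), type=param["type"])

def resolve(param, user_value=None):
    """Use the (required) default when the user supplies no value."""
    return param["default"] if user_value is None else user_value

print(schema_for(param))   # {'minimum': 1, 'exclusiveMaximum': 100, 'type': 'number'}
print(resolve(param))      # 1
print(resolve(param, 42))  # 42
```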
Just an FYI - here is what Bob posted for JSON Schema info:
https://json-schema.org/understanding-json-schema/reference/numeric.html
If we have a constraint
object, then expressing the contents of "x_customRequestParameters"
in JSON Schema form will require deleting constraint
and moving its contents up one level. (Also, if we did not have constraint
, there would be no debate on where default
goes.)
I'm confused about what this would do for us. How much code does using an existing scheme actually save us? I would see its value in that we can use it as a guide for designing our scheme, as in knowing it should be "minimum" and not "minValue."
One benefit is that a server developer can paste the contents of x_customRequestParameters into an online JSON validator without modification to see if they made any errors. The HAPI validator can do the same.
I think there's something I'm not familiar with here. What's weird to me is that a JSON schema describes a JSON document, but here the JSON schema is describing a set of parameters. Would you have the options string in the URL be a JSON document?
I think I am also confused. Here's what I thought we were doing:
The validation aspect applies to the value of the new parameter. This is not something that you can evaluate when looking at the hapi/capabilities response. I thought the JSON Schema parsing ability was going to be primed with info from the capabilities
response, not used to determine if the capabilities
content is valid. What you (the HAPI client) need is to initialize a JSON Schema-based mechanism that can determine if a user's value for a custom request parameter is valid (you make this determination after the user has entered their custom value and just before submitting the URL to the server). To initialize that object, you need only the contents of the constraint
object. The other items (name, description, default) are not part of JSON Schema validation, so they would presumably cause the JSON Schema mechanism to choke.
There are two validations that must be performed:
1. Is the content of x_customRequestParameters itself valid?
2. Is a user's value for a custom request parameter valid? The client uses the constraints in x_customRequestParameters to answer this.
I now realize an additional benefit. There is a [lot of code out there](https://json-schema.org/implementations.html#web-ui-generation) that will take a JSON object that follows JSON Schema and automatically generate a UI menu. So one can simply pass the contents of x_customRequestParameters to a library and it will create HTML menus.
That sounds really useful. That's the kind of stuff I've been trying to do all along, and it's not easy. If someone else has gone through the work and has a well-documented and working system, that's worth a lot.
Also, I don't know if the validator should look at any x_ parts. Is this what you mean by your first point?
So what are we converging at, what should our x_customRequestOptions json look like?
Bob's plan sounds good to me. I don't think it's much work to create a GUI from a spec, but if there's a system that is built already, we should use it.
This is curious, on https://json-editor.github.io/json-editor/: "favorite_color": { "type": "string", "format": "color", "title": "favorite color", "default": "#ffa500" },
So format="color" introduces a scheme for the string. Bob, do you know where these are enumerated?
The usual hex color table (#RRGGBB).
Besides format="color" I wonder what other formats there are.
Whoops, didn't mean to close.
I think he's getting color tables from the color picker widget and not enforcing them. He's not enforcing two-digit state names either. It's likely from one of the external libraries being called (unfortunately 'choices' looks like a likely candidate, but that's too vague a name to hunt it down).
One issue to consider is if customRequestOptions changes the units of the data, we're going to have problems. In Sandy's case each option returns data in nT, but I can imagine an option to calibrate raw data, which might be flux or counts.
I ran across the project https://github.com/hapi-server/extensions, which I added (presumably after talking with others, hope so) to give a place where extensions would be described. We should move the discussion about the customRequestOptions to there. I've started a ticket: https://github.com/hapi-server/extensions/issues/1
As people build implementations, there is renewed interest in how to add non-standard features to the server.
Telecon notes from 2017-05-16 have a discussion about this: https://github.com/hapi-server/data-specification/wiki/telecon-notes
Everyone pretty much agrees that the HAPI endpoints should be kept strict (no extensions - any extra keyword is an error).
We would need a way for people to advertise extensions, presumably in the capabilities endpoints.
The extension endpoint with no parameters IS a human-readable landing page describing the parameters taken and how to use the endpoint.