Closed jvandegriff closed 1 year ago
Yes. See above.
What to use to label the extra string metadata?
Candidates are:
stringInfo
stringType
stringConstraints
stringAdvice
stringUsageHints
stringProperties
stringUsage: { "uri": { "mediaType": "image/fits", "scheme": "https"} }
stringSubType
externalType
stringSchema
We don't want to consider enums just yet (it's a different set of concerns):
"stringType": "enum"
"stringType": { "enum": { "values": ["low", "med", "high"] } }
(emphasizing: not using this)
Not specifying any stringType means there's no special interpretation of the string values.
For now, the mediaType and scheme are optional.
Also, mediaType is singular (not a list), so only one media type is allowed for a given URI parameter.
"stringType": "uri"
"stringType": { "uri": { "mediaType": "image/fits", "scheme": "https"} }
The type in this case is URI, and the properties of that type are the media type and scheme. For now, we only have one such type, namely the URI type.
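To make the two forms concrete, here is a minimal sketch of how a client might normalize them. The function name `normalize_string_type` is hypothetical and not part of any HAPI client; it only assumes the string form and the single-key object form shown above.

```python
def normalize_string_type(parameter):
    """Return (type_name, properties) for a HAPI-style parameter dict.

    Handles both forms discussed in this thread:
      "stringType": "uri"
      "stringType": {"uri": {"mediaType": "image/fits", "scheme": "https"}}
    Returns (None, {}) when no stringType is given, i.e. a plain string.
    """
    string_type = parameter.get("stringType")
    if string_type is None:
        return None, {}
    if isinstance(string_type, str):
        return string_type, {}
    if isinstance(string_type, dict) and len(string_type) == 1:
        (name, props), = string_type.items()
        return name, dict(props or {})
    raise ValueError("stringType must be a string or a single-key object")


param = {
    "name": "image_url",  # hypothetical parameter name
    "type": "string",
    "stringType": {"uri": {"mediaType": "image/fits", "scheme": "https"}},
}
print(normalize_string_type(param))
```

Since mediaType and scheme are optional, the properties dict may be empty even for the object form.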
We still need to emphasize that simply listing files does not fulfill the data-intensive intent of HAPI.
I would like to say this about string parameters that are of "stringType": "uri":
The units for a string parameter that is a URI should be null. The units value here should not be used to try and describe the contents behind the URIs. URI content is too variable to be uniformly handled by this simple units indicator.
Do people agree?
Agreed.
It is time for discussions with the SunPy group, the people at the ISWAT meeting who caused this trend, and the NGDC folks.
One other thought. Some datasets have a parameter that is a local time (e.g., HH:MM). Or a parameter could be a non-ISO date in a commonly used date format (e.g., DD/MM/YY). Would a string type of nonISODateTime with constraints of DD/MM/YY be appropriate? I've also seen tables where the position is reported as 40° 31' 21" or (more commonly for right ascension) HH:MM:SS.
We also had that local time parameter in the SSCWeb HAPI Server. We should put in a "x_" type description to play with this. Actually that's interesting. How would one do this? "x_stringType": { "type":"decomposedTime", "components":"$H:$M" } ?
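One way to play with this is a minimal sketch that turns a "components" template such as "$H:$M" into a regular expression. Everything here is an assumption for experimentation: the field codes ($H, $M, $S), their widths, and the function names are hypothetical and not defined anywhere in HAPI.

```python
import re

# Hypothetical field codes; only $H, $M and $S are handled in this sketch.
FIELD_PATTERNS = {
    "H": r"(?P<H>\d{1,2})",
    "M": r"(?P<M>\d{2})",
    "S": r"(?P<S>\d{2})",
}


def template_to_regex(components):
    """Translate a template like "$H:$M" into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(components):
        code = components[i + 1:i + 2]
        if components[i] == "$" and code in FIELD_PATTERNS:
            parts.append(FIELD_PATTERNS[code])
            i += 2
        else:
            # Any other character must match literally.
            parts.append(re.escape(components[i]))
            i += 1
    return re.compile("".join(parts))


def parse_decomposed_time(value, components):
    """Return a dict of integer fields, or None if the value does not match."""
    match = template_to_regex(components).fullmatch(value)
    if match is None:
        return None
    return {name: int(text) for name, text in match.groupdict().items()}


print(parse_decomposed_time("18:41", "$H:$M"))
```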
Strings can hold a wide variety of things, so if you want something in a string to instead be numeric data, the current answer from the HAPI spec is that the server should do that conversion and present the numbers that clients are supposed to use. Servers should present data in one of HAPI's very simple types (integer, double, time, string). If we implement a generic string interpretation mechanism, that could reduce the pressure to keep data delivery simple.
But ... if we want to go forward with this, here are some thoughts: A string interpretation mechanism would allow servers more flexibility to serve potentially messy data that the server can't really change. The HAPI metadata would then offer clients instructions on how to interpret strings. So this keeps servers simple, while adding complexity to the metadata (ok, that's still a server thing) and the clients.
So what about having stringType options that cover the other numeric types, and the stringType object provides a parsing string to be able to extract the parts you want. This gets tricky, since you have to have a whole parsing framework, so we should think about the goals for that, and adopt one that's already there. Probably just use regular expressions.
Data:
time, status
2023-03-03T10:25:19.0Z, complete: 92%
2023-03-03T10:25:20.0Z, complete: 93%
2023-03-03T10:25:21.0Z, complete: 94%
2023-03-03T10:25:22.0Z, complete: 95%
2023-03-03T10:25:23.0Z, complete: 96%
"stringType": { "integer": { "parser":"[a-z]+: (\d+)", "value": "$1"}}
The parser is a regular expression (where \d represents [0-9]), with Group 1 being the integer value to be extracted.
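As a rough illustration of what a client would do with such a parser, here is a minimal Python sketch applied to the sample rows above. The helper name `extract_integer` is hypothetical. Note also that inside an actual JSON document the backslash in the pattern would have to be doubled ("\\d").

```python
import re

# The "parser" from the proposed stringType; group 1 holds the integer.
PARSER = re.compile(r"[a-z]+: (\d+)")


def extract_integer(status, fill=None):
    """Return the integer captured by group 1, or `fill` if nothing matches."""
    match = PARSER.search(status)
    return int(match.group(1)) if match else fill


rows = [
    "2023-03-03T10:25:19.0Z, complete: 92%",
    "2023-03-03T10:25:20.0Z, complete: 93%",
    "2023-03-03T10:25:21.0Z, complete: 94%",
]
for row in rows:
    time, status = row.split(", ", 1)
    print(time, extract_integer(status))
```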
For the HH:MM:SS case, this is really just an angle, so you want to parse it to be a longitude so you can use it as a vector component. This would involve computations on the extracted elements, so that gets tricky if you wanted to do it generically. But the above mechanism would work for this too:
time, MLT
2023-03-03T10:25:19.0Z, 18:41:22
2023-03-03T10:25:19.0Z, 18:41:23
2023-03-03T10:25:19.0Z, 18:41:24
"stringType": { "double": { "parser":"(\d+):(\d+):(\d+)", "value": "$1+$2/60+$3/3600"}}
Then you could really use this parameter as the longitude that it represents.
"parameters": [
  { "name": "Time", "type": "isoTime", "length": 26, "fill": null},
  { "name": "position", "description": "radial position",
    "coordinateSystemName": "GSE",
    "vectorComponents": "r",
    "type": "double",
    "fill": "NaN"
  },
  { "name": "magnetic_local_time", "description": "MLT in HH:MM:SS",
    "coordinateSystemName": "GSE",
    "vectorComponents": "longitude",
    "type": "string",
    "fill": "--:--:--",
    "stringType": { "double": { "parser":"(\d+):(\d+):(\d+)", "value": "$1+$2/60+$3/3600"}}
  }
]
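To show how much machinery the "value" expression implies on the client side, here is a minimal sketch that substitutes the captured groups and evaluates the arithmetic with a small, restricted evaluator (only numbers and + - * / are accepted). The function name `apply_string_type` and the choice to return NaN for non-matching strings are assumptions made for illustration. For HH:MM:SS the result is in decimal hours.

```python
import ast
import math
import operator
import re

# Only plain arithmetic is allowed in a "value" expression in this sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed expression made of numbers and + - * / only."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported element in value expression")


def apply_string_type(text, parser, value):
    """Parse `text` with regex `parser`, then compute `value` ($1, $2, ... are groups)."""
    match = re.fullmatch(parser, text)
    if match is None:
        return math.nan  # e.g. the fill string "--:--:--"
    # Convert each group to a float literal first, so "08" does not become
    # an invalid Python integer literal with a leading zero.
    expression = re.sub(
        r"\$(\d+)",
        lambda ref: repr(float(match.group(int(ref.group(1))))),
        value,
    )
    return _evaluate(ast.parse(expression, mode="eval"))


mlt_hours = apply_string_type("18:41:22", r"(\d+):(\d+):(\d+)", "$1+$2/60+$3/3600")
print(mlt_hours)
```

Even this small sketch needs an expression evaluator, which illustrates the complexity that a generic mechanism would push onto every client.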
For times, we would need to define our own mechanism, so that seems complicated.
Another thing that would be interesting for URIs would be to indicate that they follow our URI Template scheme.
This is a lot of complexity though, and probably belongs in the semantic layer above HAPI that keeps coming up.
We need to move these other options to a new ticket.
Pull request https://github.com/hapi-server/data-specification/pull/166 created for adding stringType for URI support. Please review!
If I say that my type is:
"stringType": { "x_double": { "parser":"(\d+):(\d+):(\d+)", "value": "$1+$2/60+$3/3600"} }
is that the same as "stringType": null to clients not supporting x_double? (That would be all but one client, presumably.) I would really like for people to use x_ extensions, so that we don't need to agree on these things, and useful patterns will emerge.
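One possible client behavior, sketched below, is to fall back to plain-string handling whenever the stringType name is not recognized. This is an assumption about how a client could behave, not something the spec states; the set `KNOWN_STRING_TYPES` and the function name are hypothetical.

```python
# String types this hypothetical client knows how to interpret.
KNOWN_STRING_TYPES = {"uri"}


def effective_string_type(parameter, known=KNOWN_STRING_TYPES):
    """Return the stringType name if the client understands it, else None.

    None means "treat the values as ordinary strings", which is the same
    outcome as a missing or null stringType.
    """
    string_type = parameter.get("stringType")
    if isinstance(string_type, str):
        name = string_type
    elif isinstance(string_type, dict):
        name = next(iter(string_type), None)
    else:
        name = None
    return name if name in known else None


extension = {
    "type": "string",
    "stringType": {
        "x_double": {"parser": r"(\d+):(\d+):(\d+)", "value": "$1+$2/60+$3/3600"}
    },
}
print(effective_string_type(extension))          # unknown extension, plain string
print(effective_string_type({"stringType": "uri"}))
```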
What's the use of having the ISO standard if we allow servers to supply date/times in non-ISO formats? I also don't really see a use case for these string parsers for what is essentially numerical data, and hope they can be discouraged. I would prefer a local time to always be presented as a floating point number, in units of hours, or as an angle in units of radians or degrees from local midnight. That makes it the most straightforward for use in various kinds of plots, for example also when binning as a function of local time. Then if the client software wants to be able to present local time in a nicer format to users, like HH:MM, it is not so difficult to do that conversion in the client as a last step, after all the manipulations, such as binning, have taken place. In principle, the client has easy access to OS locale information and user preferences for such conversions, while the server does not.
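For what it's worth, the client-side conversion mentioned here is indeed short. A minimal sketch, assuming local time arrives as decimal hours and that rounding to the nearest minute is acceptable (the function name is hypothetical):

```python
def hours_to_hhmm(hours):
    """Format decimal hours as HH:MM, rounding to the nearest minute."""
    # Wrap around at 24 h so that e.g. 23.999 h displays as 00:00.
    total_minutes = int(round(hours * 60)) % (24 * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


print(hours_to_hhmm(18.5))
```

The numeric value stays available for binning and plotting, and the formatted string is produced only for display.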
I stood up an example of what this feature might look like, see:
https://cottagesystems.com/whapi/hapi/info?id=creek_pics
Note that I took the liberty of adding "base" to the "x_stringType", after arguing a week or two ago that these should be full URIs. I think it will happen often that there's a "base" and it makes the stream more readable.
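A minimal sketch of how a client might combine such a "base" with the values in the stream, using Python's standard `urllib.parse.urljoin`. The base URL and file names below are made up for illustration, and the exact placement of "base" inside "x_stringType" is an assumption; check the example server's info response for the real layout.

```python
from urllib.parse import urljoin


def resolve_uri(value, base=None):
    """Resolve a URI from the data stream against an optional base.

    Absolute URIs are returned unchanged; relative ones are joined to the base.
    """
    if base is None:
        return value
    return urljoin(base, value)


# Hypothetical base and values, for illustration only.
base = "https://example.org/pics/"
print(resolve_uri("2023/03/creek_0001.jpg", base))
print(resolve_uri("https://other.example/x.fits", base))
```

Note that `urljoin` replaces the last path segment of the base when the base does not end in "/", so a base would normally be given with a trailing slash.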
@eelcodoornbos - I agree that this should be discouraged. The use case is when the data provider does not want to change their parameters to floats, etc. We don't want to prevent them from serving their data through HAPI because they don't want to change.
We will want to consider adding suggestions for how this is likely going to be used.
For example, consider a dataset with numeric content; let's give it a dataset id of myData. And then a server also has a file listing capability for this dataset with an id of myData-listing. Do we want to make recommendations on how to link these?
There is issue #78 that has thoughts on how to relate datasets, so we will need to implement something like that soon.
In the context of what we are doing here, our UTC type is a string that has a certain type.
Also, timeStampLocation is something set at the dataset level, and it applies to the time column only. But it is something that could apply to other columns. In this case, it is additional stringInfo that only applies to a certain string type.
I don't know if this will change anything, but we should discuss it.
Bob, I don't follow what you are referring to. My assumption is that any field can be of type isotime (UTC).
Instead of having the types UTC, string, integer, double, we could have had only string, integer and double.
Then we would indicate that certain strings are to be interpreted as UTC in a similar way that we are proposing that certain strings are to be interpreted as URIs. So for consistency, we should have types of
URI, UTC, string, integer, double
or types of
string, integer, double
and indicate, if appropriate, when a string parameter is a UTC or URI using something like stringType or stringInfo, etc.
I doubt it will be worth getting exact consistency, but it is worth discussing.
I was having that same thought, but I think isotimes are so central to what we do that they deserve a special type. However if we introduce a new type for URI, I can think of a number of different types we might want to introduce (time ranges, nominal data, etc), and I don't think we want to go down that route. Having stringType just allows us to describe something without requiring that the description be understood by the client.
Time is so fundamental, it's OK to have it be its own one of the 4 fundamental types:
isoTime, integer, double, string
If HAPI expands to other usages (like with OPeNDAP), we might need other types, like short int, long int, float, unsigned long, etc.
These type-related issues belong on a different ticket.
Currently it is potentially problematic that the timeStampLocation is a once-per-dataset setting, and since other time columns are allowed, you can't apply a different timeStampLocation to any non-primary time column. But that could be dealt with in a different issue.
closed by PR #166
This came up in the ISWAT meeting 2021-03-17 so I'll add it here to trigger a discussion at a future meeting. People were asking if HAPI could serve a series of image data (i.e., FITS images of the sun) in a similar way as is done for other multi-dimensional time series data.
The VSO already supports image search and retrieval, and HAPI would just be the retrieval part. The image data is indexed by many other things besides time, so the concept of a dataset would need to be clear about which images are included.
Seems like you would need a separate interface to find the time ranges of the images that meet your spatial or other search criteria. We would also need to preserve all the FITS keywords.
Related to preserving FITS keywords, there was also a question about how to have HAPI still transmit ISTP metadata, much of which is now lost when going through HAPI. The model-to-data-comparison folks tend to use that ISTP content. So I'll create a separate ticket to talk about preserving a dataset's richer metadata.