hapi-server / data-specification

HAPI Data Access Specification
https://hapi-server.org
can HAPI also serve a time series of FITS images or other remote sensing data? #116

Closed jvandegriff closed 1 year ago

jvandegriff commented 3 years ago

This came up in the ISWAT meeting on 2021-03-17, so I'll add it here to trigger a discussion at a future meeting. People were asking if HAPI could serve a series of image data (i.e., FITS images of the Sun) in a similar way as is done for other multi-dimensional time series data.

The VSO already supports image search and retrieval, and HAPI would just be the retrieval part. The image data is indexed by many other things besides time, so the concept of a dataset would need to be clear about which images are included.

Seems like you would need a separate interface to find the time ranges of the images that meet your spatial or other search criteria. We would also need to preserve all the FITS keywords.

Related to preserving FITS keywords, there was also a question about how to have HAPI still transmit ISTP metadata, much of which is now lost when going through HAPI. The model-to-data-comparison folks tend to use that ISTP content. So I'll create a separate ticket to talk about preserving a dataset's richer metadata.

rweigel commented 1 year ago

Yes. See above.

jvandegriff commented 1 year ago

What to use to label the extra string metadata?

Candidates are:

stringInfo
stringType
stringConstraints
stringAdvice
stringUsageHints
stringProperties
stringUsage : { "uri": { "mediaType": "image/fits", "scheme": "https"} }
stringSubType
externalType
stringSchema

We don't want to consider enums just yet (it's a different set of concerns):

"stringType": "enum"
"stringType" : { "enum": { "values": ["low", "med", "high"] } }

(emphasizing: not using this)

Not specifying any stringType means there's no special interpretation of the string values. For now, the mediaType and scheme are optional. Also, mediaType is singular (not a list), so only one media type for a given URI parameter.

"stringType": "uri"
"stringType": { "uri": { "mediaType": "image/fits", "scheme": "https"} }

The type in this case is URI, and the properties of that type are the media type and scheme. For now, we only have one such type, namely the URI type.

We still need to emphasize that simply listing files does not fulfill the data-intensive intent of HAPI.
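As a rough illustration of the two forms shown above (the bare string "uri" and the object with optional mediaType and scheme), a client might read the metadata along these lines. This is only a sketch: the function name `uri_properties` and the example parameter `image_url` are made up for illustration and are not part of the HAPI specification.

```python
import json

def uri_properties(parameter):
    """Return (is_uri, media_type, scheme) for a HAPI-style parameter dict.

    Hypothetical helper: handles both "stringType": "uri" and
    "stringType": {"uri": {...}}; mediaType and scheme are optional.
    """
    string_type = parameter.get("stringType")
    if string_type == "uri":
        return True, None, None
    if isinstance(string_type, dict) and "uri" in string_type:
        props = string_type["uri"] or {}
        return True, props.get("mediaType"), props.get("scheme")
    # No stringType (or an unrecognized one): no special interpretation.
    return False, None, None

# Illustrative parameter definition; the name "image_url" is an assumption.
param = json.loads("""
{ "name": "image_url", "type": "string", "length": 64, "units": null,
  "stringType": { "uri": { "mediaType": "image/fits", "scheme": "https" } } }
""")
print(uri_properties(param))
```

A parameter with no stringType falls through to the last branch and is treated as an ordinary string.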

jvandegriff commented 1 year ago

I would like to say this about string parameters that are of "stringType": "uri":

The units for a string parameter that is a URI should be null. The units value here should not be used to try and describe the contents behind the URIs. URI content is too variable to be uniformly handled by this simple units indicator.
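If this rule were adopted, a metadata checker could flag violations with something like the following sketch. The function name `uri_units_warnings` and the warning text are hypothetical; nothing here comes from an existing HAPI verifier.

```python
def uri_units_warnings(parameter):
    """Return warnings for a URI string parameter whose units are not null.

    Hypothetical check: a JSON null units value arrives in Python as None.
    """
    string_type = parameter.get("stringType")
    is_uri = string_type == "uri" or (
        isinstance(string_type, dict) and "uri" in string_type
    )
    if is_uri and parameter.get("units") is not None:
        return [f"{parameter.get('name')}: units should be null for a URI parameter"]
    return []

ok = {"name": "img", "type": "string", "stringType": "uri", "units": None}
bad = {"name": "img", "type": "string", "stringType": "uri", "units": "counts"}
print(uri_units_warnings(ok))
print(uri_units_warnings(bad))
```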

Do people agree?

rweigel commented 1 year ago

Agreed.

It is time for discussions with the SunPy group, the people at the ISWAT meeting who caused this thread, and the NGDC folks.

One other thought. Some datasets have a parameter that is a local time (e.g., HH:MM). Or a parameter could be a non-ISO date in a commonly used date format (e.g., DD/MM/YY). Would a string type of nonISODateTime with constraints of DD/MM/YY be appropriate? I've also seen tables where the position is reported as 40° 31' 21" or (more commonly, for right ascension) HH:MM:SS.

jbfaden commented 1 year ago

We also had that local time parameter in the SSCWeb HAPI Server. We should put in a "x_" type description to play with this. Actually that's interesting. How would one do this? "x_stringType": { "type":"decomposedTime", "components":"$H:$M" } ?

jvandegriff commented 1 year ago

Strings can hold a wide variety of things, so if you want something in a string to instead be numeric data, the current answer from the HAPI spec is that the server should do that conversion and present the numbers that clients are supposed to use. Servers should present data in one of HAPI's very simple types (integer, double, time, string). If we implement a generic string interpretation mechanism, that could reduce the pressure to keep data delivery simple.

But ... if we want to go forward with this, here are some thoughts: A string interpretation mechanism would allow servers more flexibility to serve the potentially messy data that the server can't really change. And then the HAPI metadata offers clients instructions on how to interpret strings. So this makes servers simple, while adding complexity to the metadata (ok, that's still a server thing) and the clients.

So what about having stringType options that cover the other numeric types, where the stringType object provides a parsing string to extract the parts you want? This gets tricky, since you need a whole parsing framework, so we should think about the goals for that and adopt one that already exists. Probably just use regular expressions.

Data:

time, status
2023-03-03T10:25:19.0Z, complete: 92%
2023-03-03T10:25:20.0Z, complete: 93%
2023-03-03T10:25:21.0Z, complete: 94%
2023-03-03T10:25:22.0Z, complete: 95%
2023-03-03T10:25:23.0Z, complete: 96%

"stringType": { "integer": { "parser":"[a-z]+: (\d+)", "value": "$1"}}

The parser is a regular expression (where \d represents [0-9]), with Group 1 being the integer value to be extracted.
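To make the idea concrete, here is a minimal sketch of how a client could apply that parser to the status strings above. It assumes the "value": "$1" template simply means "take capture group 1"; the function name `extract_integer` is made up. Note that if the parser were embedded in real JSON, the backslash would need to be doubled ("\\d").

```python
import re

# The proposed parser from the stringType example, as a Python raw string.
PARSER = r"[a-z]+: (\d+)"

def extract_integer(status):
    """Apply the parser and return group 1 as an int, or None if no match."""
    match = re.search(PARSER, status)
    return int(match.group(1)) if match else None

rows = ["complete: 92%", "complete: 93%", "complete: 94%"]
print([extract_integer(r) for r in rows])  # [92, 93, 94]
```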

For the HH:MM:SS case, this is really just an angle, so you want to parse it to be a longitude so you can use it as a vector component. This would involve computations on the extracted elements, so that gets tricky if you wanted to do it generically. But the above mechanism would work for this too:

time, MLT
2023-03-03T10:25:19.0Z, 18:41:22
2023-03-03T10:25:19.0Z, 18:41:23
2023-03-03T10:25:19.0Z, 18:41:24

"stringType": { "double": { "parser":"(\d+):(\d+):(\d+)", "value": "$1+$2/60+$3/3600"}}

Then you could use this parameter really as the longitude that it represents.

"parameters": [
   { "name": "Time", "type": "isoTime", "length": 26, "fill": null},
   { "name": "position", "description": "radial position",
      "coordinateSystemName": "GSE",
      "vectorComponents": "r",
      "type": "double",
      "fill": "NaN"
   },
   { "name": "magnetic_local_time", "description": "MLT in HH:MM:SS",
      "coordinateSystemName": "GSE",
      "vectorComponents": "longitude",
      "type": "string",
      "fill": "--:--:--",
      "stringType": { "double": { "parser":"(\d+):(\d+):(\d+)", "value": "$1+$2/60+$3/3600"}}
   }
]
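A client-side reading of the double-valued parser in the parameter definition above might look like the sketch below. It hard-codes the arithmetic of the "$1+$2/60+$3/3600" template rather than evaluating the template generically (which is the tricky part noted earlier), and the function name `mlt_hours` is an assumption.

```python
import math
import re

PARSER = r"(\d+):(\d+):(\d+)"   # same pattern as in the proposed stringType
FILL = "--:--:--"               # fill value from the parameter definition

def mlt_hours(text):
    """Convert an HH:MM:SS string to decimal hours; NaN for fill or no match."""
    if text == FILL:
        return math.nan
    match = re.fullmatch(PARSER, text)
    if match is None:
        return math.nan
    hours, minutes, seconds = (int(group) for group in match.groups())
    # Hard-coded equivalent of the template "$1+$2/60+$3/3600".
    return hours + minutes / 60 + seconds / 3600

for value in ["18:41:22", "18:41:23", FILL]:
    print(value, mlt_hours(value))
```

The result is in hours. To use it as a longitude angle in degrees, a client would still multiply by 15 (24 hours corresponds to 360 degrees), which the template as written does not do.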

For times, we would need to define our own mechanism, so that seems complicated.

Another thing that would be interesting for URIs would be to indicate that they follow our URI Template scheme.

This is a lot of complexity though, and probably belongs in the semantic layer above HAPI that keeps coming up.

jvandegriff commented 1 year ago

We need to move these other options to a new ticket.

Pull request https://github.com/hapi-server/data-specification/pull/166 created for adding stringType for URI support. Please review!

jbfaden commented 1 year ago

If I say that my type is:

 "stringType": { "x_double": { "parser":"(\d+):(\d+):(\d+)", "value": "$1+$2/60+$3/3600"} } 

is that the same as "stringType": null to clients not supporting x_double? (That would be all but one client, presumably.) I would really like for people to use x_ extensions, so that we don't need to agree on these things, and useful patterns will emerge.
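One possible client behavior, sketched here purely as an illustration of the question (the thread does not settle it), is to treat any stringType key the client does not recognize as if no stringType were given. The names `KNOWN_STRING_TYPES` and `effective_string_type` are hypothetical.

```python
# String types this hypothetical client understands.
KNOWN_STRING_TYPES = {"uri"}

def effective_string_type(parameter):
    """Return (type_name, properties), or (None, {}) for unknown or missing types."""
    string_type = parameter.get("stringType")
    if isinstance(string_type, str):
        name, props = string_type, {}
    elif isinstance(string_type, dict) and len(string_type) == 1:
        name, props = next(iter(string_type.items()))
    else:
        return None, {}
    if name not in KNOWN_STRING_TYPES:
        # Unrecognized (for example an x_ extension): fall back to plain string.
        return None, {}
    return name, props or {}

extension = {"type": "string",
             "stringType": {"x_double": {"parser": r"(\d+):(\d+):(\d+)",
                                         "value": "$1+$2/60+$3/3600"}}}
print(effective_string_type(extension))  # (None, {})
```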

eelcodoornbos commented 1 year ago

What's the use of having the ISO standard if we allow servers to supply date/times in non-ISO formats? I also don't really see a use case for these string parsers for what is essentially numerical data, and hope they can be discouraged.

I would prefer a local time to always be presented as a floating point number, in units of hours, or as an angle in units of radians or degrees from local midnight. That makes it the most straightforward to use in various kinds of plots, for example when binning as a function of local time. Then if the client software wants to present local time in a nicer format to users, like HH:MM, it is not so difficult to do that conversion in the client as a last step, after all the manipulations, such as binning, have taken place. In principle, the client has easy access to OS locale information and user preferences for such conversions, while the server does not.
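As a small sketch of that last step, a client holding local time as decimal hours could format it for display like this. The function name `format_local_time` is made up, and rounding to the nearest minute is an arbitrary choice.

```python
def format_local_time(hours):
    """Format decimal hours (0 <= hours < 24) as HH:MM, rounded to the minute."""
    # Wrap around so that values rounding up to 24:00 display as 00:00.
    total_minutes = round(hours * 60) % (24 * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"

print(format_local_time(18.5))   # 18:30
print(format_local_time(0.25))   # 00:15
```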

jbfaden commented 1 year ago

I stood up an example of what this feature might look like, see:

https://cottagesystems.com/whapi/hapi/info?id=creek_pics

Note that I took the liberty of adding "base" to the "x_stringType", after arguing a week or two ago that these should be full URIs. I think it will happen often that there's a "base" and it makes the stream more readable.
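For illustration only, here is how a client might combine such a "base" with the values in the stream. The metadata layout, the example.org URL, and the function name `resolve_uri` are all assumptions, not taken from the server linked above, and plain concatenation is just one possible rule (RFC 3986 style resolution would be another).

```python
def resolve_uri(value, string_type):
    """Prefix a stream value with the optional "base" from a uri string type."""
    props = string_type.get("uri") or {}
    # Assumed rule: simple string concatenation of base and value.
    return props.get("base", "") + value

# Hypothetical metadata; the real server's layout may differ.
x_string_type = {"uri": {"mediaType": "image/jpeg",
                         "base": "https://example.org/pics/"}}
print(resolve_uri("2023/03/img001.jpg", x_string_type))
```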

rweigel commented 1 year ago

@eelcodoornbos - I agree that this should be discouraged. The use case is when the data provider does not want to change their parameters to floats, etc. We don't want to prevent them from serving their data through HAPI because they don't want to change.

jvandegriff commented 1 year ago

We will want to consider adding suggestions for how this is likely going to be used.

For example, consider a dataset with numeric content, let's give it a dataset id of myData. And then a server also has a file listing capability for this dataset with an id of myData-listing. Do we want to make recommendations on how to link these?

There is issue #78 that has thoughts on how to relate datasets, so we will need to implement something like that soon.

rweigel commented 1 year ago

In the context of what we are doing here, our UTC type is a string that has a certain type.

Also, timeStampLocation is something set at the data set level, and it applies to the time column only. But it is something that could apply to other columns. In this case, it is additional stringInfo that only applies to a certain string type.

I don't know if this will change anything, but we should discuss it.

jbfaden commented 1 year ago

Bob, I don't follow what you are referring to. My assumption is that any field can be of type isotime (UTC).

rweigel commented 1 year ago

Instead of having the types UTC, string, integer, double, we could have had only string, integer and double.

Then we would indicate that certain strings are to be interpreted as UTC in a similar way that we are proposing that certain strings are to be interpreted as URIs. So for consistency, we should have types of

URI, UTC, string, integer, double

or types of

string, integer, double

and indicate, if appropriate, when a string parameter is a UTC or URI using something like stringType or stringInfo, etc.

I doubt it will be worth getting exact consistency, but it is worth discussing.

jbfaden commented 1 year ago

I was having that same thought, but I think isotimes are so central to what we do that they deserve a special type. However if we introduce a new type for URI, I can think of a number of different types we might want to introduce (time ranges, nominal data, etc), and I don't think we want to go down that route. Having stringType just allows us to describe something without requiring that the description be understood by the client.

jvandegriff commented 1 year ago

Time is so fundamental that it's OK to have it be its own one of the 4 fundamental types: isoTime, integer, double, string

If HAPI expands to other usages (like with OPeNDAP), we might need other types, like short int, long int, float, unsigned long, etc.

These type-related issues belong on a different ticket.

Currently it is potentially problematic that the timeStampLocation is a once-per-dataset setting, and since other time columns are allowed, you can't apply a different timeStampLocation to any non-primary time column. But that could be dealt with in a different issue.

jvandegriff commented 1 year ago

closed by PR #166