Open jbfaden opened 6 years ago
My current "data inventory" information is simply in the info response like this example
"x_spdf_inventory": { "intervals": [ { "begin": "1996-03-20T00:00:06Z", "end": "1996-03-21T21:06:30Z" }, { "begin": "1996-03-22T00:00:09Z", "end": "1996-03-27T21:08:49Z" },...
But I could re-implement it as your
"x_availability":"https://cdaweb.gsfc.nasa.gov/dataviews/sp_phys/datasets/po_h0_hyd/inventory?format=hapi"
proposal. There are of course a lot of caveats that go with the information I'm returning.
I guess my link to the whole info response got lost above. Here's another attempt https://www.dropbox.com/s/zf806oufxx6wqs5/po_h0_hyd_hapi_info.json?dl=0
On the 2018-06-18 telecon, we decided to try this approach: add an optional capability called 'availability' using the capabilities endpoint.
Just referencing the bare endpoint should result in a list of dataset IDs for which availability info can be obtained. These IDs should match the IDs in the catalog endpoint.
If you give an ID to the availability endpoint, it will return a list of time ranges in the following format:
column 1: start time
column 2: stop time
column 3: 0 for no data in this interval, 1 for data in this interval
Columns beyond these are optional and can contain other user-specific data, such as the fraction of the interval that is filled with data, or a label for the time interval. It was decided not to attempt to regularize any of this, since trying to specify event-list type info really opens up a can of worms, and there are already standards for that.
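As a rough illustration of how a client might read the three-column format described above, here is a minimal Python sketch. It assumes the columns are comma-separated and the times are ISO 8601 with a trailing "Z"; neither was fixed on the telecon. The sample rows and the names parse_time and parse_availability are made up for the example.

```python
import csv
import io
from datetime import datetime


def parse_time(text):
    # Assumption: times are ISO 8601 strings ending in "Z" (UTC).
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def parse_availability(text):
    """Parse rows of start,stop,flag plus any optional extra columns."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        rows.append({
            "start": parse_time(fields[0]),
            "stop": parse_time(fields[1]),
            "available": fields[2].strip() == "1",
            # Columns beyond the third are not regularized, so keep them raw.
            "extra": [f.strip() for f in fields[3:]],
        })
    return rows


# Hypothetical response body; the fourth column on the last row is invented.
sample = (
    "1998-01-01T00:00:00Z,2008-10-25T23:59:59Z,1\n"
    "2008-10-26T00:00:00Z,2008-12-31T23:59:59Z,0\n"
    "2009-01-01T00:00:00Z,2018-03-28T23:59:59Z,1,0.97\n"
)
rows = parse_availability(sample)
for row in rows:
    print(row["start"].date(), row["stop"].date(), row["available"], row["extra"])
```

Because the extra columns are kept as raw strings, a client that does not understand them can ignore them.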
The availability info format is going to be kept very simple, and HAPI-centric, and will not be made available as other event list formats. Converters to those formats would be simple, and could be included in clients.
I would prefer a JSON response like this
$ curl -s "http://localhost:8084/WS/hapi/availability?id=AC_H3_MFI" | python -mjson.tool
{
    "HAPI": "2.1",
    "availability": [
        {
            "available": 1,
            "startDate": "1998-01-01T00:00:00Z",
            "stopDate": "2008-10-25T23:59:59Z"
        },
        {
            "available": 1,
            "startDate": "2009-01-01T00:00:00Z",
            "stopDate": "2018-03-28T23:59:59Z"
        }
    ],
    "creationDate": "2018-06-20T11:28:20.839Z",
    "status": {
        "code": 1200,
        "message": "OK"
    }
}
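For comparison, a client could consume a JSON response of this shape with only the standard library. The sketch below is not a proposal for the spec. It embeds the response shown above as a string instead of contacting a server, and the function name available_intervals is hypothetical.

```python
import json

# The JSON response from the curl example above, embedded for illustration.
response_text = """
{
  "HAPI": "2.1",
  "availability": [
    {"available": 1, "startDate": "1998-01-01T00:00:00Z", "stopDate": "2008-10-25T23:59:59Z"},
    {"available": 1, "startDate": "2009-01-01T00:00:00Z", "stopDate": "2018-03-28T23:59:59Z"}
  ],
  "creationDate": "2018-06-20T11:28:20.839Z",
  "status": {"code": 1200, "message": "OK"}
}
"""


def available_intervals(text):
    """Return (startDate, stopDate) pairs for entries flagged as available."""
    body = json.loads(text)
    if body["status"]["code"] != 1200:
        raise ValueError("unexpected status: %r" % (body["status"],))
    return [
        (entry["startDate"], entry["stopDate"])
        for entry in body["availability"]
        if entry["available"] == 1
    ]


intervals = available_intervals(response_text)
for start, stop in intervals:
    print(start, stop)
```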
One issue to consider with the "0/1" to indicate availability is redundancy. We could communicate everything about availability with only two columns:
/hapi/availability -> list of data set IDs with availability information (same schema as /catalog; endpoint optional)
The minimal requirement for the endpoint would be to indicate intervals where a user can expect to get at least one data record if they used the listed start/stops in a data request.
/hapi/availability?id=ABC
start1,stop1
start2,stop2
start3,stop3
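To make the minimal requirement concrete, a client could use a two-column list like this to decide whether a data request is likely to return any records. The Python sketch below assumes comma-separated ISO 8601 times and treats the intervals as half-open. The sample intervals and the names parse_gtis and overlaps_any are invented for illustration.

```python
from datetime import datetime


def parse_time(text):
    # Assumption: times are ISO 8601 strings ending in "Z" (UTC).
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def parse_gtis(text):
    """Parse lines of 'start,stop' into a list of (start, stop) datetimes."""
    gtis = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        start, stop = line.split(",")
        gtis.append((parse_time(start), parse_time(stop)))
    return gtis


def overlaps_any(gtis, req_start, req_stop):
    """True if the half-open request [req_start, req_stop) overlaps any interval."""
    return any(start < req_stop and req_start < stop for start, stop in gtis)


# Hypothetical availability listing for some dataset.
sample = """
2001-01-01T00:00:00Z,2001-01-05T00:00:00Z
2001-01-08T00:00:00Z,2001-01-20T00:00:00Z
"""
gtis = parse_gtis(sample)

inside = overlaps_any(gtis, parse_time("2001-01-04T00:00:00Z"), parse_time("2001-01-06T00:00:00Z"))
in_gap = overlaps_any(gtis, parse_time("2001-01-05T12:00:00Z"), parse_time("2001-01-07T00:00:00Z"))
print(inside, in_gap)
```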
A potential problem with allowing additional columns that are not regularized is how we would communicate what the columns mean. We would need something like
/hapi/availability/info
which is not ideal.
I think I favor a simpler approach here like Bob is suggesting. Just a list of intervals where data is present, so-called Good Time Intervals (GTIs).
If we thought it was useful, we could support an optional third column indicating the fraction of the interval that is filled, but there are lots of potential ways to calculate this, so we would have to be specific about the definition.
I think this sort of service is needed. However, I'd (very strongly) prefer that it was a different API because:
So I propose that we develop a new API standard named FLAPI (File Listing API). It would use parts of the HAPI metadata specification and some of the existing HAPI software could be used.
See also #116
[adding email discussion, since it's relevant.]
Hi Bob and Jon,
Could you have a look at
https://cottagesystems.com/server/esac/hapi/info?id=C4_CP_CIS-CODIF_HS_O1_PEF/availability
and let me know if you find this data scheme agreeable? This is what I was thinking we should use for availability. Autoplot would detect that it starts with two isotimes and display the data as an events bar. (I feel fairly strongly that this should be the scheme. While it's tempting to do something like time and then length for the second column, having two times makes it very clear what the intended use is.) Autoplot doesn't do anything special with this yet, but I plan on adding it in. (See the image below, which is just the number of records vs start time.)
Jeremy
[Bob's response.] We should probably discuss this on the telecon; this is going to take some thought to get right. I think we should come up with a schema for this as you suggest, so that people use the same parameter names and syntax for the id.
You may want to look at Bernie's CDAS rest server, which has /inventory and /orig_data endpoints.
There was a brief discussion about how availability might be done with HAPI servers. The gist of the conclusion is that the info would contain a reference to another dataset which describes the availability of the first data set. I'll make a proposal for how this would be done beyond that. The info request for "http://hapi-server.org/hapi/info?id=0B000800408DD710" might return:
"x_availability":"availability/0B000800408DD710"
which would be a dataset id on the same server which should have four columns: time,endTime,code,message
where code would be either "200" or "204". Note there is no requirement of endTime, other than it would be later than time. 200 indicates data will be found in this interval, and 204 (empty response) indicates no data will be found in the interval. When only 200 is returned, one may assume that the opposite intervals do not have data. When only 204 is returned, one may assume that the opposite intervals do have data. I don't believe clients would necessarily be bound to being overly precise. For example, if 90% of a day contains data, and 10% is a typical missing-data rate, then the entire day could be included in the present list.
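A minimal Python sketch of how a client might apply the 200/204 rule follows. It assumes every time uses the same fixed-width ISO 8601 form, so plain string comparison orders them correctly, and that the client knows the overall time range of the dataset. The sample rows and the names complement and data_intervals are hypothetical.

```python
def complement(intervals, range_start, range_stop):
    """Return the gaps left by the given intervals inside [range_start, range_stop]."""
    gaps = []
    cursor = range_start
    for start, stop in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, stop)
    if cursor < range_stop:
        gaps.append((cursor, range_stop))
    return gaps


def data_intervals(rows, range_start, range_stop):
    """Rows are (time, endTime, code, message); return intervals expected to hold data."""
    codes = {code for _, _, code, _ in rows}
    if codes == {"204"}:
        # Only empty intervals are listed, so data is assumed everywhere else.
        spans = [(start, stop) for start, stop, _, _ in rows]
        return complement(spans, range_start, range_stop)
    # Otherwise trust the intervals explicitly marked 200.
    return [(start, stop) for start, stop, code, _ in rows if code == "200"]


# Hypothetical availability dataset that lists only the gaps.
rows = [
    ("2001-01-10T00:00:00Z", "2001-01-12T00:00:00Z", "204", "no data"),
    ("2001-02-01T00:00:00Z", "2001-02-03T00:00:00Z", "204", "no data"),
]
present = data_intervals(rows, "2001-01-01T00:00:00Z", "2001-03-01T00:00:00Z")
for start, stop in present:
    print(start, stop)
```

When only 200 rows are present, the same complement function gives the intervals assumed to be empty.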
Note: existing clients are supported because this is just another dataset.
I would also suggest that availability could refer to another server, like so:
"x_availability":"http://hapi-server.org/availability/hapi?id=0B000800408DD710"
because this would allow a server to be indexed externally.
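A client would need to tell these two forms of "x_availability" apart. One possible way, sketched in Python, is to check for an http(s) scheme. The function name describe_availability_ref and the returned dictionary layout are hypothetical, and deriving the server base by cutting the info URL at "/info" is an assumption about how the client was given the URL.

```python
from urllib.parse import urlsplit


def describe_availability_ref(info_url, ref):
    """Classify an x_availability value as an external URL or a same-server dataset id."""
    if urlsplit(ref).scheme in ("http", "https"):
        return {"kind": "external", "url": ref}
    # Assumption: the server base is everything before "/info" in the info URL.
    server = info_url.split("/info", 1)[0]
    return {"kind": "dataset", "server": server, "id": ref}


info_url = "http://hapi-server.org/hapi/info?id=0B000800408DD710"

local_ref = describe_availability_ref(info_url, "availability/0B000800408DD710")
remote_ref = describe_availability_ref(
    info_url, "http://hapi-server.org/availability/hapi?id=0B000800408DD710"
)
print(local_ref)
print(remote_ref)
```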