Open jvandegriff opened 5 years ago
Related: #59
Separate project - move to wiki and consider as separate effort. Could use as separate "in-between" server to do translation to other resolution, etc.
Moving this out of Milestone 3.0, since this should also be an extension, to try it out and see if it could be captured in a generic way as something for all HAPI servers.
At the 2019-06-03 telecon, we talked about adding processing options in a generic way that clients could make use of. Here's a summary of that discussion.
Jeremy presented Das2 server options, where a dataset on a Das2 server can have processing flags. Each set of options is for an individual dataset. This practice grew out of the original use for Das2 servers, which was as a somewhat internal protocol between a client and server written by one developer, who understood what all the "secret" options were and could use them to optimize the data transfer for what the client needed. Jeremy advised against this kind of behind-the-scenes options proliferation.
However, some kind of way to allow configuration options for a dataset could be useful. There are at least two classes of processing options:
There are multiple benefits to supporting these kinds of options.
For the algorithms that are to be universal across all HAPI servers, they should be
The list of potential generic services is: binning, interpolation, spike removal. For binning, the simplest possible method would be: given a start time and a bin width, accumulate the data points in each bin, then divide each bin's sum by the number of points in that bin. There are options for how to handle an empty bin: skip it (don't include it in the output), use the fill value for that bin, or interpolate using one of a set of algorithms (might want to specify a maximum time width to allow interpolation, above which FILL is inserted instead).
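To make the binning description concrete, here is a minimal sketch in Python. Nothing in it is an agreed HAPI interface: the function name `bin_average`, the `empty` and `max_gap` parameters, the `FILL` constant, the use of plain numeric times (e.g. seconds) instead of ISO 8601 strings, and reporting each bin at its center time are all assumptions made for illustration. Linear interpolation between the nearest non-empty bins stands in for "one of a set of algorithms".

```python
import math

# Hypothetical fill value; a real server would use the fill declared for the parameter.
FILL = -1e31


def _interpolate_empty(i, centers, means, max_gap):
    """Linearly interpolate empty bin i from its nearest non-empty neighbours."""
    lo = next((j for j in range(i - 1, -1, -1) if means[j] is not None), None)
    hi = next((j for j in range(i + 1, len(means)) if means[j] is not None), None)
    if lo is None or hi is None:
        return FILL  # no neighbour on one side, nothing to interpolate between
    gap = centers[hi] - centers[lo]
    if max_gap is not None and gap > max_gap:
        return FILL  # gap is wider than the allowed maximum
    frac = (centers[i] - centers[lo]) / gap
    return means[lo] + frac * (means[hi] - means[lo])


def bin_average(times, values, start, width, nbins, empty="skip", max_gap=None):
    """Average values into nbins uniform bins of the given width, beginning at start.

    empty selects the empty-bin policy: "skip", "fill", or "interpolate".
    Returns a list of (bin_center_time, value) pairs.
    """
    if empty not in ("skip", "fill", "interpolate"):
        raise ValueError(f"unknown empty-bin policy: {empty!r}")

    sums = [0.0] * nbins
    counts = [0] * nbins
    for t, v in zip(times, values):
        i = math.floor((t - start) / width)
        if 0 <= i < nbins:  # ignore points outside the requested range
            sums[i] += v
            counts[i] += 1

    centers = [start + (i + 0.5) * width for i in range(nbins)]
    means = [sums[i] / counts[i] if counts[i] else None for i in range(nbins)]

    out = []
    for i in range(nbins):
        if means[i] is not None:
            out.append((centers[i], means[i]))
        elif empty == "fill":
            out.append((centers[i], FILL))
        elif empty == "interpolate":
            out.append((centers[i], _interpolate_empty(i, centers, means, max_gap)))
        # empty == "skip": the bin is simply left out of the output
    return out
```

For example, with `times = [1, 4, 12, 35]`, `values = [2.0, 4.0, 6.0, 10.0]`, `start=0`, `width=10`, `nbins=4`, the third bin is empty: `"skip"` returns three records, `"fill"` puts `FILL` at time 25, and `"interpolate"` puts 8.0 there unless `max_gap` is smaller than the 20-unit distance between the neighbouring bin centers.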
There are wording challenges here, since to some people interpolation does not necessarily mean overlaying data onto a regular grid. However, for our purposes, binning and interpolation both refer to a uniform grid, and re-sampling is used to indicate the capturing of points from one dataset at an arbitrary set of other time points that need not be uniformly spaced.
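As an illustration of re-sampling in the sense used here (evaluating a dataset at arbitrary, possibly non-uniform target times), a rough sketch follows. The telecon did not settle on a method; linear interpolation, the function name `resample_linear`, numeric times, and returning `None` for targets outside the data range are all assumptions for the example.

```python
from bisect import bisect_left


def resample_linear(times, values, targets):
    """Evaluate (times, values) at each target time by linear interpolation.

    times must be strictly increasing. Targets outside the data range give None,
    since this sketch does not extrapolate.
    """
    out = []
    for t in targets:
        if not times or t < times[0] or t > times[-1]:
            out.append(None)
            continue
        j = bisect_left(times, t)  # first index with times[j] >= t
        if times[j] == t:
            out.append(values[j])  # target coincides with a sample
            continue
        t0, t1 = times[j - 1], times[j]
        v0, v1 = values[j - 1], values[j]
        out.append(v0 + (t - t0) / (t1 - t0) * (v1 - v0))
    return out
```

The target times here need not lie on any grid, which is the difference from binning and interpolation as defined above: `resample_linear([0, 10, 20], [0.0, 5.0, 15.0], [2.5, 10, 17.5, 25])` evaluates the data at four unevenly spaced times, the last of which falls outside the data and comes back as `None`.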
The capabilities mechanism needs to be expanded so that server-wide and dataset-specific options can be described. Generic clients could then detect that options exist and present enough information for users to decide if and how to include any of them.
Instrument team tools usually include specialized flags that can be set when reading the data. Jeremy's "don't exclude instrumental spikes" flag is a great example: the default behavior should be to remove instrumental spikes / glitches, but specialist users may want to see the spikes to make sure the spike removal algorithm is working (and has not excluded any real data that happens to be jumpy!)
For HAPI, all datasets should be scientifically optimal given the default set of options. Options can introduce or relax restrictions, binning, interpolation, reductions, different calibrations, spike removal, etc. Including these options requires extra work by users, who have to figure out whether they want the data modified in the ways advertised by the special options.