simonsobs / sisock

Sisock ('saɪsɒk): streaming of Simons Obs. data over websockets for quicklook

Handling Differences between the sisock and so3g HKArchiveScanner APIs #45

Open BrianJKoopman opened 4 years ago

BrianJKoopman commented 4 years ago

While working on a patch to use the new so3g HKArchiveScanner to open data within sisock's g3-reader, I ran into a difference between the two APIs that is causing trouble in sisock.

With the way sisock's get_data and get_fields methods are designed, one needs to cache the result of get_fields in order to get the mapping of timelines to field data. This implies that the timeline_name cannot change between function calls. The HKArchiveScanner differs in that a list of the fields belonging to each timeline is also returned within the timeline dictionary returned by get_data. If get_data is called for a specific subset of fields, this almost guarantees that the set of timelines returned does not match the set returned by get_fields.

If we're going to drop the HKArchiveScanner into the sisock g3-reader data server, then it would make sense to return the data and timeline dictionaries as specified in the so3g version of the API. The proposal is to modify the sisock API to match, so that you don't always have to call get_fields (and cache its results) in order to get the mapping.

The only place this comes up (outside of potential user code) is in the grafana-http bridge, which currently caches the results of a get_fields call in order to parse later get_data calls. This is incompatible with the HKArchiveScanner. A check for the presence of the fields key within the timeline dictionary could be made before processing, to ensure backwards compatibility with the old API, but if we agree to make the changes here, then that compatibility path should eventually be phased out.
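A minimal sketch of what such a check might look like. The function name `field_to_timeline` and the layout of the cached get_fields result (a `timeline` entry per field) are assumptions made for illustration, not the actual grafana-http bridge code:

```python
def field_to_timeline(timelines, cached_field_info=None):
    """Map each field name to the name of the timeline that holds its times.

    `timelines` is the timeline dictionary returned by get_data.
    `cached_field_info` is a hypothetical cache of an earlier get_fields
    result, assumed here to look like {field_name: {"timeline": name}}.
    """
    mapping = {}
    for timeline_name, timeline in timelines.items():
        if "fields" in timeline:
            # New (so3g-style) layout: the timeline lists its own fields.
            for field_name in timeline["fields"]:
                mapping[field_name] = timeline_name
        elif cached_field_info is not None:
            # Old layout: fall back to the cached get_fields mapping.
            for field_name, info in cached_field_info.items():
                if info.get("timeline") == timeline_name:
                    mapping[field_name] = timeline_name
    return mapping
```

With this shape, the cached get_fields result is only consulted when a timeline carries no fields key, so the fallback branch can be deleted once every data server returns the new layout.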

To demonstrate, here is part of the example in so3g's getdata.py:

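    import sys

    from so3g.hk import HKArchiveScanner

    # Scan every file named on the command line, then build the archive.
    hkcs = HKArchiveScanner()
    for filename in sys.argv[1:]:
        hkcs.process_file(filename)
    cat = hkcs.finalize()
    # Get list of fields, timelines, spanning all times:
    fields, timelines = cat.get_fields()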

The timelines returned are given arbitrary group names:

    >>> timelines.keys()
    dict_keys(['group0', 'group1', 'group2', 'group3', 'group4', 'group5'])

Based on the fields list we can make a call to get_data:

    f, t = cat.get_data([field_name], short_match=True)

Since the requested field has only a single associated timeline, the result contains only one group, group0:

    >>> t.keys()
    dict_keys(['group0'])

If the field you request did not happen to belong to group0 in the initial get_fields call, then sisock will fail with a KeyError when trying to parse the results, because the cached mapping points at a group name that is not present in this get_data response.
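The failure can be reproduced with plain dictionaries. This is only an illustration: the field names `temp_a` and `temp_b`, the sample values, and both helper functions are made up, and the dictionaries stand in for real get_fields and get_data results.

```python
# Stand-in for the field-to-timeline mapping cached from get_fields().
cached_timeline_of = {"temp_a": "group0", "temp_b": "group3"}

# Stand-in for get_data(["temp_b"]): only one timeline comes back, and it
# is named group0 even though get_fields() placed temp_b in group3.
data = {"temp_b": [4.1, 4.2]}
timelines = {"group0": {"t": [0.0, 1.0], "fields": ["temp_b"]}}


def times_via_cache(field_name):
    """Old approach: trust the group name cached from get_fields()."""
    return timelines[cached_timeline_of[field_name]]["t"]


def times_via_result(field_name):
    """so3g-style approach: use the fields list in the result itself."""
    for timeline in timelines.values():
        if field_name in timeline["fields"]:
            return timeline["t"]
    raise KeyError(field_name)


try:
    times_via_cache("temp_b")
    cache_lookup_failed = False
except KeyError:
    # 'group3' is not a key of the timelines returned by this call.
    cache_lookup_failed = True

times = times_via_result("temp_b")
```

Here the cached lookup raises KeyError, while the lookup based on the returned fields list finds the timestamps.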

One implication is that each data server would need to be updated to build the fields list into the timeline dictionary returned by its get_data method.
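As a rough sketch of that update, assuming a data server already knows which timeline each of its fields belongs to: the helper name `attach_fields` and the `field_timeline` argument are hypothetical, and real data servers may track this association differently.

```python
def attach_fields(data, timelines, field_timeline):
    """Return a copy of `timelines` with a 'fields' list on each entry.

    `data` is the {field_name: values} dictionary about to be returned,
    `timelines` is the {timeline_name: {"t": [...]}} dictionary, and
    `field_timeline` is a hypothetical {field_name: timeline_name} lookup
    maintained by the data server.
    """
    result = {}
    for timeline_name, timeline in timelines.items():
        entry = dict(timeline)  # shallow copy; the input is left untouched
        # Only list fields that are actually present in this response.
        entry["fields"] = [
            field_name
            for field_name in data
            if field_timeline.get(field_name) == timeline_name
        ]
        result[timeline_name] = entry
    return result
```

A data server could call something like this just before returning from get_data, so that clients no longer need a cached get_fields result to pair fields with timestamps.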

ahincks commented 4 years ago

I agree with the proposal to adopt the (superior) API of so3g.