BrianJKoopman opened 5 years ago

Reading many files from disk in their entirety is slow. As discussed in #22, we should write some downsampled data to disk and record their locations in the SQL database. When a user queries for data, we should return the 'best' resolution for their time range: one that results in a quick response while still representing the data well.

Is this a job for `sisock`, or should it be the job of another package? If it is the job of `sisock`, then perhaps each data server should be responsible for creating these files for itself? What do you think?
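To make the "best resolution" choice concrete (whoever ends up owning it), here is a minimal sketch of the selection logic. The fixed set of factors, the target point count, and the function name are illustrative assumptions, not existing code:

```python
# Hypothetical set of downsampling factors written to disk for each file.
FACTORS = [1, 10, 100, 1000]

# Rough cap on how many samples a typical plot needs.
TARGET_POINTS = 10_000

def best_factor(t_start, t_end, native_rate_hz):
    """Pick the smallest factor that keeps the returned sample count
    under TARGET_POINTS, falling back to the coarsest available."""
    n_native = (t_end - t_start) * native_rate_hz
    for factor in FACTORS:
        if n_native / factor <= TARGET_POINTS:
            return factor
    return FACTORS[-1]

# A week of 1 Hz data (~6e5 samples) maps to factor 100 (~6000 points).
print(best_factor(0, 7 * 86400, 1.0))
```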
Having thought some more about Issue #29, and having worked on installing and understanding the HERA Librarian in some more detail, here's what I propose. It is, I believe, inspired by @mhasself's notion of "adiabatic" changes to code, in that it remains pretty much backward compatible with what `sisock`, the Librarian and `so3g` are currently doing.
First, create a stand-alone service (with its own repository), provisionally named here `so_hk_watchdog`, that does the following:

- Watches for newly arrived HK files (presumably by monitoring the Librarian DB, using the Librarian's programmatic python interface).
- Writes downsampled versions of those files to disk.
- Records, in SQL tables of its own, the locations of the downsampled files, the information currently gathered by the `g3-file-scanner`, as well as information currently needed by the `so3g.hk.getdata` module. (A sketch of what these tables might look like follows below.)
- These would be new tables that the Librarian doesn't know about but that relate to the primary keys of the entries that the Librarian has created. If it makes sense down the line, we could bring these tables under the purview of the Librarian: I can liaise with Paul (La Plante) on this point.

The `so_hk_watchdog` would run independently of `sisock` and the Librarian. (Of course, we could bundle it up with them with a single `docker-compose` if this seemed like a good way to go.)
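For concreteness, here is one possible shape for the watchdog's tables, sketched with sqlite for brevity (the real service could use whatever SQL backend `g3-file-scanner` already talks to). All table and column names are made up for illustration:

```python
import sqlite3

SCHEMA = """
-- One row per original HK file; 'librarian_id' is meant to carry the
-- primary key of the Librarian's own entry for the file.
CREATE TABLE IF NOT EXISTS hk_files (
    id           INTEGER PRIMARY KEY,
    librarian_id INTEGER,
    path         TEXT NOT NULL,
    t_start      REAL NOT NULL,
    t_end        REAL NOT NULL
);

-- One row per downsampled product derived from a file above.
CREATE TABLE IF NOT EXISTS downsampled_files (
    id      INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES hk_files(id),
    factor  INTEGER NOT NULL,  -- keep 1 of every 'factor' samples
    path    TEXT NOT NULL
);

-- Per-field coverage, of the kind g3-file-scanner gathers and
-- so3g.hk.getdata needs.
CREATE TABLE IF NOT EXISTS fields (
    file_id INTEGER NOT NULL REFERENCES hk_files(id),
    field   TEXT NOT NULL,
    t_start REAL NOT NULL,
    t_end   REAL NOT NULL
);
"""

def init_db(path="so_hk_watchdog.db"):
    with sqlite3.connect(path) as conn:
        conn.executescript(SCHEMA)
```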
Then, modify `so3g.hk.getdata` so that the `HKArchive` class is optionally able to access the DB populated by the watchdog. Based on the stride of data requested, it would know how to find the correct downsampled file and how to read it. Perhaps this would mean making `HKArchive` a base class, with one derived class that does exactly what is currently being done, and another derived class that knows how to get HK information from the DB.
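A bare-bones sketch of that derived class, reusing the schema sketched above; the class name, constructor, and query are assumptions, and actual file reading would still go through the existing `HKArchive` machinery:

```python
import sqlite3

from so3g.hk.getdata import HKArchive

class HKArchiveDB(HKArchive):
    """Hypothetical derived class: answers the same queries as
    HKArchive, but consults the watchdog's DB to decide which
    (possibly downsampled) files to read."""

    def __init__(self, db_path, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.db_path = db_path

    def _files_for(self, field, t_start, t_end, factor):
        """Paths of downsampled files covering [t_start, t_end) for
        'field' at the requested factor (schema as sketched above)."""
        with sqlite3.connect(self.db_path) as conn:
            rows = conn.execute(
                "SELECT d.path FROM downsampled_files d"
                " JOIN fields f ON f.file_id = d.file_id"
                " WHERE f.field = ? AND d.factor = ?"
                " AND f.t_end > ? AND f.t_start < ?",
                (field, factor, t_start, t_end)).fetchall()
        return [r[0] for r in rows]
```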
Then, create a new data source component in `sisock` that interfaces with `so3g.hk.getdata`. That is, the solution to Issue #29 is not to modify `g3_reader`; we can keep that component in case people need a quick 'n' dirty way to display their HK data on a new install.
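Schematically, the new component would then be a thin adapter. I haven't checked the exact hook names against `sisock.base`, so treat the base class and method signatures below as assumptions modeled loosely on how `g3_reader` is structured:

```python
import sisock.base  # assumed to provide a DataNodeServer base class

class HKDataNodeServer(sisock.base.DataNodeServer):
    """Sketch: serve HK data through so3g.hk.getdata instead of
    reading whole .g3 files the way g3_reader does."""

    def __init__(self, archive, *args, **kwargs):
        # 'archive' would be an HKArchive (or the HKArchiveDB above).
        super().__init__(*args, **kwargs)
        self.archive = archive

    def get_fields(self, start, end):
        # Delegate field discovery to the archive; translating its
        # return format into sisock's field/timeline dicts is elided.
        return self.archive.get_fields()

    def get_data(self, field, start, end, min_stride=None):
        # The archive decides which resolution of file to open.
        return self.archive.get_data(field, start, end,
                                     min_stride=min_stride)
```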