BrianJKoopman opened 5 years ago

Reading many files from disk in their entirety is slow. As discussed in #22, we should write some downsampled data to disk and record their locations in the SQL database. When a user queries for data, we should return the 'best' resolution for their time range: one that results in a quick response while still representing the data well.

Is this a job for `sisock`, or should it be the job of another package? If it is the job of `sisock`, then perhaps each data server should be responsible for creating these files for itself? What do you think?
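To make the "best resolution" choice concrete (whoever ends up owning it), here is a minimal sketch of the selection logic. The fixed set of factors, the target point count, and the function name are illustrative assumptions, not existing code:

```python
# Hypothetical set of downsampling factors written to disk for each file.
FACTORS = [1, 10, 100, 1000]

# Rough cap on how many samples a typical plot needs.
TARGET_POINTS = 10_000

def best_factor(t_start, t_end, native_rate_hz):
    """Pick the smallest factor that keeps the returned sample count
    under TARGET_POINTS, falling back to the coarsest available."""
    n_native = (t_end - t_start) * native_rate_hz
    for factor in FACTORS:
        if n_native / factor <= TARGET_POINTS:
            return factor
    return FACTORS[-1]

# A week of 1 Hz data (~6e5 samples) maps to factor 100 (~6000 points).
print(best_factor(0, 7 * 86400, 1.0))
```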
Having thought some more about Issue #29, and having worked on installing and understanding the HERA Librarian in some more detail, here's what I propose. It is, I believe, inspired by @mhasself's notion of "adiabatic" changes to code, in that it remains pretty much backward compatible with what `sisock`, the Librarian and `so3g` are currently doing.
First, create a stand-alone service (with its own repository), provisionally named here `so_hk_watchdog`, that does the following:

- Watches for newly arrived HK files (presumably by monitoring the Librarian DB, using the Librarian's programmatic python interface).
- Writes downsampled versions of those files to disk.
- Records, in SQL tables of its own, the locations of the downsampled files, the information currently gathered by the `g3-file-scanner`, as well as information currently needed by the `so3g.hk.getdata` module. (A sketch of what these tables might look like follows below.)
- These would be new tables that the Librarian doesn't know about but that relate to the primary keys of the entries that the Librarian has created. If it makes sense down the line, we could bring these tables under the purview of the Librarian: I can liaise with Paul (La Plante) on this point.

The `so_hk_watchdog` would run independently of `sisock` and the Librarian. (Of course, we could bundle it up with them with a single `docker-compose` if this seemed like a good way to go.)
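For concreteness, here is one possible shape for the watchdog's tables, sketched with sqlite for brevity (the real service could use whatever SQL backend `g3-file-scanner` already talks to). All table and column names are made up for illustration:

```python
import sqlite3

SCHEMA = """
-- One row per original HK file; 'librarian_id' is meant to carry the
-- primary key of the Librarian's own entry for the file.
CREATE TABLE IF NOT EXISTS hk_files (
    id           INTEGER PRIMARY KEY,
    librarian_id INTEGER,
    path         TEXT NOT NULL,
    t_start      REAL NOT NULL,
    t_end        REAL NOT NULL
);

-- One row per downsampled product derived from a file above.
CREATE TABLE IF NOT EXISTS downsampled_files (
    id      INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES hk_files(id),
    factor  INTEGER NOT NULL,  -- keep 1 of every 'factor' samples
    path    TEXT NOT NULL
);

-- Per-field coverage, of the kind g3-file-scanner gathers and
-- so3g.hk.getdata needs.
CREATE TABLE IF NOT EXISTS fields (
    file_id INTEGER NOT NULL REFERENCES hk_files(id),
    field   TEXT NOT NULL,
    t_start REAL NOT NULL,
    t_end   REAL NOT NULL
);
"""

def init_db(path="so_hk_watchdog.db"):
    with sqlite3.connect(path) as conn:
        conn.executescript(SCHEMA)
```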
Then, modify `so3g.hk.getdata` so that the `HKArchive` class is optionally able to access the DB populated by the watchdog. Based on the stride of data requested, it would know how to find the correct downsampled file and how to read it. Perhaps this would mean making `HKArchive` a base class, with one derived class that does exactly what is currently being done, and another derived class that knows how to get HK information from the DB.
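A bare-bones sketch of that derived class, reusing the schema sketched above; the class name, constructor, and query are assumptions, and actual file reading would still go through the existing `HKArchive` machinery:

```python
import sqlite3

from so3g.hk.getdata import HKArchive

class HKArchiveDB(HKArchive):
    """Hypothetical derived class: answers the same queries as
    HKArchive, but consults the watchdog's DB to decide which
    (possibly downsampled) files to read."""

    def __init__(self, db_path, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.db_path = db_path

    def _files_for(self, field, t_start, t_end, factor):
        """Paths of downsampled files covering [t_start, t_end) for
        'field' at the requested factor (schema as sketched above)."""
        with sqlite3.connect(self.db_path) as conn:
            rows = conn.execute(
                "SELECT d.path FROM downsampled_files d"
                " JOIN fields f ON f.file_id = d.file_id"
                " WHERE f.field = ? AND d.factor = ?"
                " AND f.t_end > ? AND f.t_start < ?",
                (field, factor, t_start, t_end)).fetchall()
        return [r[0] for r in rows]
```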
Then, create a new data source component in `sisock` that interfaces with `so3g.hk.getdata`. That is, the solution to Issue #29 is not to modify `g3_reader`; we can keep that component in case people need a quick 'n' dirty way to display their HK data on a new install.
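Schematically, the new component would then be a thin adapter. I haven't checked the exact hook names against `sisock.base`, so treat the base class and method signatures below as assumptions modeled loosely on how `g3_reader` is structured:

```python
import sisock.base  # assumed to provide a DataNodeServer base class

class HKDataNodeServer(sisock.base.DataNodeServer):
    """Sketch: serve HK data through so3g.hk.getdata instead of
    reading whole .g3 files the way g3_reader does."""

    def __init__(self, archive, *args, **kwargs):
        # 'archive' would be an HKArchive (or the HKArchiveDB above).
        super().__init__(*args, **kwargs)
        self.archive = archive

    def get_fields(self, start, end):
        # Delegate field discovery to the archive; translating its
        # return format into sisock's field/timeline dicts is elided.
        return self.archive.get_fields()

    def get_data(self, field, start, end, min_stride=None):
        # The archive decides which resolution of file to open.
        return self.archive.get_data(field, start, end,
                                     min_stride=min_stride)
```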