spacepy / dbprocessing

Automated processing controller for heliophysics data

Create RBSP_MAGEIS.sqlite database #34

Closed by dnadeau-lanl 2 years ago

dnadeau-lanl commented 3 years ago

RBSP_MAGEIS.sqlite is a binary file and we should be able to recreate it. We have new tables and new columns on some tables, and tests are failing.

Proposed enhancement

Create an SQL file to recreate RBSP_MAGEIS.sqlite if needed, for example:

CREATE TABLE mission (
        mission_id INTEGER NOT NULL,
        mission_name VARCHAR(20) NOT NULL,
        rootdir VARCHAR(50) NOT NULL,
        incoming_dir VARCHAR(50) NOT NULL,
        PRIMARY KEY (mission_id),
        UNIQUE (mission_name)
);

This issue should be closed when tests pass after merging the dbprocessing changes from LANL.

jtniehof commented 3 years ago

This is a tricky one. We actually have three .sqlite files used in the tests:

emptyDB.sqlite is just the output of CreateDB, and we can easily have the tests create it on the fly (a sketch follows this list). I was planning on doing that at some point.

testDB.sqlite is the "final state" of the functional tests. It's reproducible, but not exactly easily (you basically have to run the functional suite).

RBSP_MAGEIS.sqlite is the (I think full) RBSP-ECT magEIS database that we're including in the tests specifically to make sure we don't break the ECT processing with changes. Having this as an "artifact" was important to that intention: even databases that were created with older code should keep working.
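
For the emptyDB case, a minimal sketch of creating the database on the fly in a test fixture; the exact CreateDB.py command line (interpreter, path argument) is an assumption here, not the project's documented interface:

import os
import shutil
import subprocess
import tempfile
import unittest

class EmptyDBTests(unittest.TestCase):
    def setUp(self):
        # Build a fresh empty database for each test instead of shipping
        # emptyDB.sqlite as a binary artifact in the repository.
        self.tmpdir = tempfile.mkdtemp()
        self.dbpath = os.path.join(self.tmpdir, 'emptyDB.sqlite')
        # Assumption: CreateDB.py takes the target database path as its argument.
        subprocess.check_call(['python', 'CreateDB.py', self.dbpath])

    def tearDown(self):
        shutil.rmtree(self.tmpdir)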

Longer-term I was sort of hoping to get to a single database to back up the unit tests (we still need to have some reasonable way of testing against older database structure, though.) I've been fairly slap-happy about when I use testDB, when RBSP_MAGEIS, and when emptyDB + adding columns to get to a desired start state.

Ideally the "start state" probably shouldn't depend on dbprocessing code, but instead come from raw(ish) SQL commands that just load the data into the database. In the short term, it would probably be possible to set up something like this that would reproduce RBSP_MAGEIS in a fresh PostgreSQL database, which would at least make it possible to test #14. Would that help you along?
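
One way to get those raw(ish) SQL commands, as a sketch: Python's standard-library sqlite3 module can dump an existing database into a plain-text .sql script (the output is sqlite-flavored SQL, so types and sqlite-specific bits would still need some massaging before it loads cleanly into PostgreSQL):

import sqlite3

# Dump the existing binary database to a text SQL script that can be
# committed to the repository and replayed to rebuild the database.
conn = sqlite3.connect('RBSP_MAGEIS.sqlite')
with open('RBSP_MAGEIS.sql', 'w') as fh:
    for statement in conn.iterdump():
        fh.write(statement + '\n')
conn.close()

Replaying it into a fresh sqlite database is then just a matter of reading the file back and passing it to Connection.executescript().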

The additional tables/columns in the sdndev branch would be, I think, a separate but related issue: first we try to get the current master branch working in some way against PostgreSQL, then we converge the structures between the branches?

dnadeau-lanl commented 3 years ago

Actually I am trying to get sqlite working with sdndev. There are a lot of changes everywhere at LANL; I noticed that the "yesterday/tomorrow" handling was deleted and the tests failed, and I finally succeeded in putting it back. There are also a lot of queries now involving instrument_id, and the new procespidlink table here. I have lots of tests running at this time.

For this issue, I was thinking of using pandas and maybe JSON or CSV:

import json

import pandas as pd
from sqlalchemy import create_engine

# Connect to the existing sqlite database (three slashes for a relative path).
cnx = create_engine('sqlite:///RBSP_MAGEIS.sqlite').connect()

# List the user tables, skipping sqlite's internal bookkeeping tables.
tables_df = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'", cnx)
frames_dict = {}

# Read every table into a DataFrame.
for table in tables_df['name']:
    print("reading table", table)
    frames_dict[table] = pd.read_sql_table(table, cnx)

# Serialize each table; orient='split' keeps columns, index and data separate.
for table in frames_dict:
    result = frames_dict[table].to_json(orient='split')
    parsed = json.loads(result)
    print(json.dumps(parsed, indent=4))
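
A possible next step for the same snippet (a sketch, untested against the real database; the RBSP_MAGEIS_dump directory name is made up): write one JSON file per table so the dump can live in the repository instead of the binary .sqlite file.

import os

# Write one JSON file per table into a (hypothetical) dump directory.
os.makedirs('RBSP_MAGEIS_dump', exist_ok=True)
for table, frame in frames_dict.items():
    frame.to_json(os.path.join('RBSP_MAGEIS_dump', table + '.json'),
                  orient='split', date_format='iso')
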
jtniehof commented 3 years ago

Got it...so the concern would be being able to pour the contents into slightly different database structures. I really like that. I'd love to be able to set up our unit tests in such a way that the full test suite can run on a few different databases behind it: various combinations of sqlite vs. PostgreSQL, databases with/without the Unix time table or extra mission columns, etc.

I'd like to avoid pulling in a pandas dependency if we can, but if it's a short-term thing that gets us places, that sounds fine. Otherwise I did some poking around for SQL-to-JSON serialization.
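
For reference, a pandas-free version of the same dump is possible with only the standard library; a rough sketch:

import json
import sqlite3

conn = sqlite3.connect('RBSP_MAGEIS.sqlite')
conn.row_factory = sqlite3.Row  # rows become dict-like, keyed by column name

# Same table listing as the pandas version, skipping sqlite internals.
tables = [r['name'] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")]

# One list of row-dicts per table; default=str crudely handles non-JSON types.
dump = {t: [dict(row) for row in conn.execute('SELECT * FROM "%s"' % t)]
        for t in tables}
print(json.dumps(dump, indent=4, default=str))
conn.close()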

dnadeau-lanl commented 3 years ago

My goal was to do something more like this:

https://www.sqlshack.com/introduction-to-sqlalchemy-in-pandas-dataframe/

Pandas will only be used for development and running unit tests/regression tests.
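
Sketching the reverse direction, under the assumption that per-table JSON files (orient='split', as above) exist in a RBSP_MAGEIS_dump directory: read each file back and append its rows into whatever engine the test run targets; pointing the URL at PostgreSQL would exercise #14.

import glob
import os

import pandas as pd
from sqlalchemy import create_engine

# Target engine for the rebuilt database; swap in a postgresql:// URL to test #14.
engine = create_engine('sqlite:///RBSP_MAGEIS_rebuilt.sqlite')

for path in glob.glob(os.path.join('RBSP_MAGEIS_dump', '*.json')):
    table = os.path.splitext(os.path.basename(path))[0]
    frame = pd.read_json(path, orient='split')
    # Assumes the table already exists (e.g. created by CreateDB); append rows only.
    frame.to_sql(table, engine, if_exists='append', index=False)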