spacepy / dbprocessing

Automated processing controller for heliophysics data
5 stars 4 forks source link

Review database column types and sizes #107

Open jtniehof opened 2 years ago

jtniehof commented 2 years ago

Because sqlite doesn't really do any type enforcing, I've managed to be pretty lazy about column types. This is coming to bite us in the postgres support (#60). So this needs a scrub...both for what promises SQLAlchemy makes about types and what happens when they come into postgres. For instance the unix_time table apparently is rolling over around 2040, which is REALLY weird because the Integer is supposed to be machine native (and I hope nobody's running on 32-bit now), and the 32-bit rollover is in 2038 anyhow. There's also some cases where the space allocated for certain columns storing paths is too short, etc.

Closure condition

We need to check what type/size columns should be, implement that (including any relevant code updates or testing), and my usual bugaboo, migration support.

jtniehof commented 2 years ago

I've run into size limits in the tests with instrument name and process name, suggesting that they should be extended.

jtniehof commented 2 years ago

The release number is, oddly, a String(20). I have no idea how that happened. @balarsen, did we do this on purpose so it could be "1.2" or "MARCH2020" or was this an accident? I'm not sure which it should be. This seems to go back to 2012 where we removed the release number from the file table (where it was also a string, length 2, although it was marked as an int in diskfile).

balarsen commented 2 years ago

Phew, I have no recollection and didn't find any notes. we have been doing "rel04" kinda things. But no idea. No strong preference on int vs string vs whatever. I lean toward string but you are way more in the loop than me.

jtniehof commented 2 years ago

Can you check on your current releases if you're doing "rel04" or just "4"? I suspect we may want to rename the column if we keep it a string.

balarsen commented 2 years ago
sqlite> .schema release
CREATE TABLE release ( file_id INTEGER NOT NULL, release_num VARCHAR(20) NOT NULL, PRIMARY KEY (file_id, release_num), FOREIGN KEY(file_id) REFERENCES file (file_id) );

sqlite> select * from release;

So it appears moot as RBSP is not using it at all in the database. Change as you see fit.