Open lpiep opened 3 years ago
Ok so as far as I understand this, it seems like there are three tasks: 1) figuring out a definition file that reproduces the versions of the mesa3 operating system rmariadb db as well as any non-default options/configurations we use for our database 2) Figuring out a way to host rmariadb in a singularity image that gets around the immutability of images. This would either mean using some of singularity's build-in options for this (either sandboxing or a persistent overlay) or binding the writable location of the databse. A persistent overlay is probably the most appropriate option here--iirc we'll need to ask IT to install the tools necessary to create the writable filesystem that singularity uses with persistent overlays. 3) Code to create the tables and transfer the contents or a subset of the contents of mesa_load into the database in the image. I'd recommend against having actual data in the image or hard-coding the structure of the databse into the image since it wouldn't be guaranteed to reflect the live structure/contents of mesa_load. A separate repo containing code for setting up the structure of the database and transfering data might be a good option here, but that assumes we can work out the permissions allowing the databse to be modified post-build.
Since this is a lot of steps, it's best to start with the first one which is the simplest. I'd recommend that you play around with writing a definition file (building using the sylabs remote builder) until you're able to install rmariadb (or perhaps identify a pre-build docker image with our exact rmariadb version and OS...which seems optimistic). Once you have a working definition file I can upload it here and host the container and we can start working on step 2.
The idea is for an image that would be versioned along with our database revs that would allow for safely developing code that interacts with the MESA database.
This might contain:
geocode_lookup_tbl
table orstandard_aqs_monitors
table)The same could be done for the PostgreSQL geodatabase.
The reason for doing this rather than working in the mesa_load schema is that that schema tends to contain data that is in use or would be time-consuming to reproduce. If the new script I'm working on accidentally wipes mesa_load, we may have lost considerable work put into updating AQS data, loading new cohorts, etc. Having a true development database would allow us to use the mesa_load schema as a place to put only final changes ready for the next rev, rather than as a scratch space.