wfau / ScienceArchives

0 stars 0 forks source link

Experiment with Postgres RDMS implementation #558

Open Ross-ROE opened 4 months ago

Ross-ROE commented 4 months ago

Follows issue #557

On new VM install Postgres software and maybe create a dummy example database

Ross-ROE commented 3 months ago

Postgres could presumably be installed via the OS's package manager if we think that's best - though sudo access is required. We'll then need to create a common file system structure that is presumably read/write accessible by all users on which to host a test database. We'll also need to set up access to Postgres RDMS via a common database account system like the old ldservrw account we had before.

Ross-ROE commented 3 months ago

Suggestion in #557 is to host the database on a common file system with this structure:

/moons-archive/database/

astrodb commented 3 months ago

Which do yo prefer, having sudo access to do this all yourself? Or having me perform the basic install ad you setup the config from there?

I've attached a 1TB volume to the VM and mounted it at /moons-archive. I created the /moons-archive/database/ directory and gave ownership to 'rsc', though this should be changed to the database system account once that's known.

zexpe commented 3 months ago

Cool, thanks. When we know what we are doing I shouldn't ever need sudo access. So I don't mind not having it, if you are OK taking requests whilst we get the initial setup settled. I don't think there's much to do once we've set-up the basic file system and installed Python & Postgres. The only concern would be the common Python environment and the shared libraries there - I can produce a list now, but we might want some more as development progresses...

Ross-ROE commented 17 hours ago

Rob has installed Postgres, and we are arranging permissions on an individual username basis.

Ross-ROE commented 16 hours ago

Having discovered that Postgres doesn't support cross-database querying without using extensions with complicated syntax and dubious performance (dblink or postgres_fdw), it would seem the Postgres-way of dealing with multiple databases on the same server is to have them as separate schemas in in the same database:

https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-CREATE

Perhaps we should create a WFAU SA database, and then create the MOONS SA database as Schema within that instance. Then if we wanted to cross-query MOONS data with other databases then they would need to also be ingested into the WFAU SA as their own schemas? Inevitably, we'd have to come up with some solution to enable interoperability between legacy databases and next gen databases, and this may as good as any solution. Considering we're moving into the virtual hardware world of Somerville there's less to be gained through dedicated servers per database anyway. Though having the VVV and MOONS in the same database entity may present performance issues, and loading duplicate copies of the VVV will of course be both massively time and space consuming...