mendix / m2ee-tools

m2ee, the Mendix runtime helper tools for GNU/Linux
Other
27 stars 40 forks source link

PostgreSQL dumpdb/restoredb/emptydb and stored procedures/functions #43

Closed hvisage closed 5 years ago

hvisage commented 5 years ago

Goodday,

I'm in a situation (Mendix 6) where there have been a need to optimize the DB queries to prevent tight loops inside the modeller, and that is/was done using stored procedures.

Now a dumpDB stores those functions/procedures "correctly" The EmptyDB clears everything except these stored procedure/functions (which isn't quite Mendix's domain I'll concede) the restoreDB then fails 'cause these functions/procedures pre-exists, and gives an error (and my Ansible scripts fails).

I'm in need of the restoredb to include --clean --if-exists when it executes the pg_restore command.

Could we add these (will create a pull request if acceptable)? Perhaps a separate option/argument like --force (or part of --yolo ?) to enforce these clean if exists?

knorrie commented 5 years ago

Yes, and I'd rather add --clean --if-exists to the pg_dump command instead of restore.

hvisage commented 5 years ago

Hmmm… reading the pg_dump(1) man page, it states:

           This option is only meaningful for the plain-text format. For the archive formats, you can specify the option when you call pg_restore.

the backups made by m2ee dumpdb is custom format, not the plain-text

knorrie commented 5 years ago

Aha ok, yes, you're right. pg_restore it will be.

hvisage commented 5 years ago

44 created

knorrie commented 5 years ago

I don't think there's a need for --force or --yolo. The clean / if-exists on restore can only reduce the amount of explosion when a user tries to restoredb without doing emptydb first. Currently doing so will fail in spectacular ways.

But, maybe that's good... hm...

Clean / if-exists on restore will cover up a real problem that the user is causing. If I have an environment to just play around with things, I would be able to restore any database backup (from a very different application that previously been running in that database) and not get any errors, because it will replace some tables and lalalala. I will end up with a database that has many tables and sequences that are not relevant at all to my currently deployed application, but which are left-over cruft from the previous experiment... But... maybe I'm not working in a "just play around" environment, but I'm working in a production environment where I really expect all data that was in there to be gone when doing a restore.

Silently doing the wrong thing in that case is a no-go for this code base.

The Mendix Runtime uses tables and sequences. And that's what we throw away when the user asks to empty the database. Technically, the best thing would be to just drop database and create database, but there's a deliberate design choice in here to make it possible for the "unprivileged" database user to execute these actions.

So, another approach could be to add a check to see if there are no tables and sequences in the database at all before trying a restore, and afterwards still allow the if-exists/clean type of work to accomodate users who are adding extra things in their database. Since I know Mendix is not using anything else right now, this would do the right thing for the normal use case, and in case you're adding other things into the database, you'll be in a "you break it, you keep the pieces" situation when there are old stored procedures.

Another option is to also start explicitely removing stored procedures on emptydb, even when Mendix has to do nothing with them. But, how many other things can there be in a postgresql database that we would have to add over time to that list? It wouldn't get (regression)testing for sure, and I don't like that part...

Just thinking out loud here.

hvisage commented 5 years ago

@knorrie You've just given all the reasons to have it only done when specifying a --I-know-why-and-I-am-taking-responsibility-thus-force-this-clean-if-exist parameter option.

As such, emptydb doesn't clean all at present, only what is "knows" about, and that is not bad per se as there might be needs for data in non-public namespaces

My current issue is related to "acceptance/QA/dev refresh from Prod", so it's taking data back (where there might actually be some stuff "laying around" seeing it's not production), not a "going forward" or a true restore/disaster recovery procedure where things fails, as my next step would actually be to consider totally killing the database and re-creating it. (The devs might need to reconsider their development to have the stored procedures only created when they need it, and then to remove them when done, but they are not spending the costs today on that extra development, so they are making it an Ops problem which is cheaper than the Mendix developers)

The other question perhaps: have Mendix only do pg_dump of the actual Mendix used tables/procedures/etc., in other words that which an emptydb should've removed already, and nothing else. That would also bypass this issue... would actually force the devs to reconsider their approach w.r.t. the creation of the stored procedures in a "temporal" or "check it exists before use" manner. But even this might have some "other" stuff laying around.

The only other solution is to have emptyDB do a drop database mendixdb; create database mendixdb;, something that the mendix user might not have the permissions for. Perhaps this specific case I'll have to rather do that instead of an emptydb ?

knorrie commented 5 years ago

Yes, right. That's why thinking out loud is sometimes useful. :-)

So what about this:

So with --yolo you can also do fun stuff like restoring a dump which only contains a single table etc. And by default it would never silently do the unexpected thing, like I described with the prod db example.

Would that work for you?

hevisko commented 5 years ago

As I've this week discovered

mxnode:
  pg_restore: ...

Rather use that, and close this ticket