perseas / Pyrseas

Provides utilities for Postgres database schema versioning.
https://perseas.github.io/
BSD 3-Clause "New" or "Revised" License
395 stars 67 forks source link

Comparing two YAML files #204

Closed dchang-dchang closed 5 years ago

dchang-dchang commented 5 years ago

Hi there,

First off, thanks - Pyrseas been really great to work with!

I'd love to diff two yaml files and generate the appropriate SQL using yamltodb. My use case is to run yamltodb as part of CI so that we can see what DB migrations are going to happen between the current and future yaml files.

I tried to feed a second yaml file through db.from_map(...), but it gave me enough errors to tell me that I'm probably doing something wrong...

Is running yamltodb with two yaml files possible?

Thanks!

jmafc commented 5 years ago

"As is" (or by design, if you will), it's not possible to compare two YAML files directly using yamltodb. An alternative (without changing Pyrseas code) to detecting possible migrations would be to use a regular diff command, either from the OS or git if the files are in a repository. Of course, that doesn't show you the SQL that would be generated. Another alternative, in a CI or development environment, is to use yamltodb -u using the "older" YAML file against a freshly created or empty database, and then run yamltodb using the second YAML file to generate the SQL. An attempt was made a while ago to implement a --revert option for yamltodb to generate SQL to rollback a set of changes (you can find traces of that, for example, in pyrseas/database.py, at around line 527 where we swap the internal representations (self.db and self.ndb). This also would've been useful for providing a YAML-to-YAML comparison. Unfortunately, we discovered the two representations aren't symmetrically consistent, particularly when dealing with ordering of columns in indexes, so that effort was stalled indefintely.

dchang-dchang commented 5 years ago

Thanks for the advice and pointers!

You're probably right that the best move is to point it at a new db - we set one up for integration tests anyways.

Because I was curious, I tried it anyways. Based on your hint to look at the revert option, I managed to get a POC working. It's not pretty and not something that would ever ship, but seems to work.

I'm going to see how this runs against more production test cases before I considering cleaning this up and adding the appropriate test cases. Anything obvious that I'm missing?

Sounds like this wouldn't be an interesting feature that you'd want to merge? Totally get it that it's not part of the project's vision.

Thanks again for all your hard work!


Here's the diff I ended up against, based on v0.8.0, optimized solely for smallest number of changes to get to POC:

diff --git i/pyrseas/database.py w/pyrseas/database.py
index 1675610..d45ea8d 100644
--- i/pyrseas/database.py
+++ w/pyrseas/database.py
@@ -208,8 +208,8 @@ class Database(object):

     def _link_refs(self, db):
         """Link related objects"""
-        langs = []
-        if self.dbconn.version >= 90100:
+        langs = [ "plpgsql", "pltcl", "pltclu", "plperl", "plperlu", "plpythonu", "plpython2u", "plpython3u"]
+        if self.dbconn and self.dbconn.version >= 90100:
             langs = [lang[0] for lang in self.dbconn.fetchall(
                 """SELECT lanname FROM pg_language l
                      JOIN pg_depend p ON (l.oid = p.objid)
@@ -492,6 +492,13 @@ class Database(object):

         return dbmap

+    def diff_two_map(self, a, b, quote_reserved=True):
+        self.dbconn = None
+        langs = [ "plpgsql", "pltcl", "pltclu", "plperl", "plperlu", "plpythonu", "plpython2u", "plpython3u"]
+        self.from_map(a, langs)
+        self.db = self.ndb
+        return self.diff_map(b, quote_reserved=quote_reserved)
+
     def diff_map(self, input_map, quote_reserved=True):
         """Generate SQL to transform an existing database

@@ -520,8 +527,10 @@ class Database(object):
         if quote_reserved:
             fetch_reserved_words(self.dbconn)

-        langs = [lang[0] for lang in self.dbconn.fetchall(
-            "SELECT tmplname FROM pg_pltemplate")]
+        langs = [ "plpgsql", "pltcl", "pltclu", "plperl", "plperlu", "plpythonu", "plpython2u", "plpython3u"]
+        if self.dbconn:
+            langs = [lang[0] for lang in self.dbconn.fetchall(
+                "SELECT tmplname FROM pg_pltemplate")]
         self.from_map(input_map, langs)
         if opts.revert:
             (self.db, self.ndb) = (self.ndb, self.db)
@@ -547,7 +556,10 @@ class Database(object):
             if old is not None:
                 stmts.append(old.alter(new))
             else:
-                stmts.append(new.create_sql(self.dbconn.version))
+                if self.dbconn: 
+                    stmts.append(new.create_sql(self.dbconn.version))
+                else:
+                    stmts.append(new.create_sql(90600))

                 # Check if the object just created was renamed, in which case
                 # don't try to delete the original one
diff --git i/pyrseas/yamltodb.py w/pyrseas/yamltodb.py
index 0c7105d..b455b99 100755
--- i/pyrseas/yamltodb.py
+++ w/pyrseas/yamltodb.py
@@ -22,6 +22,7 @@ def main():
                         "YAML-formatted file(s)", __version__)
     parser.add_argument('-m', '--multiple-files', action='store_true',
                         help='input from multiple files (metadata directory)')
+    parser.add_argument('--fromFile', type=FileType('r'), help='YAML specification')
     parser.add_argument('spec', nargs='?', type=FileType('r'),
                         default=sys.stdin, help='YAML specification')
     parser.add_argument('-1', '--single-transaction', action='store_true',
@@ -47,7 +48,9 @@ def main():
             print("Error is '%s'" % exc)
             return 1

-    stmts = db.diff_map(inmap)
+    outmap = yaml.safe_load(options.fromFile)
+
+    stmts = db.diff_two_map(inmap, outmap, quote_reserved=False)
     if stmts:
         fd = output or sys.stdout
         if options.onetrans or options.update:
jmafc commented 5 years ago

Well, I'm not doing much merging or maintenance on Pyrseas these days anyways (see https://pyrseas.wordpress.com/2018/09/12/the-future-of-pyrseas-revisited/). However, although this wasn't in the project's original vision, several others weren't either and they got worked on and merged. There is at least one group I know that is still using Pyrseas and added some functionality they needed (to a private fork). If this progressed beyond a PoC stage, they may be interested (but I'm speculating). OTOH, recently I was considering porting/rearchitecting Pyrseas using Go and it occurred to me that a lot of the SQL generation tests are poorly constructed: most of them are fed some Python dicts (emulating YAML) describing the current schema, then the tests connect to a test db, drop objects to make it essentially empty, create any objects needed by the test and then run diff_map to generate the SQL. It would be much easier to bypass the database altogether, and generate SQL from dummied up internal structures.

dchang-dchang commented 5 years ago

Ah! Totally missed that announcement - Thanks for all your hard work and I'm sure you're proud of all that you've done with this project!

FWIW I've enjoyed switching over to Go and totally agree that a yaml and yaml to sql is straightforward to unit test.