perseas / Pyrseas

Provides utilities for Postgres database schema versioning.
https://perseas.github.io/
BSD 3-Clause "New" or "Revised" License
397 stars 67 forks source link

Assert fails with weird expression indexes #170

Open dvarrazzo opened 7 years ago

dvarrazzo commented 7 years ago

I have a table with integer fields x1_id and x2_id. There are indexes defined as:

"pairs_array_idx" gin ((ARRAY[x1_id, x2_id]))
"keys_idx" btree ((LEAST(x1_id, x2_id)), (GREATEST(x1_id, x2_id)))

dumping this database fails with an assert in index.py from catalog, because the expression of the first index is generated as something like (ARRAY[x1_id). If this is fixed by replacing:

--- a/pyrseas/dbobject/index.py
+++ b/pyrseas/dbobject/index.py
@@ -232,7 +232,7 @@ class IndexDict(DbObjectDict):
                 keyopts = []
                 extra = {}
                 if col == '0':
-                    expr = keyexprs[i]
+                    expr = ', '.join(keyexprs)
                     if rest and rest[0] == '(':
                         expr = '(' + expr + ')'
                     assert(rest.startswith(expr))

then the second index will fail because of some missing parens in the middle.

jmafc commented 7 years ago

In my refactoring of trigger.py, I got rid of calling a local _from_catalog(), and almost got rid of using pg_get_triggerdef() which later requires that the resulting text be parsed. Instead I'm getting most of the info directly from the pg_trigger catalog .

I would like to take a similar approach with index.py, i.e., get rid of the local _from_catalog() and using pg_get_indexdef() in the query, at least as much as possible.

jmafc commented 7 years ago

Although this isn't directly related to your problem, having completed the refactoring of pyrseas/dbobject/function.py, I wanted to mention that I now have a different opinion on overriding _from_catalog() in DbObjectDict descendants. If the descendant holds a single class, it should not be necessary to override. However, if the dict holds two or more classes, e.g., Function or Aggregate in the case of ProcDict, then my solution has been to split the queries, and override _from_catalog to fetch each set of objects. So I'm planning to do the same thing with ClassDict, which can hold Sequence, View, MaterializedView and of course Table. Furthermore, it seems that Index should also be held in ClassDict and that the keylist attribute of Index is incorrect: it should be ['schema', 'name'] just like other DbClass descendants. This change would have lots of ramifications, including prior-version compatibility since indexes are currently shown under an associated table in the YAML output.

jmafc commented 6 years ago

Retested this against the current master. If I create only the keys_idx index, no problem:

   indexes:
      keys_idx:
        keys:
        - (LEAST(x1_id, x2_id)):
            type: expression
        - (GREATEST(x1_id, x2_id)):
            type: expression

If I create only the pairs_array_idx, dbtoyaml fails as previously reported on the assert(rest.startswith(expr)). Your suggested workaround does fix that:

    indexes:
      pairs_array_idx:
        access_method: gin
        keys:
        - (ARRAY[x1_id, x2_id]):
            type: expression

But if both indexes are present, the "workaround" now causes the same assertion failure but on the second index. Bottom line: you can have one or the other but not both :-). Seriously, I'll take a deeper look.

jmafc commented 6 years ago

@dvarrazzo The last thing I was doing before I posted https://pyrseas.wordpress.com/2018/09/12/the-future-of-pyrseas-revisited/ was brushing up on regular expressions in order to tackle this and other issues labeled "regexp". I don't have an estimate on when this will happen, but based on other conversations I suspect that you have moved on.