perseas / Pyrseas

Provides utilities for Postgres database schema versioning.
https://perseas.github.io/
BSD 3-Clause "New" or "Revised" License
395 stars 67 forks source link

assert failed with some index definition #98

Closed dvarrazzo closed 10 years ago

dvarrazzo commented 10 years ago

dbtoyaml 0.7.1 fails to dump this table:

CREATE TABLE timesheet_holiday ( id serial PRIMARY KEY, name text NOT NULL, date date NOT NULL, recurring boolean NOT NULL );

CREATE UNIQUE INDEX unique_date ON timesheet_holiday USING btree ((CASE WHEN recurring THEN (0)::double precision ELSE date_part('year'::text, date) END), date_part('month'::text, date), date_part('day'::text, date));

dvarrazzo commented 10 years ago

Sorry, including the stacktrace could be useful too:

dbtoyaml pyrseas
Traceback (most recent call last):
  File "/usr/local/bin/dbtoyaml", line 9, in <module>
    load_entry_point('Pyrseas==0.7.1', 'console_scripts', 'dbtoyaml')()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/dbtoyaml.py", line 49, in main
    dbmap = db.to_map()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/database.py", line 263, in to_map
    self.from_catalog()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/database.py", line 170, in from_catalog
    self.db = self.Dicts(self.dbconn)
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/database.py", line 95, in __init__
    self.indexes = IndexDict(dbconn)
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/dbobject/__init__.py", line 471, in __init__
    self._from_catalog()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/dbobject/index.py", line 190, in _from_catalog
    assert(rest.startswith(expr))
AssertionError
jmafc commented 10 years ago

If you comment out the assert on which it fails (pyrseas/dbobject/index.py line 190), it doesn't give an error, but I'm afraid the YAML output isn't right. I thought I had seen complicated index expressions, but this beats them by two orders of magnitude.

dvarrazzo commented 10 years ago

I can't even vouch it's a reasonable index :) I've just found it in our live database, which definitely contains questionable constructs.

jmafc commented 10 years ago

Well, Postgres accepted it so that's all that matters from the perspective of dbtoyaml/pg_dump. However, I do note that since it's supposed to be a unique index and recurring is boolean, it appears there can be only one recurring true value in the table, which seems odd.

dvarrazzo commented 10 years ago

I don't want to defend that schema :) However what is says is that there can be all the different non-recurring holidays, but there can be only a recurring holiday per (month, day).

-- you can't have two recurring holidays on the same date, different years
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('xmas', '2013-12-25', true);
INSERT 0 1
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('nyeve', '2013-12-31', true);
INSERT 0 1
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('nyeve2', '2014-12-31', true);
ERROR:  duplicate key value violates unique constraint "unique_date"

-- but you can have non-recurring ones
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('bday1', '2013-7-7', false);
INSERT 0 1
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('bday2', '2014-7-7', false);
INSERT 0 1
pyrseas=# select * from timesheet_holiday;
 id | name  |    date    | recurring 
----+-------+------------+-----------
  1 | xmas  | 2013-12-25 | t
  2 | nyeve | 2013-12-31 | t
  4 | bday1 | 2013-07-07 | f
  5 | bday2 | 2014-07-07 | f
(4 rows)

It basically forces the uniqueness of the triple (Y, M, D) for non-recurring dates and of (0, M, D) for recurring ones.

As you say however this is not pyrseas concern :) I think it should just be more careful in parsing index expressions containing parens (which is still doable when the expression doesn't contain strings, but when it also contains a string and the string contain parens... well that's a further order of magnitude of mess!)

dvarrazzo commented 10 years ago

Or, alternatively, do you really need to split a multi-col index into its columns and store them separately in yaml?

jmafc commented 10 years ago

Indexes on expressions and multi-column indexes have been problematic almost from the start. At first, I erroneously thought that you could only have a single expression or one or multiple columns. I fixed that but not in a very satisfactory way. The issue is with the way it's all stored in pg_index, either for efficiency or historical reasons. The query pg_get_expr(indexprs,indrelid) will give you the right decoded expresssion(s), but there can be multiple of those and we have to locate them in the output of pg_get_indexdef(indexrelid) in order to split them. I doubt that we could get away with not splitting them because individual columns/expresssions can have separate attributes, e.g., collations, operator classes and we'd probably still want to parse them separately when one of those is changed. And for the presumably more common case of indexes over columns rather than expressions, splitting the properties is preferable/more understandable.

dvarrazzo commented 10 years ago

I see what you mean, but there is no way to interact with the attributes: the only things ALTER INDEX allows you are whole index operations such as renaming and changing tablespace (http://www.postgresql.org/docs/9.3/static/sql-alterindex.html). If the indexdef changes for an internal component, e.g. the collation of a field what you can do is only dropping and recreating it. So, is it really useful to split the fields (and having bugs related to parsing that complex expression)?

dvarrazzo commented 10 years ago

Please try this patch for the issue. I haven't changed the indexes split into columns, just rewrote the split helper. Couldn't run the "functional" tests for some reason I haven't checked yet but all the others pass.

jmafc commented 10 years ago

I've checked the new test and found I had to remove the extra whitespace characters you added (I'm guessing carried over from the original SQL). It works for me as follows:

    def test_bug_98(self):
        "Map a multicol index with expressions"
        dbmap = self.to_map([CREATE_TABLE_HOLIDAY_STMT, CREATE_HOLYDAY_STMT])
        idx = dbmap['schema public']['table holiday']['indexes']['unique_date']
        assert idx == {
            'keys': [
                {"(CASE WHEN recurring THEN (0)::double precision"
                 " ELSE date_part('year'::text, date) END)":
                {'type': 'expression'}},
                {"date_part('month'::text, date)": {'type': 'expression'}},
                {"date_part('day'::text, date)": {'type': 'expression'}}],
            'unique': True}

Aside: I'd put the table/index statements in the test itself. The other _STMT's in the beginning are there because they're used multiple times in the rest of the file.

jmafc commented 10 years ago

Above tests were done against 9.1 (and confirmed against 9.2). However, I see that under 9.3, we need the extra newlines and spaces that were in your original commit. I guess it's another side effect of the pretty printing code changes in 9.3 (which already screwed up results in view tests). I'm afraid we'll have to add conditional code as in test_view.py.

dvarrazzo commented 10 years ago

Maybe you can do some normalization with:

' '.join(re.sub(r"([\(\)])", r' \1 ', SQL).split())

this would collapse all the whitespaces to one space, also dealing with possible changes around the parens.

jmafc commented 10 years ago

Yes, I can do that. We already have a fix_indent function in pyrseas/testutils.py and takes care of most or all of that (although your re implementation is probably better). Do you want to submit a PR or should I?

dvarrazzo commented 10 years ago

I don't know the entire picture of the test, how they are organized, and it seems you can already run them on several server version. I'm spending my pyrseas-time on the other issues, mostly on the dependency tracking, so I think it would be better if you can fix the tests yourself.

jmafc commented 10 years ago

OK will do. (Aside: I've cloned your repo and am about to start looking at your deptrack branch)