dbtoyaml 0.7.1 fails to dump this table:

CREATE TABLE timesheet_holiday (
    id serial PRIMARY KEY,
    name text NOT NULL,
    date date NOT NULL,
    recurring boolean NOT NULL
);

CREATE UNIQUE INDEX unique_date ON timesheet_holiday USING btree
    ((CASE WHEN recurring THEN (0)::double precision
           ELSE date_part('year'::text, date) END),
     date_part('month'::text, date),
     date_part('day'::text, date));

Sorry, including the stacktrace could be useful too:
dbtoyaml pyrseas
Traceback (most recent call last):
  File "/usr/local/bin/dbtoyaml", line 9, in <module>
    load_entry_point('Pyrseas==0.7.1', 'console_scripts', 'dbtoyaml')()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/dbtoyaml.py", line 49, in main
    dbmap = db.to_map()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/database.py", line 263, in to_map
    self.from_catalog()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/database.py", line 170, in from_catalog
    self.db = self.Dicts(self.dbconn)
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/database.py", line 95, in __init__
    self.indexes = IndexDict(dbconn)
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/dbobject/__init__.py", line 471, in __init__
    self._from_catalog()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/dbobject/index.py", line 190, in _from_catalog
    assert(rest.startswith(expr))
AssertionError
If you comment out the assert on which it fails (pyrseas/dbobject/index.py, line 190), it doesn't give an error, but I'm afraid the YAML output isn't right. I thought I had seen complicated index expressions, but this beats them by two orders of magnitude.
I can't even vouch it's a reasonable index :) I've just found it in our live database, which definitely contains questionable constructs.
Well, Postgres accepted it, so that's all that matters from the perspective of dbtoyaml/pg_dump. However, I do note that since it's supposed to be a unique index and recurring is boolean, it appears the table can hold only one row where recurring is true, which seems odd.
I don't want to defend that schema :) However, what it says is that you can have any number of distinct non-recurring holidays, but only one recurring holiday per (month, day).
-- you can't have two recurring holidays on the same date, different years
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('xmas', '2013-12-25', true);
INSERT 0 1
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('nyeve', '2013-12-31', true);
INSERT 0 1
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('nyeve2', '2014-12-31', true);
ERROR: duplicate key value violates unique constraint "unique_date"
-- but you can have non-recurring ones
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('bday1', '2013-7-7', false);
INSERT 0 1
pyrseas=# insert into timesheet_holiday (name, date, recurring) values ('bday2', '2014-7-7', false);
INSERT 0 1
pyrseas=# select * from timesheet_holiday;
id | name | date | recurring
----+-------+------------+-----------
1 | xmas | 2013-12-25 | t
2 | nyeve | 2013-12-31 | t
4 | bday1 | 2013-07-07 | f
5 | bday2 | 2014-07-07 | f
(4 rows)
It basically forces the uniqueness of the triple (Y, M, D) for non-recurring dates and of (0, M, D) for recurring ones.
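To make that concrete, here is a minimal Python sketch (not pyrseas code; unique_key is just an illustrative name) of the key the index effectively enforces uniqueness on:

from datetime import date

# Minimal sketch, not pyrseas code: recurring rows collapse the year to 0,
# so they collide across years; non-recurring rows keep the real year.
def unique_key(day, recurring):
    return (0 if recurring else day.year, day.month, day.day)

assert unique_key(date(2013, 12, 31), True) == unique_key(date(2014, 12, 31), True)   # the rejected insert
assert unique_key(date(2013, 7, 7), False) != unique_key(date(2014, 7, 7), False)     # the allowed ones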
As you say, however, this is not pyrseas' concern :) I think it should just be more careful when parsing index expressions containing parens (which is still doable when the expression doesn't contain strings, but when it also contains a string and the string contains parens... well, that's a further order of magnitude of mess!)
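As a toy illustration (this is not pyrseas' actual parser), a naive split of that key list on commas breaks inside the nested calls:

keys = ("(CASE WHEN recurring THEN (0)::double precision"
        " ELSE date_part('year'::text, date) END),"
        " date_part('month'::text, date), date_part('day'::text, date)")
# Splitting on ", " cuts inside date_part(...), so the first "column" is not
# a complete expression -- roughly the situation the failing assert detects.
print(keys.split(', ')[0])
# (CASE WHEN recurring THEN (0)::double precision ELSE date_part('year'::text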
Or, alternatively, do you really need to split a multi-col index into its columns and store them separately in yaml?
Indexes on expressions and multi-column indexes have been problematic almost from the start. At first, I erroneously thought that you could have either a single expression or one or more columns. I fixed that, but not in a very satisfactory way. The issue is with the way it's all stored in pg_index, either for efficiency or historical reasons. The query pg_get_expr(indexprs, indrelid) will give you the correctly decoded expression(s), but there can be multiple of them and we have to locate them in the output of pg_get_indexdef(indexrelid) in order to split them. I doubt that we could get away with not splitting them, because individual columns/expressions can have separate attributes, e.g. collations and operator classes, and we'd probably still want to parse them separately when one of those is changed. And for the presumably more common case of indexes over columns rather than expressions, splitting the properties is preferable/more understandable.
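For reference, a rough sketch of the two catalog calls being discussed (assuming a psycopg2 connection; this is not pyrseas' actual query):

import psycopg2

conn = psycopg2.connect("dbname=pyrseas")
cur = conn.cursor()
# pg_get_expr() decodes the expression tree stored in pg_index.indexprs;
# pg_get_indexdef() returns the full CREATE INDEX statement in which those
# expressions then have to be located and split apart.
cur.execute("""
    SELECT pg_get_expr(indexprs, indrelid), pg_get_indexdef(indexrelid)
      FROM pg_index
     WHERE indexrelid = 'unique_date'::regclass""")
exprs, indexdef = cur.fetchone()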
I see what you mean, but there is no way to interact with the attributes: the only things ALTER INDEX allows are whole-index operations such as renaming and changing the tablespace (http://www.postgresql.org/docs/9.3/static/sql-alterindex.html). If the indexdef changes in an internal component, e.g. the collation of a field, all you can do is drop and recreate the index. So, is it really useful to split the fields (and have bugs related to parsing such complex expressions)?
Please try this patch for the issue. I haven't changed the splitting of indexes into columns, just rewrote the split helper. I couldn't run the "functional" tests for some reason I haven't checked yet, but all the others pass.
I've checked the new test and found I had to remove the extra whitespace characters you added (I'm guessing carried over from the original SQL). It works for me as follows:
def test_bug_98(self):
    "Map a multicol index with expressions"
    dbmap = self.to_map([CREATE_TABLE_HOLIDAY_STMT, CREATE_HOLYDAY_STMT])
    idx = dbmap['schema public']['table holiday']['indexes']['unique_date']
    assert idx == {
        'keys': [
            {"(CASE WHEN recurring THEN (0)::double precision"
             " ELSE date_part('year'::text, date) END)":
                 {'type': 'expression'}},
            {"date_part('month'::text, date)": {'type': 'expression'}},
            {"date_part('day'::text, date)": {'type': 'expression'}}],
        'unique': True}
Aside: I'd put the table/index statements in the test itself. The other _STMTs at the beginning are there because they're used multiple times in the rest of the file.
The above tests were done against 9.1 (and confirmed against 9.2). However, I see that under 9.3 we need the extra newlines and spaces that were in your original commit. I guess it's another side effect of the pretty-printing code changes in 9.3 (which already screwed up results in the view tests). I'm afraid we'll have to add conditional code as in test_view.py.
Maybe you can do some normalization with:
' '.join(re.sub(r"([\(\)])", r' \1 ', SQL).split())
This would collapse all whitespace to single spaces, also dealing with possible changes around the parens.
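For instance (a minimal sketch; normalize_sql is just an illustrative name), two differently whitespaced renderings of the same expression compare equal after that normalization:

import re

def normalize_sql(sql):
    # Detach parens and collapse every whitespace run to a single space.
    return ' '.join(re.sub(r"([\(\)])", r' \1 ', sql).split())

compact = "(CASE WHEN recurring THEN (0)::double precision ELSE date_part('year'::text, date) END)"
pretty = "(CASE\n    WHEN recurring THEN (0)::double precision\n    ELSE date_part('year'::text, date)\nEND)"
assert normalize_sql(compact) == normalize_sql(pretty)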
Yes, I can do that. We already have a fix_indent function in pyrseas/testutils.py that takes care of most or all of that (although your re implementation is probably better). Do you want to submit a PR or should I?
I don't know the entire picture of the tests or how they are organized, and it seems you can already run them against several server versions. I'm spending my Pyrseas time on the other issues, mostly on the dependency tracking, so I think it would be better if you fixed the tests yourself.
OK, will do. (Aside: I've cloned your repo and am about to start looking at your deptrack branch.)