Closed squareproton closed 10 years ago
This appears to be a Unicode problem, so I would suggest that --if possible-- you run dbtoyaml using Python 3, 3.3 or 3.4 would be ideal but if not 3.2. But you may first want to first look at the pg_dump -s
output, for any non-ASCII text. Probably the easiest way to do that is use file
on the output file and see if it reports UTF-8 Unicode text
. If it does, then of course you'll have to track down the non-ASCII characters.
Since the problem appears to be with a function object, you could put print
statements in pyrseas/dbobject/function.py
. The basic dumping process starts in ProcDict._from_catalog
(line 262) where we fetch function metadata using the query just above that method. The second stage is in Function.to_map
(line 47) where we convert the internal objects into YAML-compatible (also JSON-compatible, if it helps) structures. So you could, for example, add a print(dct)
just before the return of to_map (line 82). You can also add import pprint
and pprint.pprint(dct)
to get pretty-printed output.
If the print/pprint's still don't help you can add from pyrseas.yamlutil mport yamldump
and use print(yamldump(dct))
(see the fifth line in the traceback above). That ought to produce the same error, only sooner.
Thanks for the quick reply.
So the pg_dump -s
of the database is valid utf-8 (confirmed by file and iconv). Adding a print statement where you suggested (at the return point of to_map()) gave
{'language': u'plpgsql', 'strict': True, 'source': u'\nBEGIN\n RETURN $2;\nEND;', 'returns': u'anyelement', 'owner': u'redacted', 'volatility': 'immutable'}
as the last line. Looking to the pg_dump the only function with that body is
CREATE FUNCTION last_agg(anyelement, anyelement) RETURNS anyelement
LANGUAGE plpgsql IMMUTABLE STRICT
AS $_$
BEGIN
RETURN $2;
END; $_$;
If your first thought ran to utf8 issues I know postgres can be fussy about which template is used with a CREATE DATABASE
statement. There's a difference between template0 and template1 which I've never figured out. It's possible I've restored this database using the wrong template or which has thrown it for a loop.
It getting pretty / very late here in the UK so I'm going to call it a night but will have another look at this tomorrow and if you still need come up with that test case.
The reason I was wondering if there UTF-8 characters in the pg_dump output was that in the past Unicode has given us trouble, particularly with Python 2.x. In general, I don't expect a database created in an English-speaking environment to generate a pg_dump -s
output that was UTF-8 encoded because it would mean that some objects are defined using letters not in the English language, e.g., accented characters (in fact, one of the previous problems with Unicode were reported by someone in Spain who had a table name with such an accent). The output of most of the databases I work with are reported as ASCII text
(unless I also dump the data and some tables have text
columns with foreign language text). So it may help if you located the UTF-8 characters in that pg_dump -s
output.
I presume there were other lines from to_map
so I would try to look for (non-ASCII) UTF-8 characters in that output.
There was one UTF-8 codepoint "£" other than that the file was valid ASCII. Removing it didn't make a difference.
Attempts to find the offending object have been pretty frustrating and fruitless. Every time I removed what I believed was the problem sql object from the dump another one just popped up.
Upgrading pyrseas to current master branch (commit ff8b8ce6a264179defe687a7421a722f418371da) and running with python 3.4 as per your suggestion have fixed the error and dbtoyaml and family are now working.
I'd love nothing more than to pin this one down but unfortunately given just how fiendish this db dump is, how overwhelming work is at the moment and the fact that the fix is 'upgrade' I recommend you close this issue. I'm sorry I'm unable to share the dump which causes this.
I would recommend that you run from the head of the r0.7 branch instead of master. r0.7 is currently what I'd call the stable branch. In general bug fixes are done there first, while the master has changes for what will become 0.8.
Sorry we couldn't get to a cleaner resolution than 'upgrade' and thanks for your efforts.
dbtoyaml 0.7.1 throws the following whilst dumping a postgres 9.2 database.
My attempts to binary search the problem and locate the offending object at a sql level have failed due to too many interdependencies between db objects. I'm no pythonista (my flailings around the pyrseas thus far have not been a success). If you could give me a pointer as to where I can can add a few choice print statements so I can nail the offending object to the wall I'd be delighted to put together a minimal test case for you.