perseas / Pyrseas

Provides utilities for Postgres database schema versioning.
https://perseas.github.io/
BSD 3-Clause "New" or "Revised" License
395 stars 67 forks source link

exception thrown in dbtoyaml 0.7.1 while dumping database #105

Closed squareproton closed 10 years ago

squareproton commented 10 years ago

dbtoyaml 0.7.1 throws the following whilst dumping a postgres 9.2 database.

Traceback (most recent call last):
  File "/usr/local/bin/dbtoyaml", line 9, in <module>
    load_entry_point('Pyrseas==0.7.1', 'console_scripts', 'dbtoyaml')()
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/dbtoyaml.py", line 52, in main
    print(yamldump(dbmap), file=output or sys.stdout)
  File "/usr/local/lib/python2.7/dist-packages/pyrseas/yamlutil.py", line 30, in yamldump
    return dump(objmap, default_flow_style=False, allow_unicode=True)
  File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 218, in safe_dump
    return dump_all([data], stream, Dumper=SafeDumper, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 190, in dump_all
    dumper.represent(data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 28, in represent
    node = self.represent_data(data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 57, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 223, in represent_dict
    return self.represent_mapping(u'tag:yaml.org,2002:map', data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 123, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 57, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 223, in represent_dict
    return self.represent_mapping(u'tag:yaml.org,2002:map', data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 123, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 57, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 223, in represent_dict
    return self.represent_mapping(u'tag:yaml.org,2002:map', data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 123, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 57, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 215, in represent_list
    return self.represent_sequence(u'tag:yaml.org,2002:seq', data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 101, in represent_sequence
    node_item = self.represent_data(item)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 67, in represent_data
    node = self.yaml_representers[None](self, data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/representer.py", line 247, in represent_undefined
    raise RepresenterError("cannot represent an object: %s" % data)
yaml.representer.RepresenterError: cannot represent an object: <pyrseas.dbobject.function.Function object at 0x1c0ed10>

My attempts to binary search the problem and locate the offending object at a sql level have failed due to too many interdependencies between db objects. I'm no pythonista (my flailings around the pyrseas thus far have not been a success). If you could give me a pointer as to where I can can add a few choice print statements so I can nail the offending object to the wall I'd be delighted to put together a minimal test case for you.

jmafc commented 10 years ago

This appears to be a Unicode problem, so I would suggest that --if possible-- you run dbtoyaml using Python 3, 3.3 or 3.4 would be ideal but if not 3.2. But you may first want to first look at the pg_dump -s output, for any non-ASCII text. Probably the easiest way to do that is use file on the output file and see if it reports UTF-8 Unicode text. If it does, then of course you'll have to track down the non-ASCII characters.

Since the problem appears to be with a function object, you could put print statements in pyrseas/dbobject/function.py. The basic dumping process starts in ProcDict._from_catalog (line 262) where we fetch function metadata using the query just above that method. The second stage is in Function.to_map (line 47) where we convert the internal objects into YAML-compatible (also JSON-compatible, if it helps) structures. So you could, for example, add a print(dct) just before the return of to_map (line 82). You can also add import pprint and pprint.pprint(dct) to get pretty-printed output.

If the print/pprint's still don't help you can add from pyrseas.yamlutil mport yamldump and use print(yamldump(dct)) (see the fifth line in the traceback above). That ought to produce the same error, only sooner.

squareproton commented 10 years ago

Thanks for the quick reply.

So the pg_dump -s of the database is valid utf-8 (confirmed by file and iconv). Adding a print statement where you suggested (at the return point of to_map()) gave

{'language': u'plpgsql', 'strict': True, 'source': u'\nBEGIN\n    RETURN $2;\nEND;', 'returns': u'anyelement', 'owner': u'redacted', 'volatility': 'immutable'}

as the last line. Looking to the pg_dump the only function with that body is

CREATE FUNCTION last_agg(anyelement, anyelement) RETURNS anyelement
    LANGUAGE plpgsql IMMUTABLE STRICT
    AS $_$
BEGIN
    RETURN $2;
END; $_$;

If your first thought ran to utf8 issues I know postgres can be fussy about which template is used with a CREATE DATABASE statement. There's a difference between template0 and template1 which I've never figured out. It's possible I've restored this database using the wrong template or which has thrown it for a loop.

It getting pretty / very late here in the UK so I'm going to call it a night but will have another look at this tomorrow and if you still need come up with that test case.

jmafc commented 10 years ago

The reason I was wondering if there UTF-8 characters in the pg_dump output was that in the past Unicode has given us trouble, particularly with Python 2.x. In general, I don't expect a database created in an English-speaking environment to generate a pg_dump -s output that was UTF-8 encoded because it would mean that some objects are defined using letters not in the English language, e.g., accented characters (in fact, one of the previous problems with Unicode were reported by someone in Spain who had a table name with such an accent). The output of most of the databases I work with are reported as ASCII text (unless I also dump the data and some tables have text columns with foreign language text). So it may help if you located the UTF-8 characters in that pg_dump -s output.

I presume there were other lines from to_map so I would try to look for (non-ASCII) UTF-8 characters in that output.

squareproton commented 10 years ago

There was one UTF-8 codepoint "£" other than that the file was valid ASCII. Removing it didn't make a difference.

Attempts to find the offending object have been pretty frustrating and fruitless. Every time I removed what I believed was the problem sql object from the dump another one just popped up.

Upgrading pyrseas to current master branch (commit ff8b8ce6a264179defe687a7421a722f418371da) and running with python 3.4 as per your suggestion have fixed the error and dbtoyaml and family are now working.

I'd love nothing more than to pin this one down but unfortunately given just how fiendish this db dump is, how overwhelming work is at the moment and the fact that the fix is 'upgrade' I recommend you close this issue. I'm sorry I'm unable to share the dump which causes this.

jmafc commented 10 years ago

I would recommend that you run from the head of the r0.7 branch instead of master. r0.7 is currently what I'd call the stable branch. In general bug fixes are done there first, while the master has changes for what will become 0.8.

Sorry we couldn't get to a cleaner resolution than 'upgrade' and thanks for your efforts.