Closed derekeder closed 11 years ago
Great stuff. What's the best way of storing table metadata and related domain knowledge? (Here's how the county assessor works, what we know about how this data is produced.) Github wiki? Github site?
I think for now it makes sense to just add it to this spreadsheet and fiddle around with it in there. I've added a 'notes' column.
Just a nerdy historical note: the thing that Juan is describing used to be called a 'Data Dictionary' and was in the late 1970s considered just as important as the actual database. At the time it not only included a narrative description of the data's meaning but was also the only place to find out how many bytes wide your records were, or whether those bytes were to be interpreted as integers or strings.
Strangely (or not so strangely?), the term started to go out of style as relational databases took over, possibly because people confused the RDBMS metadata information (i.e. today's information_schema) with human-readable metadata.
I suspect in many organizations (even very large ones) this kind of table metadata / domain knowledge is, at best, stored in Excel or Word. However, in the medium-sized organizations I have been exposed to, this domain knowledge was stored in the heads of 2 or 3 people who had to be interrogated at length (most of whom couldn't remember what the lesser-used tables contained or how they were generated).
That said, there is a facility within Postgres for attaching textual metadata (probably not entire paragraphs, but maybe some quick one-sentence summaries) to tables and columns. I would totally recommend doing this in the import script, at least at the table level.
COMMENT ON TABLE mytable IS 'my comment';
mydatabase=> \d+
                              List of relations
 Schema |       Name       |   Type   |  Owner   |    Size    |  Description
--------+------------------+----------+----------+------------+---------------
 public | mytable          | table    | mcc      | 24 kB      | my comment
[...]
You can also do
COMMENT ON COLUMN mytable.mycolumn IS 'my column comment';
and you'll see those comments when you run \d+ on a particular table.
If you need to for some reason, you can access the table comments manually like so:
mydatabase=> select pg_catalog.obj_description(c.oid) from pg_class c where relname='mytable';
 obj_description
-----------------
 my comment
(1 row)
I forget how to get the column comments out explicitly, but it's something similar.
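(If it helps, I believe the column-level counterpart is pg_catalog.col_description, which takes the table's oid and the column's ordinal number. Something along these lines should work, joining against pg_catalog.pg_attribute to get the column names — a sketch, not tested against this particular database:)

```sql
mydatabase=> select a.attname,
mydatabase->        pg_catalog.col_description(a.attrelid, a.attnum)
mydatabase->   from pg_catalog.pg_attribute a
mydatabase->  where a.attrelid = 'mytable'::regclass
mydatabase->    and a.attnum > 0          -- skip system columns
mydatabase->    and not a.attisdropped;   -- skip dropped columns
```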
Hey, I just added a bunch of code to more cleanly create the main database on any PostgreSQL 9.0+ system with PostGIS 2.0.x+ support. Basically, now you do:
setup_edifice.py --init
Which will do everything that you need the postgres superuser to do (it may prompt for a password if you have one set), and:
setup_edifice.py --create
Which will actually create the edifice database structure. Then,
setup_edifice.py --data
Will download, unzip, and import the data. Still working on making that part do the right things, obviously.
Any constructive comments on the changes (and/or error messages) would be appreciated!
initial cataloging complete
The list of datasets in the readme is a good start, but we need to be a little more explicit. Keeping track of datasets and metadata in a Google doc: https://docs.google.com/spreadsheet/ccc?key=0AtbqcVh3dkAqdGdlcWd5MzRYcGJkS1RoQTM3Qzd4dUE