woodRock / super-telegram

We are migrating an existing Web Map Service (WMS) to a different server. The existing services uses Apache, Postgres, Postgis (psql extention), and Mapserver. The new version of the server has an updated version of both Mapserver and Postgres installed. Between the major versions of Mapserver some of the syntax from the existing Mapfiles is now deprecated.
0 stars 0 forks source link

Verify Postgres Database #12

Closed woodRock closed 3 years ago

woodRock commented 3 years ago

Goal

Verify the Postgres database contains the same number of records as the previous.

Tasklist

Success

We know all of the tables have the same number of records as the previous Postgres database.

woodRock commented 3 years ago

I created a basic script to verify the Postgres databases. It performed a simple count(*) operation on each table for every database that exists on the old server and compared that to the new server.

woodRock commented 3 years ago

Interestingly enough I noticed there were major discrepencies. In fact, for many entries, there was nearly double the amount of records on the development server, than the previous production server. This was only the case for select tables on a few databses. I now realized that I had duplicated some of the records. I had performed the pg_dump twice.

Most tables would have a unique contraint on their primary key. This would avoid duplicate records. However, some tables may allow for duplicate entries. Due to the nature of the records they store. This would explain why only some tables appear to have twice as many records.

woodRock commented 3 years ago

I wrote a script to compare the counts for each table of every database. The spatial_ref_sys counts were off. This is likely due to the different versions of the Postgis extension. I encountered an interesting bug, my script cannot handle tables with spaces in their names. This revealed a potential flaw in the original database as well. As it is not a good convention to have table names with white-spaces inside them.

woodRock commented 3 years ago

I could fix my script by changing the IFS path variable which determines the delimiters which the for loop will take into account. However I cannot use backticks to escape the spaces within the SQL statements. As the SQL statements are run in backtick wrapped execution statements.

woodRock commented 3 years ago

We can query tables with spaces in there names using the following syntax.

select count(*) from <schema>."<table>"

To do this we require the schema for that table. Not all of the tables use the public schema. We will have to update the script to pass the schema and table name to the get_count() functions.

woodRock commented 3 years ago

Turns out the solution was actually far simpler.

We can remove trailing white spaces from an expression using sed.

sed 's/[  \t]*$//'

We can pipe this command into the get_tables() function. And surround the SQL statements with escaped quotes. This fixed our issue with poorly named tables that contain spaces.