Closed gordonje closed 8 years ago
Actually, looks like only three of these fields can be cast as integer.
No errors for these three:
select "FILER_ID"::int, "FILER_ID", *
from "F501_502_CD";
select "FILER_ID"::int, "FILER_ID", *
from "CVR_SO_CD";
select "FILER_ID"::int, "FILER_ID", *
from "CVR_CAMPAIGN_DISCLOSURE_CD";
Whereas there are values with non-integer chars in these three:
-- [Error Code: 0, SQL State: 22P02] ERROR: invalid input syntax for integer: "L25358"
select "FILER_ID"::int, "FILER_ID", *
from "CVR_LOBBY_DISCLOSURE_CD";
-- [Error Code: 0, SQL State: 22P02] ERROR: invalid input syntax for integer: "C01851"
select "FILER_ID"::int, "FILER_ID", *
from "EFS_FILING_LOG_CD";
-- [Error Code: 0, SQL State: 22P02] ERROR: invalid input syntax for integer: "L22607"
select "FILER_ID"::int, "FILER_ID", *
from "CVR_REGISTRATION_CD";
Actually...hold up...those three seemingly safe queries above might not actually be.
The FILER_ID
column on the CVR_SO_CD
table is safe. There are no values with non-numeric characters:
SELECT "FILER_ID"
FROM "CVR_SO_CD"
WHERE "FILER_ID" ~ E'\D';
On the F501_502_CD
table, there 9,856 distinct FILER_ID
values:
SELECT COUNT(*)
FROM (
SELECT DISTINCT "FILER_ID"
FROM "F501_502_CD"
WHERE "FILER_ID" is not NULL and "FILER_ID" <> ''
) as foo;
And five of these include non-numeric characters:
SELECT "FILER_ID"
FROM "F501_502_CD"
WHERE "FILER_ID" ~ E'\D';
These:
FILER_ID
--------
CA
R98*4826
R98*3760
R98*1282
R98*4402
There are also 459 F501_502_CD
records with negative integer FILER_ID
values. Not sure what the hell is going on with that.
In CVR_CAMPAIGN_DISCLOSURE_CD
, there are 11,438 distinct FILER_ID
values:
SELECT COUNT(*)
FROM (
SELECT DISTINCT "FILER_ID"
FROM "CVR_CAMPAIGN_DISCLOSURE_CD"
WHERE "FILER_ID" is not NULL and "FILER_ID" <> ''
) as foo;
And six that have include non-numeric characters:
SELECT "FILER_ID"
FROM "CVR_CAMPAIGN_DISCLOSURE_CD"
WHERE "FILER_ID" ~ '\D';
These:
FILER_ID
--------
C24067
C24067
C00578
C01980
C28234
C27120
Again, the only column we're safe to change is CVR_SO_CD.FILER_ID
. Probably not worth the potential hassle of broken morning updates if in the future ever they ship values in this column with non-numeric chars.
There are 22 CalAccess tables that include a
FILER_ID
column. On 15 of these models, the field is typed as integer, and on the remaining seven, it is typed as char.From the looks of it, we were adhering to the official documentation (e.g., F501_502_CD, CVR_SO_CD).
But do we have to? Writing queries for the processed-data app will be easier if we have consistent data types for columns in the
JOIN
criteria. Also, the postgres / mysql syntax would be big hassle otherwise.