Open amhanson9 opened 1 month ago
Columns removed for PII: Prefix, First Name, Middle Name, Last Name, Suffix, Appellation, Title, Organization Name, Address Line 1-4.
If one of these columns is not present, the script proceeds without an error.
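As a rough illustration of the column removal described above, here is a minimal sketch using only the standard library. The header spellings in `PII_COLUMNS` are assumptions based on the list above (including reading "Address Line 1-4" as four separate columns), and `remove_pii` is a hypothetical helper, not the script's actual function.

```python
# Assumed header names, taken from the list in this issue; the real export may spell them differently.
PII_COLUMNS = {
    "Prefix", "First Name", "Middle Name", "Last Name", "Suffix",
    "Appellation", "Title", "Organization Name",
    "Address Line 1", "Address Line 2", "Address Line 3", "Address Line 4",
}


def remove_pii(header, rows):
    """Return (header, rows) with any PII columns that are present removed.

    Columns listed in PII_COLUMNS but absent from the header are simply
    ignored, so a missing column does not raise an error.
    """
    keep = [i for i, name in enumerate(header) if name not in PII_COLUMNS]
    new_header = [header[i] for i in keep]
    # Guard against short rows so a malformed row cannot raise IndexError here.
    new_rows = [[row[i] for i in keep if i < len(row)] for row in rows]
    return new_header, new_rows
```

Matching on names rather than positions is what lets the sketch proceed quietly when a column is not present.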
Currently, if a row has a parsing error (the delimiter also appears within the data, so the row cannot be split into columns properly), the row is not included. If there is an encoding error, the affected character is not included.
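The current behavior for parsing and encoding errors could be sketched roughly as follows. This assumes a UTF-8, tab-delimited file and treats any row whose field count differs from the expected number as a parsing error; `read_rows` and `expected_fields` are hypothetical names, and the real script may detect bad rows differently.

```python
import csv


def read_rows(path, expected_fields=32):
    """Read a tab-delimited file, skipping rows that do not split cleanly.

    errors="ignore" drops any bytes that cannot be decoded (assumed UTF-8),
    so a bad character is omitted rather than stopping the read.
    Returns the good rows and a count of skipped rows.
    """
    good, skipped = [], 0
    with open(path, encoding="utf-8", errors="ignore", newline="") as f:
        # QUOTE_NONE: treat quote characters as ordinary data, not field wrappers.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) == expected_fields:
                good.append(row)
            else:
                skipped += 1
    return good, skipped
```

Returning the skipped count makes it possible to report how many rows were left out, which is not something the issue says the script does today.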
Currently, data is split by in_date into Congress Years. If a row has no year (the column is blank or contains text instead of an actual date), it is saved in a separate CSV as undated. Text in the date column typically means the row has extra data or is missing data and is therefore not lined up with the columns correctly.
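One way to picture the split is the sketch below, which maps an in_date value to a Congress Year label or to "undated". It assumes in_date is formatted as YYYYMMDD, which this issue does not confirm, and treats a Congress Year as a two-year span starting on an odd year; `congress_year` and the label format are hypothetical.

```python
from datetime import datetime


def congress_year(in_date):
    """Return a label such as '2011-2012', or 'undated' if in_date is not a date.

    Assumes in_date looks like YYYYMMDD; blank values, text, and impossible
    dates all fall through to 'undated'.
    """
    text = (in_date or "").strip()
    if len(text) != 8:
        return "undated"
    try:
        year = datetime.strptime(text, "%Y%m%d").year
    except ValueError:
        return "undated"
    # Congress Years start on odd years, so even years belong to the prior start year.
    start = year if year % 2 == 1 else year - 1
    return f"{start}-{start + 1}"
```

For example, under these assumptions `congress_year("20121231")` gives `"2011-2012"`, while a blank value or a value such as `"see note"` gives `"undated"`.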
Remove columns with PII (the most granular identifying detail should be the zip code), and make an additional copy of the data split into one spreadsheet per Congress Year (a two-year span starting on an odd year) so that large data sets can be opened in spreadsheet programs.
CSS Archiving Format has all metadata in one table, with 32 fields. Fields are tab delimited. See the layout.txt file for more details.