metadatacenter-attic / phs-gdc

PHS-GDC Prototype
1 stars 0 forks source link

add location type census tract #2

Closed graybeal closed 3 years ago

graybeal commented 3 years ago

Add census tract to the supported location types.

(AACES: M Bondy, E Peters, A Lawson)

jvendetti commented 3 years ago

I noticed that several states have census tract FIPS codes with leading zeros, e.g.: 01001020100. I verified that downloaded data from the Data Commons Wizard correctly preserves the leading zeros in the FIPS codes.

To follow is an example of a downloaded CSV file with census tracts that have FIPS codes with leading zeros:

censusTract,Median_Income_Person,Median_Income_Person_Date,Median_Income_Person_Provenance
"02013000100",29177,"2019","census.gov"
"02016000100",33125,"2019","census.gov"
"02016000200",35461,"2019","census.gov"
"02020000101",52957,"2019","census.gov"
...

During testing, I noticed that simply opening these CSV files in spreadsheet software like Numbers and/or Excel results in the leading zeros getting stripped from the FIPS identifiers. For example, opening the above referenced CSV file in Numbers looks as follows:

Screen Shot 2021-07-02 at 9 36 02 AM

Notice the leading zeros missing from the census tracts FIPS codes. I researched this topic and learned that this is standard behavior for spreadsheet software. If you want to view the correct census tract FIPS codes in spreadsheet software, you need to import, rather than open the CSV files. During the import process, you need to specify that the censusTract column should be treated as text. The following steps can be used in Excel to perform such an import:

  1. Open Excel and create a Blank Workbook.
  2. Select File -> Import -> CSV file, and click the Import button.
  3. Select the desired CSV file, and click the Get Data button.
  4. In Step 1 of the Text Import Wizard, select the Delimited radio button and click Next.
  5. In Step 2, select Comma from the list of Delimiters and click Next.
  6. In Step 3, select the censusTract column in the "Preview of selected data" text area, and select Text as the column data format.
  7. Click the Finish button and in the resulting Import Data dialog, click OK to paste the data into your existing sheet.

The resulting import looks as follows in an Excel sheet:

Screen Shot 2021-07-02 at 9 47 43 AM

graybeal commented 3 years ago

The only other option would be to mechanically add a single quote (') to the front of those codes, which a spreadsheet usually treats as 'treat the following text as a literal'. Might fail in R the same way Marcos says ="02013000100" does.

jvendetti commented 3 years ago

Added in this commit: 63fc30899a3c73e65b10c72cb4109ce8e9f23558. Deployed to production on July 6th.