ozanj opened this issue 6 years ago
@ozanj yup, they are zip-code data! State-level shouldn't be hard to get. And it looks like city-level should be possible as well, if 312 here is the correct geographic unit. I just hope merging in the city data won't be too big an issue if it has to be done by city name and state.
@cyouh95 @mpatricia01
Thanks Crystal. Can you add the state-level variables? Let me know when that is done and we'll start working those vars into the problem set.
After that is done, try adding city vars, but if it looks like it will take a long time or the quality of the merge will be bad, don't add them.
Thank you!
@ozanj @mpatricia01 Here is the CSV w/ the state-level variables: 45f9ebf
state_name
state_fips_code (2-digit code)
pop_total_state, pop_white_state, etc.
median_inc_2544_state, median_inc_4564_state (might not be needed in final data, but were used to calculate the variable below)
avgmedian_inc_2564_state
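The thread does not say how avgmedian_inc_2564_state was derived from the two age-group medians. The sketch below assumes, hypothetically, a simple unweighted mean. The income figures are made up for illustration.

```python
# Hypothetical sketch: assumes avgmedian_inc_2564_state is the unweighted
# mean of the 25-44 and 45-64 median incomes. The thread does not confirm this.

def avg_median_income(median_inc_2544: float, median_inc_4564: float) -> float:
    """Unweighted mean of the two age-group median incomes."""
    return (median_inc_2544 + median_inc_4564) / 2


# Illustrative numbers only, not real ACS values.
state_row = {
    "state_name": "Washington",
    "state_fips_code": "53",
    "median_inc_2544_state": 70000.0,
    "median_inc_4564_state": 80000.0,
}
state_row["avgmedian_inc_2564_state"] = avg_median_income(
    state_row["median_inc_2544_state"], state_row["median_inc_4564_state"]
)
print(state_row["avgmedian_inc_2564_state"])  # 75000.0
```

An average of two medians is not the median of the combined 25-64 population. If the groups differ a lot in size, a household-weighted mean would be a closer approximation, though still not an exact median.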
@ozanj @mpatricia01 Ah, just realized there are STAT_CODE, ZIP, HS_STATE, and HS_CITY fields in the original data. The home state (STAT_CODE) and HS_STATE might be different (but I think mostly because many rows are missing the HS_STATE/HS_CITY fields).
Currently, state-level data is merged to STAT_CODE. Zip-code-level data was merged to the (home) ZIP as well, but I guess city-level data would have to be merged to the HS's city?
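Here is a minimal stdlib sketch of the state-level merge described above: a left join of state variables onto prospects by STAT_CODE. It assumes, hypothetically, that the state-level rows are keyed by the same code stored in STAT_CODE (e.g. a two-letter abbreviation). The function name and values are illustrative and are not taken from create_prospect_list.R.

```python
# Hypothetical left join of state-level variables onto prospect rows.
# Assumes state_rows is keyed by the same code used in STAT_CODE.

def merge_state_vars(prospects, state_rows, key="STAT_CODE"):
    """Attach state-level variables to each prospect (left join on `key`).

    Prospects with a missing or unmatched code are kept unchanged, and
    their positions are returned so the merge quality can be checked.
    """
    merged, unmatched = [], []
    for i, p in enumerate(prospects):
        row = dict(p)  # copy so the input rows are not modified
        state_vars = state_rows.get(p.get(key))
        if state_vars is None:
            unmatched.append(i)
        else:
            row.update(state_vars)  # note: same-named fields would be overwritten
        merged.append(row)
    return merged, unmatched


# Made-up values for illustration.
state_rows = {"WA": {"state_name": "Washington", "pop_total_state": 100}}
prospects = [
    {"ZIP": "98225", "STAT_CODE": "WA"},
    {"ZIP": "97201", "STAT_CODE": "OR"},
]
merged, unmatched = merge_state_vars(prospects, state_rows)
print(unmatched)  # [1]
```

Returning the unmatched positions makes it easy to report how many prospects failed to merge before the state variables are used in the problem set.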
@cyouh95 @mpatricia01
Prior to seeing the note you just sent, I pulled changes to wwlist_merged_state.csv, then modified create_prospect_list.R and wwlist_merged.Rdata and pushed those changes to GitHub.
Why don't you check whether the stuff I did still works in light of the note you just sent.
@cyouh95 @mpatricia01
Separate request: can you provide Patricia and me with information about how the prospect list defines race/ethnicity [var=ethn_code] and how ACS data defines race/ethnicity?
Patricia will take the lead in developing a draft of the problem set. One set of questions will compare the race/ethnicity of prospects purchased [at the zip-code level or state level] to the overall race/ethnicity composition at the zip-code level or state level.
A potential concern is that the prospect list's definition of race/ethnicity may differ from the ACS definition, so it would be helpful for Patricia to see both definitions so that she can decide what is possible to ask students to do.
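The comparison Ozan describes comes down to computing each group's share among purchased prospects and among the ACS population, then taking the difference. The sketch below assumes the prospect categories have already been collapsed into groups that match the ACS ones. The group names and counts are hypothetical.

```python
# Hypothetical sketch: compare the race/ethnicity shares of purchased
# prospects with population shares for the same geography.

def shares(counts):
    """Convert {group: count} into {group: share of the total}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations to compute shares from")
    return {group: n / total for group, n in counts.items()}


def compare_composition(prospect_counts, acs_counts):
    """Prospect share minus population share, for every group in either input."""
    p, a = shares(prospect_counts), shares(acs_counts)
    groups = sorted(set(p) | set(a))
    return {g: p.get(g, 0.0) - a.get(g, 0.0) for g in groups}


# Made-up counts for one state, for illustration only.
prospects = {"white": 60, "hispanic": 25, "asian": 15}
acs = {"white": 700, "hispanic": 130, "asian": 90, "black": 80}
diff = compare_composition(prospects, acs)
for group, d in diff.items():
    print(f"{group}: {d:+.1%}")
```

Two choices affect the result. The first is which denominator to use: for example, whether "Not reported" prospects are dropped. The second is that the ACS B03002 counts cover the total population of all ages, not just high-school students.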
@ozanj @mpatricia01 Everything should still work fine. I think we do want the state-level census data merged to STAT_CODE instead of HS_STATE, right? (since the latter field is missing for more obs) But I don't think there are many cases where the states would differ!
But if we can get city-level data, then it'd have to be merged to HS_CITY, whereas zip-code (and state-level) data are merged to the home location, if that's okay?
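One way to check how often the home state and high-school state actually differ, separately from how often HS_STATE is simply missing, is a quick tally like the one below. It assumes both fields use the same coding (e.g. two-letter abbreviations). The rows are made up.

```python
# Hypothetical check: tally rows where HS_STATE is missing, where STAT_CODE
# is missing, and where the two fields agree or differ.

def compare_home_and_hs_state(prospects):
    """Count missing, matching, and differing STAT_CODE/HS_STATE pairs."""
    tally = {"hs_state_missing": 0, "stat_code_missing": 0, "same": 0, "different": 0}
    for p in prospects:
        hs_state = (p.get("HS_STATE") or "").strip().upper()
        home = (p.get("STAT_CODE") or "").strip().upper()
        if not hs_state:
            tally["hs_state_missing"] += 1
        elif not home:
            tally["stat_code_missing"] += 1
        elif hs_state == home:
            tally["same"] += 1
        else:
            tally["different"] += 1
    return tally


rows = [
    {"STAT_CODE": "WA", "HS_STATE": "WA"},
    {"STAT_CODE": "WA", "HS_STATE": ""},
    {"STAT_CODE": "OR", "HS_STATE": "WA"},
    {"STAT_CODE": "WA"},
]
print(compare_home_and_hs_state(rows))
# {'hs_state_missing': 2, 'stat_code_missing': 0, 'same': 1, 'different': 1}
```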
@ozanj @mpatricia01 Here are the definitions according to CollegeBoard [x][x]
There should be 2 questions in the above questionnaire (race and ethnicity), but it looks like they may be combined in the wwlist's ETHN_CODE field:
Cuban
Mexican/Mexican American
Puerto Rican
Other Spanish/Hispanic
American Indian or Alaska Native
Asian or Native Hawaiian or Other Pacific Islander
Asian or Native Hawaiian or Other Pacific IslanderH [typo of the above; this might be using the pre-2016 version? But RECEIVE_DATE is in 2016]
Black or African American
White
Other-2 or more
Not reported
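A possible crosswalk from the ETHN_CODE values above to groups built from the ACS B03002 pop_* variables is sketched below. The grouping is an assumption, not a College Board or Census definition. For example, it merges Asian with Native Hawaiian/Pacific Islander because ETHN_CODE combines them. One caveat applies: the ACS race categories exclude Hispanics, but it is unclear how ETHN_CODE codes a student who answered both the ethnicity and race questions.

```python
# Hypothetical crosswalk: ETHN_CODE values (as listed above) -> comparison
# groups, and each group -> the ACS count variables that are summed for it.
ETHN_TO_GROUP = {
    "Cuban": "hispanic",
    "Mexican/Mexican American": "hispanic",
    "Puerto Rican": "hispanic",
    "Other Spanish/Hispanic": "hispanic",
    "American Indian or Alaska Native": "amerindian",
    "Asian or Native Hawaiian or Other Pacific Islander": "asian_nhpi",
    "Asian or Native Hawaiian or Other Pacific IslanderH": "asian_nhpi",  # typo variant
    "Black or African American": "black",
    "White": "white",
    "Other-2 or more": "other_multi",
    "Not reported": None,  # excluded from the comparison
}

GROUP_TO_ACS_VARS = {
    "hispanic": ["pop_hispanic"],
    "amerindian": ["pop_amerindian"],
    "asian_nhpi": ["pop_asian", "pop_nativehawaii"],
    "black": ["pop_black"],
    "white": ["pop_white"],
    "other_multi": ["pop_otherrace", "pop_tworaces"],
}


def collapse_prospects(ethn_codes):
    """Count prospects per group; return (counts, number not mapped)."""
    counts = {g: 0 for g in GROUP_TO_ACS_VARS}
    unmapped = 0
    for code in ethn_codes:
        group = ETHN_TO_GROUP.get(code)
        if group is None:
            unmapped += 1  # "Not reported", blank, or an unexpected value
        else:
            counts[group] += 1
    return counts, unmapped


def collapse_acs(acs_row):
    """Sum the ACS count variables that make up each group."""
    return {g: sum(acs_row[v] for v in names) for g, names in GROUP_TO_ACS_VARS.items()}


# Made-up inputs for illustration.
codes = ["Cuban", "White", "Asian or Native Hawaiian or Other Pacific IslanderH",
         "Not reported", "Puerto Rican"]
counts, unmapped = collapse_prospects(codes)
acs_row = {"pop_hispanic": 10, "pop_amerindian": 1, "pop_asian": 5, "pop_nativehawaii": 1,
           "pop_black": 3, "pop_white": 70, "pop_otherrace": 2, "pop_tworaces": 8}
print(counts, unmapped)
print(collapse_acs(acs_row))
```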
And here are the ACS variables/definitions [x][x]
pop_total [B03002_001E]: Total
pop_white [B03002_003E]: Not Hispanic or Latino > White Alone
pop_black [B03002_004E]: Not Hispanic or Latino > Black or African American Alone
pop_amerindian [B03002_005E]: Not Hispanic or Latino > American Indian and Alaska Native Alone
pop_asian [B03002_006E]: Not Hispanic or Latino > Asian Alone
pop_nativehawaii [B03002_007E]: Not Hispanic or Latino > Native Hawaiian and Other Pacific Islander Alone
pop_otherrace [B03002_008E]: Not Hispanic or Latino > Some Other Race Alone
pop_tworaces [B03002_009E]: Not Hispanic or Latino > Two or More Races
pop_hispanic [B03002_012E]: Hispanic or Latino

Sounds good.
Yes, for merging state-level data to prospect-level data, it seems conceptually best to merge to state_code rather than hs_state, because I think we are more focused on where the student lives than on the state of the HS attended. This is holding aside the data completeness issue.
In examples in class, I've generally been using state_code rather than HS state.
Let's hold off on adding any city-level variables. I think zip-code-level and state-level analyses are sufficient; adding more would make the problem set too long.
Thanks for all this, Crystal!
@cyouh95
For the prospect list from Western Washington University, the measures of median income and race are at the zip-code level, correct?
How hard would it be to add city-level and state-level measures of:
This would be used for a problem set I have to create by Friday.