Open jordanijames opened 1 month ago
Good work!
It looks like we are still getting one summary line at the bottom of the file:
tail(private_schools)
# A tibble: 6 × 19
`Private School Name` State Name [Private …¹ American Indian/Alas…² Asian or Asian/Pacif…³
<chr> <chr> <dbl> <dbl>
1 "ZION TEMPLE CHRISTIAN ACADEMY" "OHIO" 0 0
2 "ZION'S HILL BAPTIST SCHOOL" "INDIANA" 0 0
3 "ZION-ST JOHN LUTHERAN SCHOOL" "IOWA" 0 3
4 "ZUNI CHRISTIAN MISSION SCHOOL" "NEW MEXICO" 93 0
5 "ZVI DOV ROTH ACADEMY OF YESHIVA … "NEW YORK" 0 0
6 "Data Source: U.S. Department of … "" NA NA
Reducing n_max by 1 should fix that problem.
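A minimal sketch of the n_max fix, assuming the file is read with readr::read_csv (readr 2.0 or later for the I() literal-text input). The column names and rows are made up for illustration and are not the real file:

```r
library(readr)

# Toy stand-in for the download: two school rows plus a footer line.
# Column names and values here are hypothetical.
csv_text <- paste0(
  "name,state,enrollment\n",
  "SCHOOL A,OHIO,10\n",
  "SCHOOL B,IOWA,3\n",
  "Data Source: U.S. Department of Education,,\n"
)

# Reading everything keeps the footer as a bogus last observation.
with_footer <- read_csv(I(csv_text), show_col_types = FALSE)

# Lowering n_max by one stops the read just before the footer.
without_footer <- read_csv(I(csv_text), n_max = 2, show_col_types = FALSE)

tail(without_footer)
```

Checking tail() again after the change is a quick way to confirm the last row is a real school.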
I would not group_by county name because county names are not unique across states. That's what the FIPS number is for. So, use county_code in the group_by command.
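To see why the name is a poor grouping key, here is a small sketch with made-up rows (the tibble and its values are illustrative, not taken from the real data). Two different counties share a name but have different codes:

```r
library(dplyr)

# Hypothetical school rows: same county name in two different states.
schools <- tibble(
  county_name = c("Washington County", "Washington County", "Washington County"),
  state       = c("OHIO", "OHIO", "IOWA"),
  county_code = c(39167, 39167, 19183)
)

# Grouping by name wrongly pools both counties into one row.
by_name <- schools |>
  group_by(county_name) |>
  summarize(n_private = n())

# Grouping by code keeps the two counties separate.
by_code <- schools |>
  group_by(county_code) |>
  summarize(n_private = n())
```

Here by_name has one row with a count of 3, while by_code has two rows with counts of 1 and 2.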
Also, the county_code variable has leading zeroes which is causing it to be treated as a character variable. You can fix that easily by doing:
public_2019 <- public_2019 |>
  mutate(county_code = as.numeric(county_code))
private_2019 <- private_2019 |>
  mutate(county_code = as.numeric(county_code))
You could also fix this when you read it in by specifying col_types, but given the ugly variable names it would be a pain.
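For comparison, a rough sketch of the col_types route, assuming readr::read_csv and a column that is literally named county_code (in the real file the raw name is much longer, which is the pain mentioned above). The data is made up:

```r
library(readr)

# Hypothetical two-row file with zero-padded county codes.
csv_text <- "county_code,n_students\n01001,250\n01003,410\n"

# Ask for a numeric county_code at read time; other columns are guessed.
typed <- read_csv(
  I(csv_text),
  col_types = cols(county_code = col_double())
)

typed$county_code
```

Either route ends in the same place: a numeric code with the leading zero dropped.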
Lastly, the number_of_private_schools name is a bit verbose. We want to strike a balance between variable names that have meaning and ones that are so long that they make our code look terrible. I think something like n_private would be sufficient.
I made all the changes! I pushed what I have so far. Now I think I need to do the dissimilarity index part, but I'm kind of confused about how and when I should merge the public_2019 and private_county subsets. Also, how do I combine all the non-white race variables? Would I just make a new variable to add to the table? Do I make a public_county subset and group that a certain way? I know how to calculate the dissimilarity index; I'm just not sure how to get there.
You can create a new nonwhite variable by just adding up the other ones. This code will trim down the dataset and help you see how this will all work (replace temp with something better):
temp <- public_2019 |>
  mutate(NonWhite = AIAN+Asian+Hispanic+Black+Hawaiian+Multiracial) |>
  select(county_code, White, NonWhite) |>
  drop_na() |>
  arrange(county_code)
That is all you really need to calculate your dissimilarity measures. It's very similar to what we did last term for tracts, but now instead of tracts you have schools.
Thank you so much! Another question: when I arrange(county_code) in the table, I get the same county code over and over in the column (county_code 1001, 1001, 1001, 1001). Is that supposed to happen? Because in the private_county one, when I group by county_code, it doesn't do that.
You haven't grouped yet. Each observation is a school and there are many schools per county so you see it many times. The same was true of the private school data before you grouped it. When you calculate the segregation index you will group the public school data as well.
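A small sketch of the difference, using made-up school rows (the tibble below is hypothetical). arrange() only sorts, and group_by() on its own only tags the rows; the table collapses to one row per county only once summarize() is applied:

```r
library(dplyr)

# Hypothetical school-level rows: three schools in one county, one in another.
schools <- tibble(
  county_code = c(1003, 1001, 1001, 1001),
  White       = c(40, 10, 20, 30)
)

# Sorting keeps every school, so the code repeats.
sorted <- schools |> arrange(county_code)

# Grouping alone still has one row per school.
grouped <- schools |> group_by(county_code)

# Summarizing the grouped data gives one row per county.
per_county <- grouped |>
  summarize(n_schools = n(), White = sum(White))
```

Here sorted and grouped both still have four rows, while per_county has two.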
Good morning Aaron, I'm sorry but I don't know what I'm doing wrong or missing. I'm using the group_by function and it still doesn't group the county_code variable. I tried doing what I did for private_county, but it's not working. I don't think I need the summarize command. I did change them to numeric values instead of characters, so I don't know what I'm doing wrong.
I am not seeing what code you are referring to. I see the creation of the private_county and public_county objects. That code works. The public_county object is not a county-level dataset though; it's a school-level dataset.
Regarding this code:
calc_dissimilarity <- function(public_county) {
  a <- public_county$White/sum(public_county$White)
  b <- public_county$NonWhite/sum(public_county$NonWhite)
  return(50 * sum(abs(a-b)))
}
it will work, but I think calling the argument public_county is confusing, as you called your object that name and it's not the same thing. You want to just put in all of the schools for a given county, so you might want to call this county or something.
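For reference, the function above computes the following for a single county. The symbols are notation introduced here and do not appear in the thread: w_i and m_i are the White and NonWhite counts in school i, and W and M are the county totals across its n schools.

```latex
D = 50 \sum_{i=1}^{n} \left| \frac{w_i}{W} - \frac{m_i}{M} \right|,
\qquad W = \sum_{i=1}^{n} w_i,
\qquad M = \sum_{i=1}^{n} m_i
```

With the factor of 50, D runs from 0 (every school has the same mix as the county) to 100 (no school contains both groups).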
Your commented code below will work if you change tracts to public_county:
public_county |>
  filter(county_code == "1001") |>
  calc_dissimilarity()
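One possible way to go from a single county to every county is to group the school-level data and compute the same quantity inside summarize(). This is only a sketch on made-up numbers. The schools tibble and the county_dissim name are hypothetical, and the function argument is renamed to county as suggested above:

```r
library(dplyr)

# Same calculation as above, with the argument renamed to county.
calc_dissimilarity <- function(county) {
  a <- county$White / sum(county$White)
  b <- county$NonWhite / sum(county$NonWhite)
  50 * sum(abs(a - b))
}

# Hypothetical school-level data: county 1001 is fully segregated,
# county 1003 is perfectly even.
schools <- tibble(
  county_code = c(1001, 1001, 1003, 1003),
  White       = c(100, 0, 50, 50),
  NonWhite    = c(0, 100, 50, 50)
)

# One county at a time, as in the snippet above.
one_county <- schools |>
  filter(county_code == 1001) |>
  calc_dissimilarity()

# All counties at once: the same formula evaluated within each group.
county_dissim <- schools |>
  group_by(county_code) |>
  summarize(
    dissimilarity = 50 * sum(abs(White / sum(White) - NonWhite / sum(NonWhite)))
  )
```

In this toy data county 1001 comes out at 100 and county 1003 at 0, which matches the two extremes of the index. The result is a county-level table keyed by county_code.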
@AaronGullickson Hello! I have organized the data, I selected and renamed the variables for both data sets. For the new subsets I made "public_2019" and "private_2019", there is a placeholder in the table that isn't actually data, and I don't know how to get rid of it. So if you organize the subsets by county_name, the 1st column will be empty and I want to get rid of that column.