Issue closed by aakarner 4 years ago.
Hey @aakarner -- awesome use case, totally heard. Help me understand the skim file you're using. What's the geographic resolution -- is it block, block group, or tract?
The skim table contains the origin tract geoid, destination tract geoid, and travel time for some set of population-weighted census tract centroids.
Here's an example. The skim data are available here: https://www.dropbox.com/s/un32m83r4yzjl2e/SampleAutoSkims.csv?dl=0.
```r
library(dplyr)
library(lehdr)

# Read skim data (downloaded from the link above; adjust the path to your local copy)
auto_skims <- read.csv("SampleAutoSkims_ArcOnline.csv")
auto_skims$DestinationName <-
  as.character(auto_skims$DestinationName)
auto_skims$OriginName <-
  as.character(auto_skims$OriginName)

# Read LODES data
# grab_lodes() returned a tibble grouped by state and year; without this
# ungroup(), select() below adds those grouping columns back and warns about it
ga_jobs <- ungroup(
  grab_lodes(state = "ga", year = 2014, lodes_type = "wac", agg_geo = "tract",
             job_type = "JT00", segment = "S000"))

# Combine jobs and skim data by the destination location
acc_data <- inner_join(
  # Select only required variables from skims and jobs
  select(auto_skims, OriginName, DestinationName, Total_Time),
  select(ga_jobs, w_tract_id, C000),
  by = c("DestinationName" = "w_tract_id")) %>%
  # Add in a Hansen-style gravity decay factor (negative exponential, beta = 0.1 per minute)
  mutate(decay = exp(Total_Time * -0.1))

# Calculate cumulative opportunities accessibility (45-min threshold)
acc_cumul <- acc_data %>%
  filter(Total_Time <= 45) %>%
  group_by(OriginName) %>%
  summarize(acc45 = sum(C000))

# Calculate gravity accessibility
acc_grav <- acc_data %>%
  group_by(OriginName) %>%
  summarize(accgrav = sum(C000 * decay))
```
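To sanity-check the two accessibility measures by hand, the same pipeline can be run on a tiny invented dataset (the tract names, travel times, and job counts below are hypothetical, not from the real skims or LODES). Cumulative accessibility sums the jobs at destinations reachable within 45 minutes; gravity accessibility sums the jobs at every destination, each weighted by exp(-0.1 × travel time).

```r
library(dplyr)

# Hypothetical toy skims: two origin tracts (A, B) and three destination tracts (X, Y, Z)
toy_skims <- tibble(
  OriginName      = c("A", "A", "A", "B", "B", "B"),
  DestinationName = c("X", "Y", "Z", "X", "Y", "Z"),
  Total_Time      = c(10, 30, 60, 50, 20, 40)
)
# Hypothetical toy job counts by workplace tract
toy_jobs <- tibble(w_tract_id = c("X", "Y", "Z"), C000 = c(100, 200, 300))

toy_acc <- inner_join(toy_skims, toy_jobs, by = c("DestinationName" = "w_tract_id")) %>%
  # Same negative exponential decay as the original (beta = 0.1 per minute)
  mutate(decay = exp(-0.1 * Total_Time))

# Cumulative opportunities: jobs reachable within 45 minutes
toy_cumul <- toy_acc %>%
  filter(Total_Time <= 45) %>%
  group_by(OriginName) %>%
  summarize(acc45 = sum(C000))

# Gravity: every destination counts, weighted by its decay factor
toy_grav <- toy_acc %>%
  group_by(OriginName) %>%
  summarize(accgrav = sum(C000 * decay))

toy_cumul
toy_grav
```

Origin A reaches X and Y within 45 minutes (100 + 200 = 300 jobs) and origin B reaches Y and Z (200 + 300 = 500 jobs), while the gravity measure for A is 100e^-1 + 200e^-3 + 300e^-6, roughly 47.5.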
Yeah, the primary use case was mass downloads of LODES data for multiple states and multiple years. We'll take a look and see if there's a less ham-fisted solution so you don't get this understandably unexpected behavior.
Sorry this took so long, but I recently updated lehdr, so the resulting tibbles should now be ungrouped. Please test it and let me know if you still run into the same issue.
Looks good now, thanks!
Thanks for putting this package together - it's a lot more elegant than constantly pulling down CSVs from the Census Bureau's site.
I'm wondering why the tibbles returned by grab_lodes() have state and year grouping variables set by default. I see that this kind of makes sense if you're pulling data for multiple states/years, but even then it seems like you'd want to give the user flexibility to define their own groups.
For example, I just needed to calculate some quick accessibility measures, so I inner_join()ed a skim file to the 2014 GA wac file. I join on the destination from the skim and then group by origin (e.g., to get total jobs accessible within 45 minutes). If I run the necessary dplyr steps without first ungroup()ing, I get a warning that the grouping variables are being added back in, and my final tibble then has several extra columns that simply repeat the year and the state.
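Here is a small sketch of that behavior that needs no download: a made-up tibble stands in for a grab_lodes() result grouped by state and year (the tract IDs and job counts are invented). dplyr's select() keeps the grouping columns of a grouped tibble even when you don't ask for them, whereas calling ungroup() first drops them.

```r
library(dplyr)

# Hypothetical stand-in for a grab_lodes() result grouped by state and year
jobs_grouped <- tibble(
  state = "ga", year = 2014,
  w_tract_id = c("13121001100", "13121001200"),
  C000 = c(500, 300)
) %>%
  group_by(state, year)

# On a grouped tibble, select() re-adds the missing grouping columns
# (dplyr reports that it is adding them)
kept <- select(jobs_grouped, w_tract_id, C000)
names(kept)

# Ungroup first and select() returns only the requested columns
dropped <- jobs_grouped %>% ungroup() %>% select(w_tract_id, C000)
names(dropped)
```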
Maybe provide a parameter in the function call to disable grouping in the output? Or disable it by default and allow the user to specify grouping variables in the output?
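A rough sketch of the second option, written as a standalone post-processing step rather than as a change to grab_lodes() itself. The helper `set_output_groups()` and its `groups` argument are hypothetical names, not part of lehdr: the output is ungrouped by default and regrouped only by the columns the caller names.

```r
library(dplyr)

# Hypothetical helper (not part of lehdr): ungroup by default,
# or group by the column names the user passes in `groups`
set_output_groups <- function(df, groups = NULL) {
  df <- ungroup(df)
  if (length(groups) > 0) {
    # Turn the character vector of column names into symbols and splice them in
    df <- group_by(df, !!!rlang::syms(groups))
  }
  df
}

# Hypothetical LODES-like tibble grouped by state and year
lodes_like <- tibble(state = "ga", year = 2014, w_tract_id = c("a", "b"), C000 = c(1, 2)) %>%
  group_by(state, year)

group_vars(set_output_groups(lodes_like))                # character(0)
group_vars(set_output_groups(lodes_like, "w_tract_id"))  # "w_tract_id"
```

Defaulting to no groups keeps single-state, single-year pulls free of extra columns, while multi-state users can still opt back in to state/year grouping.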