Closed dmoul closed 1 year ago
Thanks for filing this detailed, reproducible issue! I will have to do some more digging. I can reproduce your error, but what's perplexing me at the moment is that when I run the process for all of Orange County and then subset down, I get the correct answer. Try running this and let me know if you see this too:
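library(tidycensus)
library(tidyverse)

# 2000 tract populations for all of Orange County, NC
orange00 <- get_decennial(
  geography = "tract",
  variables = "P001001",
  year = 2000,
  state = "NC",
  county = "Orange",
  geometry = TRUE
)

# 2010 tract geometries (target zones); keep only the ID column
orange10 <- get_decennial(
  geography = "tract",
  variables = "P001001",
  year = 2010,
  state = "NC",
  county = "Orange",
  geometry = TRUE
) %>%
  select(GEOID)

# orange_blocks and my_proj are not defined here; presumably they come
# from the reprex in the original report
orange_00_to_10 <- orange00 %>%
  interpolate_pw(
    to = orange10,
    to_id = "GEOID",
    extensive = TRUE,
    weights = orange_blocks,
    weight_column = "total",
    crs = my_proj
  )

# Subset to the 107.* tracts after interpolating
filter(orange_00_to_10, str_sub(GEOID, 7, 9) == "107")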
Simple feature collection with 5 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 595274.1 ymin: 234144.7 xmax: 603836 ymax: 244269.5
Projected CRS: NAD83(2011) / North Carolina
# A tibble: 5 × 3
GEOID geometry value
* <chr> <MULTIPOLYGON [m]> <dbl>
1 37135010703 (((601003.3 240530, 601934.4 240212.1, 602282.7 2400… 5170
2 37135010701 (((600383.9 242309.3, 600484.7 241938, 600541.5 2416… 1938
3 37135010704 (((598798.1 234538.8, 598658.6 234556.2, 598630.6 23… 4614
4 37135010705 (((603016.9 241210.8, 603442.2 240702.8, 603447.6 24… 5006.
5 37135010706 (((602113.8 243794.4, 602367.5 244179.4, 602771.1 24… 3505.
I figured it out. It is related to #476. I use the name `total` local to the function, so when you try to interpolate a column named `total`, it doesn't work right. Naming your column anything else will work. Here is your exact code, but with the name `pop` instead of `total` for total population:
> debug_interpolate_pw_2000 |>
+ st_drop_geometry() |>
+ arrange(geoid)
# A tibble: 5 × 2
geoid pop
<chr> <dbl>
1 37135010701 1938
2 37135010703 5170
3 37135010704 4614
4 37135010705 5002.
5 37135010706 3508.
As with #476, this is on me for not thinking through what names should be reserved internally. I'll push a fix and close this issue when I do, but in the meantime just don't use `total` and it'll work correctly.
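For anyone wondering how a column name alone can produce this, here is a minimal base R sketch of that kind of collision. It is not the tidycensus source: `share_by_group()`, the `group` column, and the numbers are all made up for illustration. The hypothetical helper stores an intermediate sum in a column it calls `total`, so a user column with the same name is silently overwritten.

```r
# Hypothetical helper (NOT tidycensus code): computes each row's share of
# its group's weight, keeping the group sum in an internal column `total`.
share_by_group <- function(df, weight_col) {
  # This overwrites any user column that is already named `total`
  df$total <- ave(df[[weight_col]], df$group, FUN = sum)
  df$share <- df[[weight_col]] / df$total
  df
}

# Made-up block weights
blocks <- data.frame(
  group = c("a", "a", "b"),
  pop = c(10, 30, 5),
  stringsAsFactors = FALSE
)

# Weight column named `pop`: shares are computed from the real weights
ok <- share_by_group(blocks, "pop")

# Same data, weight column named `total`: the weights are replaced by the
# group sums before the division, so every share collapses to 1
blocks_total <- data.frame(
  group = blocks$group,
  total = blocks$pop,
  stringsAsFactors = FALSE
)
clobbered <- share_by_group(blocks_total, "total")

print(ok$share)
print(clobbered$share)
```

In this toy version, renaming the weight column before the call is enough to avoid the collision, which matches the workaround suggested above.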
Fixed now - thanks again for the heads up!
Confirmed: it works. Thanks for the fast turn-around.
Hi Kyle. First, many thanks for making it so easy to work with US Census data in R!
It seems there are cases in which I am not getting correct values from `interpolate_pw()`. I have been unable to confirm whether I am making a mistake (surely the most likely case) or perhaps there is some unexpected pattern in the geometry that the algorithm is struggling with. I'm using tidycensus_1.2.3 and sf_1.0-8.
Below I consider only NC Orange County tracts 107.* in 2000 and 2010. Tract 107.02 was about twice the desired tract population in the 2000 decennial census and was split into tracts 107.05 and 107.06 for 2010. The interpolated values for tracts 01, 03, and 04 are sensible; however, 05 is about the size of 02 and 06 is exactly the size of 02. Note that this pathology exists in other NC counties (not included below). I did not see this problem when recreating your `interpolate_pw()` example at https://walker-data.com/census-r/spatial-analysis-with-us-census-data.html#small-area-time-series-analysis
Would you take a look? Thanks in advance.
Created on 2022-11-12 with reprex v2.0.2
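A quick way to spot this kind of problem is to check that an extensive variable is conserved: when one tract is split in two, the two interpolated values should sum to the original tract's value. Below is a simplified, non-spatial base R sketch of that check. It assumes each block has already been assigned to one source tract and one target tract, and all tract labels and counts are invented; it is not how `interpolate_pw()` is implemented.

```r
# Made-up block-level weights, each block already matched to a source
# (2000) tract and a target (2010) tract
blocks <- data.frame(
  source_tract = c("10702", "10702", "10702", "10701"),
  target_tract = c("10705", "10705", "10706", "10701"),
  pop = c(300, 200, 500, 400),
  stringsAsFactors = FALSE
)

# Made-up source-tract values to redistribute
source_values <- c("10702" = 8000, "10701" = 1900)

# Each block's share of its source tract's total block weight
blocks$share <- blocks$pop / ave(blocks$pop, blocks$source_tract, FUN = sum)

# Allocate the source value to blocks, then sum the blocks by target tract
blocks$alloc <- source_values[blocks$source_tract] * blocks$share
target_values <- tapply(blocks$alloc, blocks$target_tract, sum)

print(target_values)

# Conservation check: the total across targets equals the total across sources
stopifnot(isTRUE(all.equal(sum(target_values), sum(source_values))))
```

With the values reported above, the 2000 population of 107.02 appears to have been counted roughly twice across 107.05 and 107.06, which is the kind of discrepancy this check would flag.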