mountainMath / cancensus

R wrapper for calling CensusMapper APIs
https://mountainmath.github.io/cancensus/index.html

Default data when no spatial format requested #120

Closed dshkol closed 5 years ago

dshkol commented 6 years ago

I noticed that when data is pulled without geo, we lose a couple of useful columns.

cma.ct <- get_census("CA16", regions=list(CMA=cma), 
                        vectors = vectors, level = "CT",
                        labels = "short", geo_format = NA)

produces

$ GeoUID         <chr> "8250001.01", "8250001.02", "8250001.03", "8250001.04", "8250001.05", "8250...
$ Type           <fct> CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT,...
$ `Region Name`  <fct> Calgary, Calgary, Calgary, Calgary, Calgary, Calgary, Calgary, Calgary, Cal...
$ `Area (sq km)` <dbl> 1.72193, 3.94892, 1.04878, 2.57535, 1.15113, 3.34596, 2.97159, 3.66137, 3.5...
$ Population     <dbl> 5232, 6517, 2205, 5942, 2905, 3793, 6123, 5132, 6218, 2837, 5192, 4671, 261...
$ Dwellings      <dbl> 2156, 2619, 823, 2325, 1045, 1448, 2410, 2082, 2407, 1073, 1876, 1746, 1056...
$ Households     <dbl> 2104, 2571, 820, 2293, 1042, 1445, 2315, 2011, 2382, 1064, 1846, 1737, 1040...
...

Whereas

cma.ct <- get_census("CA16", regions=list(CMA=cma), 
                        vectors = vectors, level = "CT",
                        labels = "short", geo_format = "sf")

produces

$ `Shape Area`                            <dbl> 1.88067, 0.58484, 1.41712, 400.47943, 261.22246, 8...
$ Type                                    <fct> CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT...
$ Dwellings                               <int> 1220, 850, 1943, 1271, 2339, 1666, 1021, 2264, 227...
$ Households                              <int> 1114, 835, 1928, 1192, 2271, 1508, 923, 2227, 2213...
$ GeoUID                                  <chr> "8250055.00", "8250076.15", "8250052.09", "8250204...
$ Population                              <int> 3141, 2214, 5733, 3931, 6852, 4116, 2550, 6903, 65...
$ `Adjusted Population (previous Census)` <int> 2906, 2239, 5437, 3448, 6002, 3593, 2448, 5871, 64...
$ PR_UID                                  <chr> "48", "48", "48", "48", "48", "48", "48", "48", "4...
$ CMA_UID                                 <chr> "48825", "48825", "48825", "48825", "48825", "4882...
$ CSD_UID                                 <chr> "4806016", "4806016", "4806016", "4806014", "48060...
$ CD_UID                                  <chr> "4806", "4806", "4806", "4806", "4806", "4806", "4...
$ `Region Name`                           <fct> Calgary, Calgary, Calgary, Rocky View County, Rock...
$ `Area (sq km)`                          <dbl> 1.88067, 0.58484, 1.41712, 400.47943, 261.22246, 8...
...
$ geometry                                <MULTIPOLYGON [°]> MULTIPOLYGON (((-114.1179 5..., MULTI...

I think the additional columns (PR_UID, CMA_UID, CSD_UID, CD_UID, etc.) should be retained even when no geo format is specified, since merging and aggregating at different levels of census geography is a common requirement even when not working explicitly with spatial data. It would also reduce load on the server, because users would no longer need to request spatial data just to get these identifiers.
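To illustrate the use case, here is a minimal sketch of rolling tract-level counts up to the census subdivision level using only the hierarchy columns. It runs on a small made-up data frame standing in for the non-spatial `get_census()` result, so the identifiers and counts below are hypothetical placeholders, not real census values, and no API call is made.

```r
# Toy stand-in for a non-spatial get_census() result at the CT level.
# All identifiers and counts are invented for illustration.
ct <- data.frame(
  GeoUID     = c("0001.01", "0001.02", "0002.00", "0003.00"),
  CSD_UID    = c("A", "A", "A", "B"),
  CD_UID     = c("X", "X", "X", "X"),
  Population = c(100, 200, 300, 400),
  Dwellings  = c(40, 80, 120, 160),
  stringsAsFactors = FALSE
)

# Sum tract-level counts within each census subdivision.
csd <- aggregate(cbind(Population, Dwellings) ~ CSD_UID, data = ct, FUN = sum)

# The same pattern works one level up, at the census division.
cd <- aggregate(cbind(Population, Dwellings) ~ CD_UID, data = ct, FUN = sum)

print(csd)
print(cd)
```

Without CSD_UID and CD_UID in the non-spatial output, this kind of roll-up would require a second request with a spatial format just to recover the parent identifiers.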

dshkol commented 5 years ago

@mountainMath what do you think about this?

mountainMath commented 5 years ago

Already fixed I think?

> cma.ct <- get_census("CA16", regions=list(CMA="59933"), 
+                         vectors = "v_CA16_1", level = "CT",
+                         labels = "short", geo_format = NA)
Querying CensusMapper API...
Downloading: 10 kB     
> glimpse(cma.ct)
Observations: 478
Variables: 12
$ GeoUID         <chr> "9330001.01", "9330001.02", "9330002.01", "9330002.03", "9330002.04", "9330003.01", "9330003.02", "9330004.01", "9330004.02", "933…
$ Type           <fct> CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT, CT…
$ `Region Name`  <fct> Vancouver, Vancouver, Vancouver, Vancouver, Vancouver, Vancouver, Vancouver, Vancouver, Vancouver, Vancouver, Vancouver, Vancouver…
$ `Area (sq km)` <dbl> 1.01953, 2.02898, 2.02909, 0.78516, 0.84513, 0.54316, 1.19645, 1.63291, 1.24706, 1.59397, 0.91749, 1.09361, 2.84304, 1.41849, 3.19…
$ Population     <dbl> 6293, 5456, 6150, 4077, 4509, 3714, 4613, 4925, 5652, 7302, 4699, 5233, 5543, 3452, 1652, 4200, 7103, 5352, 4741, 6615, 8102, 6485…
$ Dwellings      <dbl> 2334, 2132, 2631, 1497, 1652, 1210, 1429, 2537, 1847, 4444, 1784, 1925, 2373, 1256, 615, 1619, 3440, 2389, 1760, 2642, 3337, 2398,…
$ Households     <dbl> 2314, 2089, 2507, 1292, 1560, 1080, 1316, 1928, 1689, 4326, 1687, 1827, 2157, 1153, 578, 1495, 3199, 2212, 1647, 2310, 2904, 2155,…
$ CMA_UID        <chr> "59933", "59933", "59933", "59933", "59933", "59933", "59933", "59933", "59933", "59933", "59933", "59933", "59933", "59933", "599…
$ PR_UID         <chr> "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59", "59"…
$ CSD_UID        <chr> "5915022", "5915022", "5915022", "5915022", "5915022", "5915022", "5915022", "5915022", "5915022", "5915022", "5915022", "5915022"…
$ CD_UID         <chr> "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5915", "5…
$ v_CA16_1       <dbl> 6295, 5455, 6150, 4080, 4510, 3715, 4615, 4925, 5655, 7305, 4700, 5235, 5545, 3455, 1655, 4200, 7105, 5355, 4740, 6615, 8100, 6485…

dshkol commented 5 years ago

Must have missed this. Closing.