Closed justinperline closed 10 months ago
Setting the climate data aside for now, since I'll try to source it from a better, more recent database. Decided to focus solely on the topographic classification and on converting it to a place-level variable.
> Difficulty would seem to be joining this county-level data to the place-level data...
This was accomplished via a spatial join, looking for counties that intersected the geometries of each place.
Used the `largest = TRUE` parameter so that only the county with the most overlap is passed on in the join, rather than allowing a one-to-many relationship. The latter presented problems since this is not a continuous measure, and a place spanning multiple adjoining counties could pick up different topographical codes. Rather than attempting some mode or mean calculation, `largest` keeps whichever county polygon has the largest overlap with the place.
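For illustration, here is a minimal sketch of how `largest = TRUE` behaves in `sf::st_join()`. The toy polygons, the `topo_code` and `place_name` columns, and the projected CRS are all made up for the example; this is not the code from the repo.

```r
library(sf)

# Hypothetical toy data: two adjacent "counties" with different topography codes
county_a <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
county_b <- st_polygon(list(rbind(c(2, 0), c(4, 0), c(4, 2), c(2, 2), c(2, 0))))
counties <- st_sf(
  topo_code = c(4L, 17L),
  geometry = st_sfc(county_a, county_b, crs = 3857)
)

# One hypothetical "place" straddling the border, mostly inside county B
place <- st_polygon(list(rbind(
  c(1.5, 0.5), c(3.5, 0.5), c(3.5, 1.5), c(1.5, 1.5), c(1.5, 0.5)
)))
places <- st_sf(place_name = "Exampleville", geometry = st_sfc(place, crs = 3857))

# Default join: one output row per intersecting county (two rows here)
joined_all <- st_join(places, counties, join = st_intersects)

# largest = TRUE: keep only the county with the largest overlap (county B here).
# sf warns that attributes are assumed spatially constant; suppressed for brevity.
joined_largest <- suppressWarnings(
  st_join(places, counties, join = st_intersects, largest = TRUE)
)

print(nrow(joined_all))
print(joined_largest$topo_code)
```

The place overlaps county A by 0.5 square units and county B by 1.5, so the join keeps county B's code rather than returning both rows.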
The spatial join was more complicated than it seemed at first, for two reasons: the join itself was pretty large (~31k places against ~3k counties), and some counties changed names over the years. The former just caused a delay in processing, but the latter was a breaking error. After lots of tests to diagnose the problem, I realized the geometries of the Connecticut counties I was using were causing invalid intersections (I was using old geometry because the CT counties have since been reorganized into "planning regions" that do not align with the previous county lines).
Fixed this and resorted to parallel processing of the spatial join to vastly speed up the code.
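One way the parallel join could be sketched, assuming the places are split into chunks and each chunk is joined on its own core with `parallel::mclapply()`. The chunking scheme, the toy squares, and the `make_square()` and `join_chunk()` helpers are hypothetical, and the linked script may do this differently.

```r
library(sf)
library(parallel)

# Hypothetical helper: axis-aligned square with lower-left corner (x0, y0)
make_square <- function(x0, y0, size) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy data: four 10x10 "counties" in a row, eight small "places" (two per county)
counties <- st_sf(
  topo_code = 1:4,
  geometry = st_sfc(lapply(0:3, function(i) make_square(i * 10, 0, 10)), crs = 3857)
)
places <- st_sf(
  place_id = 1:8,
  geometry = st_sfc(
    lapply(c(1, 5, 11, 15, 21, 25, 31, 35), function(x) make_square(x, 4, 2)),
    crs = 3857
  )
)

# Split the row indices of places into chunks of two rows each
chunk_size <- 2
chunk_index <- split(seq_len(nrow(places)), ceiling(seq_len(nrow(places)) / chunk_size))

# Join one chunk of places against all counties, keeping the largest overlap
join_chunk <- function(idx) {
  suppressWarnings(
    st_join(places[idx, ], counties, join = st_intersects, largest = TRUE)
  )
}

# mclapply forks worker processes; forking is unavailable on Windows, so fall back to 1 core
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(chunk_index, join_chunk, mc.cores = n_cores)

# Stitch the per-chunk results back into one sf object
places_joined <- do.call(rbind, unname(results))
print(places_joined$topo_code[order(places_joined$place_id)])
```

Splitting only the places (and passing the full county layer to every worker) keeps each place's result independent of the others, so the chunks can be recombined with a plain `rbind`.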
Code here: https://github.com/justinperline/samegrassbutgreener/blob/main/code/natural_amenities.R
Went back and added the relative summer humidity value to the final dataset, since humidity wasn't one of the values available from NOAA for climate mapping.
The US Department of Agriculture has published a natural amenity value for every county in the contiguous US. This seems to be an equally weighted sum of z-scores across several climate and topographical variables. It would be a good way to determine proximity to mountains (or the topographical scale could be used on its own). The climate data is a good backup dataset in case it proves difficult to get something more up-to-date and/or at the place level. Difficulty would seem to be joining this county-level data to the place-level data...
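As a rough sketch of what an equally weighted z-score summation looks like, assuming each measure is standardized across counties and the standardized values are simply added. The column names and numbers below are invented for illustration and are not the USDA variables or data; the real scale may also flip the sign of measures where lower is better.

```r
# Hypothetical county-level measures (made-up values)
amenities <- data.frame(
  county = c("A", "B", "C", "D"),
  jan_temp = c(30, 45, 55, 38),
  jan_sun = c(120, 160, 210, 150),
  topo_code = c(2, 10, 21, 8)
)

measure_cols <- c("jan_temp", "jan_sun", "topo_code")

# scale() centers each column on its mean and divides by its standard deviation
z_scores <- scale(amenities[measure_cols])

# Equal weighting: the score is the plain sum of the z-scores for each county
amenities$amenity_score <- rowSums(z_scores)

print(amenities[order(-amenities$amenity_score), c("county", "amenity_score")])
```

Because every column is centered before summing, the scores sum to zero across counties, so a positive score reads as above average overall.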