rOpenGov / geofi

R package for accessing Finnish geospatial data
https://ropengov.github.io/geofi/

Consider VRK's building address and voting district data #13

Open muuankarski opened 5 years ago

muuankarski commented 5 years ago

Väestörekisterikeskus (VRK, the Population Register Centre) publishes an annual dataset of all buildings in Finland. The data is a zipped, semicolon-delimited file with the .OPT extension and has about 3.6 million rows. It can be read and processed in R, albeit slowly, with the following code:

# 2019
library(dplyr)
library(sf)

tmpfile <- tempfile(fileext = ".zip")
tmpdir <- tempdir()
# mode = "wb" is needed on Windows so the zip file is not corrupted
download.file("https://www.avoindata.fi/data/dataset/cf9208dc-63a9-44a2-9312-bbd2c3952596/resource/ae13f168-e835-4412-8661-355ea6c4c468/download/suomi_osoitteet_2019-05-15.zip",
              destfile = tmpfile, mode = "wb")
unzip(zipfile = tmpfile,
      exdir = tmpdir)

# The .OPT file is semicolon-delimited and has no header row
opt <- read.csv(file.path(tmpdir, "Suomi_osoitteet_2019-05-15.OPT"),
                sep = ";",
                stringsAsFactors = FALSE,
                header = FALSE)

names(opt) <- c("rakennustu","sijaintiku",
                "sijaintima","rakennusty",
                "CoordY","CoordX",
                "osoitenume", "katunimi_f",
                "katunimi_s", "katunumero",
                "postinumer", "vaalipiirikoodi",
                "vaalipiirinimi","tyhja",
                "idx", "date")
if (FALSE) { # set to TRUE to work on a 2000-row sample, which makes the conversions faster
  opt_orig <- as_tibble(opt)
  opt <- slice_sample(opt_orig, n = 2000)
}

# Convert the text columns from Windows-1252 to UTF-8
text_cols <- c("katunimi_f", "katunimi_s", "katunumero", "vaalipiirinimi")
opt[text_cols] <- lapply(opt[text_cols], iconv, from = "windows-1252", to = "UTF-8")

# Build sf points directly in ETRS-TM35FIN (EPSG:3067); remove = FALSE keeps
# CoordX/CoordY as attribute columns, as SpatialPointsDataFrame did
shape <- st_as_sf(opt, coords = c("CoordX", "CoordY"), crs = 3067, remove = FALSE)

# Project the spatial data to lat/lon (WGS84)
# shape <- st_transform(shape, 4326)

head(st_coordinates(shape))

# shape %>% select(rakennustu) %>% plot()

saveRDS(shape, file = "sf19_buildings.RDS")

Any ideas on how to incorporate this into geofi? It is useful, for instance, when geocoding sensitive addresses.
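Here is a minimal base R sketch of what local geocoding against this table could look like. geocode_local() is a hypothetical name and is not part of geofi. The sketch assumes the column names used above (katunimi_f, katunumero, postinumer, CoordX, CoordY), a plain data frame such as opt (not the sf object), and that an exact match on street name, street number and postal code is good enough. Real-world addresses would probably need more normalisation.

```r
# Hypothetical helper (not part of geofi): match addresses against a local copy of
# the VRK building table, so the addresses are never sent to an external service.
# Assumes the columns katunimi_f, katunumero, postinumer, CoordX, CoordY as named above.
geocode_local <- function(addresses, buildings) {
  key_cols <- c("katunimi_f", "katunumero", "postinumer")

  # Light normalisation: lower-case, trimmed street names and 5-digit postal codes
  normalise <- function(df) {
    df$katunimi_f <- tolower(trimws(df$katunimi_f))
    df$katunumero <- trimws(as.character(df$katunumero))
    df$postinumer <- formatC(as.integer(df$postinumer), width = 5, flag = "0")
    df
  }

  a <- normalise(addresses)
  b <- normalise(buildings)

  # One address can cover several buildings; keep the first match per address
  b <- b[!duplicated(b[key_cols]), c(key_cols, "CoordX", "CoordY")]

  # Left join: addresses without a match get NA coordinates
  merge(a, b, by = key_cols, all.x = TRUE, sort = FALSE)
}
```

For example, geocode_local(my_addresses, opt) would return my_addresses with CoordX and CoordY (EPSG:3067) appended, and NA where no building matched.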

However, this would require storage, because the data should be preprocessed. Do you think this data is suitable for geofi, and should we create a data repository such as geofi_data?
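As an alternative or complement to a geofi_data repository, the preprocessed object could be built once on the user's machine and cached. The following is a sketch under those assumptions. cached_buildings(), its arguments and the cache file name are hypothetical. build can be any function that returns the object, for example get_buildings(). tools::R_user_dir() requires R >= 4.0.0.

```r
# Hypothetical helper (not part of geofi): build the preprocessed object once,
# store it as RDS in a per-user cache directory and reuse it afterwards.
# tools::R_user_dir() requires R >= 4.0.0.
cached_buildings <- function(build,
                             cache_dir = tools::R_user_dir("geofi", which = "cache"),
                             file = "vrk_buildings.rds",
                             refresh = FALSE) {
  path <- file.path(cache_dir, file)

  # Reuse the cached copy unless a refresh is requested
  if (!refresh && file.exists(path)) {
    return(readRDS(path))
  }

  # Otherwise (re)build the object, e.g. by downloading and preprocessing it
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  obj <- build()
  saveRDS(obj, path)
  obj
}
```

A call such as buildings <- cached_buildings(get_buildings) would pay the slow download and conversion only on the first run. Passing refresh = TRUE rebuilds the cache when VRK publishes a new release.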

muuankarski commented 5 years ago

Created a new branch and wrote a simple function and a tutorial example here: https://github.com/rOpenGov/geofi/commit/b67fcd329d56147b5f6b047506d6b949028a7330

muuankarski commented 4 years ago

Removed the stale branch feature-vrk-building. Its content is below:

#' @title Get geospatial data on all buildings and their electoral districts from Väestörekisterikeskus
#' @description Downloads the Väestörekisterikeskus (VRK) building address file,
#'   reads it and returns the buildings as point geometries in ETRS-TM35FIN (EPSG:3067).
#' @author Markus Kainu <markus.kainu@kela.fi>
#' @return An sf object with one point per row of the address file.
#' @examples
#'  \dontrun{
#'  f <- get_buildings()
#'  plot(sf::st_geometry(f))
#'  }
#'
#' @rdname get_buildings
#' @export

get_buildings <- function(){

  tmpfile <- tempfile(fileext = ".zip")
  tmpdir <- tempdir()
  on.exit(unlink(tmpfile), add = TRUE)
  # mode = "wb" is needed on Windows so the zip file is not corrupted
  utils::download.file("https://www.avoindata.fi/data/dataset/cf9208dc-63a9-44a2-9312-bbd2c3952596/resource/ae13f168-e835-4412-8661-355ea6c4c468/download/suomi_osoitteet_2019-08-15.zip",
                       destfile = tmpfile, mode = "wb")
  utils::unzip(zipfile = tmpfile,
               exdir = tmpdir)

  # The .OPT file is semicolon-delimited, Latin-1 encoded and has no header row
  opt <- utils::read.csv(file.path(tmpdir, "suomi_osoitteet_2019-08-15.OPT"),
                         fileEncoding = "latin1",
                         sep = ";",
                         # nrows = 50000,
                         stringsAsFactors = FALSE,
                         header = FALSE)

  names(opt) <- c("rakennustu","sijaintiku",
                  "sijaintima","rakennusty",
                  "CoordY","CoordX",
                  "osoitenume", "katunimi_f",
                  "katunimi_s", "katunumero",
                  "postinumer", "vaalipiirikoodi",
                  "vaalipiirinimi","tyhja",
                  "idx", "date")

  # Build points in ETRS-TM35FIN (EPSG:3067), keeping CoordX/CoordY as attributes.
  # Functions are namespaced (sf::, utils::) instead of calling library() inside a package.
  shape <- sf::st_as_sf(opt, coords = c("CoordX", "CoordY"), crs = 3067, remove = FALSE)

  # Project the spatial data to lat/lon (WGS84)
  # shape <- sf::st_transform(shape, 4326)

  # head(sf::st_coordinates(shape))

  return(shape)
}
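The commented-out nrows line suggests reading only part of the file while developing. Below is a minimal sketch of a separate reader that takes this as an argument. read_opt() is a hypothetical helper, and the 16-column layout and column names are assumed from the code above. Fixing colClasses keeps postal codes and identifiers as text, so that 00100 does not become 100. It can also make read.csv faster, because it no longer has to guess column types over millions of rows.

```r
# Hypothetical helper (not part of geofi): read a VRK .OPT address file
# (semicolon-delimited, Latin-1, no header) with the column names used above.
# nrows = -1 reads the whole file; a small value gives a quick development subset.
read_opt <- function(path, nrows = -1) {
  col_names <- c("rakennustu", "sijaintiku", "sijaintima", "rakennusty",
                 "CoordY", "CoordX", "osoitenume", "katunimi_f",
                 "katunimi_s", "katunumero", "postinumer", "vaalipiirikoodi",
                 "vaalipiirinimi", "tyhja", "idx", "date")
  utils::read.csv(path,
                  sep = ";",
                  header = FALSE,
                  nrows = nrows,
                  fileEncoding = "latin1",
                  stringsAsFactors = FALSE,
                  col.names = col_names,
                  # Coordinates are numeric; everything else stays character so
                  # leading zeros (e.g. postal codes) are preserved
                  colClasses = c(rep("character", 4), "numeric", "numeric",
                                 rep("character", 10)))
}
```

With this helper, the body of get_buildings() could call read_opt(file.path(tmpdir, "suomi_osoitteet_2019-08-15.OPT")) and pass nrows = 50000 while testing.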