rOpenGov / geofi

R package for accessing Finnish geospatial data
https://ropengov.github.io/geofi/

Wrong municipality name for Pedersöre #17

Open dataninjafi opened 5 years ago

dataninjafi commented 5 years ago

In municipality_key_2019$kunta_name, Pedersöre's name is "Pedersören kunta", while the other municipalities lack the " kunta" ending.

antagomir commented 5 years ago

I think this has been implemented in inst/extras/create_municipality_keys.R and "kunta" has been added afterwards. This might be for compatibility reasons with other data sources (@muuankarski ?).

I also think that "Pedersöre" would be handier. This brings to mind two questions to decide:

1) Should we stick to the original names in the default data table, and then provide a separate wrapper that can be used if one wants to harmonize the names further or convert them into different formats, depending on the compatibility needs?

2) The field "kunta_name" might be better renamed as "municipality_fi" or something?
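The first option could be sketched roughly as follows. This is only an illustration in base R, not code from the package: the function `shorten_municipality_name()`, the lookup vector `short_forms`, and the toy data frame are all hypothetical, and only the `kunta_name` column comes from this issue. A lookup table is used here rather than string trimming, because removing " kunta" alone would leave the genitive form "Pedersören".

```r
# Toy stand-in for the municipality key (hypothetical data;
# only the column name kunta_name is taken from the issue).
municipality_key <- data.frame(
  kunta_name = c("Helsinki", "Pedersören kunta", "Turku"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: official name -> preferred short form.
short_forms <- c("Pedersören kunta" = "Pedersöre")

# Hypothetical wrapper: returns the short form where one is defined
# and leaves every other name unchanged.
shorten_municipality_name <- function(x, lookup = short_forms) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

shortened <- shorten_municipality_name(municipality_key$kunta_name)
print(shortened)
```

With this split, the default table keeps the official names and the conversion stays opt-in.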

muuankarski commented 5 years ago

Most probably "Pedersören kunta" is the correct name in Finnish. I had a quick look at a few municipality data sources (code below), and they all had "Pedersören kunta" in Finnish and "Pedersöre" in Swedish. Wikipedia and the municipality's own website (at the bottom of the page) also use "Pedersören kunta". It is odd and there must be a reason for it, but I think we had better stick to "Pedersören kunta" in the Finnish names.

As for column names, there are also name_fi and name_sv columns. name_fi equals kunta_name, so kunta_name could be removed completely.
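Before dropping the column, the claimed redundancy could be verified along these lines. This is a minimal sketch on a toy data frame, assuming the columns are plain character vectors; the object name `key` and the rows are made up for illustration.

```r
# Toy key table with the three name columns mentioned above
# (hypothetical rows, not the real municipality_key_2019).
key <- data.frame(
  kunta_name = c("Helsinki", "Pedersören kunta"),
  name_fi    = c("Helsinki", "Pedersören kunta"),
  name_sv    = c("Helsingfors", "Pedersöre"),
  stringsAsFactors = FALSE
)

# Drop kunta_name only if it really duplicates name_fi.
if (identical(key$kunta_name, key$name_fi)) {
  key$kunta_name <- NULL
}

print(names(key))
```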


# Let's query some municipality data sources to see how Pedersöre is written
library(dplyr)
library(rvest)

# 1. Statistics Finland (Tilastokeskus) municipality classification
## In Finnish
read_html("https://www.tilastokeskus.fi/meta/luokitukset/kunta/001-2019/index.html") %>% 
  html_table(fill = TRUE) %>% 
  .[2] %>%
  .[[1]] %>% 
  as_tibble(.name_repair = "universal") %>% 
  filter(grepl("Pedersö", X2))
# X1 X2              
# <int> <chr>           
# 599 Pedersören kunta

## In Swedish
read_html("https://www.tilastokeskus.fi/meta/luokitukset/kunta/001-2019/index_sv.html") %>% 
  html_table(fill = TRUE) %>% 
  .[2] %>%
  .[[1]] %>% 
  as_tibble(.name_repair = "universal") %>% 
  filter(grepl("Pedersö", X2))
# X1 X2       
# <int> <chr>    
# 599 Pedersöre

## In English
read_html("https://www.tilastokeskus.fi/meta/luokitukset/kunta/001-2019/index_en.html") %>% 
  html_table(fill = TRUE) %>% 
  .[2] %>%
  .[[1]] %>% 
  as_tibble(.name_repair = "universal") %>% 
  filter(grepl("Pedersö", X2))
# X1 X2       
# <int> <chr>    
# 599 Pedersöre

# 2. Association of Finnish Municipalities (Kuntaliitto): regional classes and municipality numbers 2019
fly <- tempfile()
download.file("https://www.kuntaliitto.fi/sites/default/files/media/file/Alueluokat%20ja%20kuntanumerot%202019.xlsx",
              fly)
readxl::read_excel(fly, skip = 12) %>% 
  filter(grepl("Pedersö", `Kunnan nimi`)) %>% 
  select(1:3)
# Kuntanumero `Kunnan nimi`    `Ruotsinkielilinen nimi`
# <chr>       <chr>            <chr>                   
# 599         Pedersören kunta Pedersöre   

# 3. National Land Survey (MML) municipal boundaries from the Paituli spatial data service
library(ows4R)
wfs <- WFSClient$new("http://avaa.tdata.fi/geoserver/paituli/wfs",
                     serviceVersion = "2.0.0",
                     logger = "INFO")

caps <- wfs$getCapabilities()
ft <- caps$findFeatureTypeByName("paituli:mml_hallinto_2018_10k", exact = TRUE)
shape <- ft$getFeatures()
shape %>% 
  filter(grepl("Pedersö", NAMEFIN)) %>% 
  select(NATCODE, NAMEFIN, NAMESWE)
# NATCODE          NAMEFIN   NAMESWE                       the_geom
#      599 Pedersören kunta Pedersöre MULTISURFACE (POLYGON ((287...

antagomir commented 5 years ago

I think it is good to use the official names by default (and yes, let's remove the "kunta_name" field).

The data generation script in inst/extras/create_municipality_keys.R seems to make some modifications, so let us make sure that the names are kept in their official formats.
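One way to guard against such modifications would be a consistency check against the official classification. The following is only a sketch in base R on toy data: the function `find_name_mismatches()` and both data frames are hypothetical, and the column names `code` and `name_fi` are assumptions rather than the script's actual output.

```r
# Toy official classification (hypothetical rows for illustration).
official <- data.frame(
  code    = c(91L, 599L),
  name_fi = c("Helsinki", "Pedersören kunta"),
  stringsAsFactors = FALSE
)

# Toy output of a key-generation step that altered one name.
generated <- data.frame(
  code    = c(91L, 599L),
  name_fi = c("Helsinki", "Pedersöre"),
  stringsAsFactors = FALSE
)

# Hypothetical check: join on the municipality code and return
# the rows where the generated name differs from the official one.
find_name_mismatches <- function(generated, official) {
  merged <- merge(generated, official, by = "code",
                  suffixes = c("_generated", "_official"))
  merged[merged$name_fi_generated != merged$name_fi_official, , drop = FALSE]
}

mismatches <- find_name_mismatches(generated, official)
print(mismatches)
```

An empty result would indicate that the generated names match the official ones.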

If there is a need, we can add wrappers that convert the official names to shorter or other alternative forms.