rOpenGov / geofi

R package for accessing Finnish geospatial data
https://ropengov.github.io/geofi/

`get_municipalities` argument `codes_as_character` seems to not work #46

Open sampoves opened 1 year ago

sampoves commented 1 year ago

Hello,

It would seem to me that the argument codes_as_character for function get_municipalities does not work in geofi_1.0.9:

codes_as_character is FALSE

> muns1 <- geofi::get_municipalities(codes_as_character = FALSE) %>% 
+   dplyr::select(kunta)
> 
Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2023
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 
> 
> muns1
Simple feature collection with 309 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 83747.59 ymin: 6637032 xmax: 732907.7 ymax: 7776431
Projected CRS: ETRS89 / TM35FIN(E,N)
First 10 features:
   kunta                           geom
1      5 MULTIPOLYGON (((366787.9 70...
2      9 MULTIPOLYGON (((382543.4 71...
3     10 MULTIPOLYGON (((343298.2 69...
4     16 MULTIPOLYGON (((436139.7 67...
5     18 MULTIPOLYGON (((426631 6720...
6     19 MULTIPOLYGON (((263938.3 67...
7     20 MULTIPOLYGON (((328844.1 67...
8     35 MULTIPOLYGON (((176190.4 67...
9     43 MULTIPOLYGON (((92735.28 67...
10    46 MULTIPOLYGON (((600317.4 69...
> 
> sapply(muns1, class)
$kunta
[1] "integer"

$geom
[1] "sfc_MULTIPOLYGON" "sfc"

codes_as_character is TRUE

> muns2 <- geofi::get_municipalities(year = 2022, codes_as_character = TRUE) %>% 
+   dplyr::select(kunta)
Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2022
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 
> 
> muns2
Simple feature collection with 309 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 83747.59 ymin: 6637032 xmax: 732907.7 ymax: 7776431
Projected CRS: ETRS89 / TM35FIN(E,N)
First 10 features:
   kunta                           geom
1      5 MULTIPOLYGON (((366787.9 70...
2      9 MULTIPOLYGON (((382543.4 71...
3     10 MULTIPOLYGON (((343298.2 69...
4     16 MULTIPOLYGON (((436139.7 67...
5     18 MULTIPOLYGON (((426631 6720...
6     19 MULTIPOLYGON (((263938.3 67...
7     20 MULTIPOLYGON (((328844.1 67...
8     35 MULTIPOLYGON (((176190.4 67...
9     43 MULTIPOLYGON (((92735.28 67...
10    46 MULTIPOLYGON (((600317.4 69...
> 
> sapply(muns2, class)
$kunta
[1] "integer"

$geom
[1] "sfc_MULTIPOLYGON" "sfc"

Changing the argument value does not introduce leading zeroes to the column kunta and it does not change the column type to character.

My session:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=Finnish_Finland.utf8  LC_CTYPE=Finnish_Finland.utf8    LC_MONETARY=Finnish_Finland.utf8 LC_NUMERIC=C                    
[5] LC_TIME=Finnish_Finland.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] geofi_1.0.9    stringr_1.5.0  stringi_1.7.12 readxl_1.4.2   dplyr_1.1.2   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10        cellranger_1.1.0   pillar_1.9.0       compiler_4.2.2     class_7.3-20       tools_4.2.2        odbc_1.3.4        
 [8] digest_0.6.31      bit_4.0.5          lifecycle_1.0.3    tibble_3.2.1       pkgconfig_2.0.3    rlang_1.1.0        DBI_1.1.3         
[15] cli_3.6.0          writexl_1.4.2      curl_5.0.0         yaml_2.3.7         e1071_1.7-12       withr_2.5.0        httr_1.4.5        
[22] xml2_1.3.3         generics_0.1.3     vctrs_0.6.2        hms_1.1.2          classInt_0.4-9     bit64_4.0.5        grid_4.2.2        
[29] tidyselect_1.2.0   glue_1.6.2         sf_1.0-12          R6_2.5.1           fansi_1.0.4        purrr_1.0.1        blob_1.2.3        
[36] magrittr_2.0.3     ellipsis_0.3.2     units_0.8-1        httpcache_1.2.0    utf8_1.2.2         KernSmooth_2.23-20 proxy_0.4-27

Additionally, what's peculiar is that the command geofi::get_municipalities(codes_as_character = FALSE) works without any specific year, but codes_as_character = TRUE requires an explicit year argument: geofi::get_municipalities(year = 2022, codes_as_character = TRUE). This is obviously a separate matter, will be opening an issue for it too if I find the time.

pitkant commented 1 year ago

I was the one fixing issue #38 with PR #39, so IIRC kunta column is used for some join operations and that's why it needs to be in integer format. What codes_as_character = TRUE affects are the different *_code columns, such as municipality_code in this case.
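To see the distinction concretely, it helps to list the class of each code column rather than only `kunta`. The following is a minimal sketch on a toy data frame, not real geofi output: the values and the three-character padding of `municipality_code` are assumptions made for illustration. With the real package, the same check would be run on the result of `geofi::get_municipalities(codes_as_character = TRUE)`.

```r
# Toy stand-in for the attribute table returned by get_municipalities();
# the values are illustrative only, not real geofi output.
muns_toy <- data.frame(
  kunta = c(5L, 9L, 10L),
  municipality_code = c("005", "009", "010"),
  stringsAsFactors = FALSE
)

# First class of every column, as a named character vector.
col_classes <- vapply(muns_toy, function(x) class(x)[1], character(1))

# Restrict the check to kunta plus anything whose name ends in "_code".
is_code_col <- names(col_classes) == "kunta" | grepl("_code$", names(col_classes))
print(col_classes[is_code_col])
```

Filtering on the `_code` suffix keeps the check readable on a wide table where the affected columns are far from the first few.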

Maybe this could be fixed by either 1) coercing the municipality codes from the other source (MML?) from which data is joined to character format, or 2) hiding the kunta column somewhere else than as the 2nd column of the Simple feature collection.

What does @muuankarski think?

pitkant commented 1 year ago

> Additionally, what's peculiar is that the command geofi::get_municipalities(codes_as_character = FALSE) works without any specific year, but codes_as_character = TRUE requires an explicit year argument: geofi::get_municipalities(year = 2022, codes_as_character = TRUE). This is obviously a separate matter, will be opening an issue for it too if I find the time.

This was actually because of missing sairaanhoitop_code column in the most recent year, 2023, that the function defaults to if year is not provided explicitly. I wrote a patch that is more robust for differences in column names between years.
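One general way to tolerate column differences between years is to convert only the columns that are actually present. The sketch below is not the patch itself: the helper `pad_existing_codes`, the toy tables and the assumed widths (3 for `municipality_code`, 2 for `sairaanhoitop_code`) are all hypothetical and only illustrate the idea.

```r
# Hypothetical helper (not geofi's actual patch): zero-pad only those
# code columns that exist in the data for a given year.
pad_existing_codes <- function(df, widths) {
  present <- intersect(names(widths), names(df))  # skip columns that are missing
  for (col in present) {
    df[[col]] <- formatC(as.integer(df[[col]]), width = widths[[col]], flag = "0")
  }
  df
}

# Assumed widths, for illustration only.
widths <- c(municipality_code = 3, sairaanhoitop_code = 2)

# Toy data: the second table lacks sairaanhoitop_code, as reported for 2023.
toy_2022 <- data.frame(municipality_code = c(5L, 9L), sairaanhoitop_code = c(3L, 18L))
toy_2023 <- data.frame(municipality_code = c(5L, 9L))

out_2022 <- pad_existing_codes(toy_2022, widths)
out_2023 <- pad_existing_codes(toy_2023, widths)  # no error despite the missing column
```

Using `intersect()` on the column names means a year that lacks a column is handled silently instead of failing on a missing name.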

sampoves commented 1 year ago

Thank you @pitkant for your fast reply. I have to admit that I was not aware of municipality_code due to RStudio playing tricks on me (not showing any columns past id 52 🤨).

municipality_code is already working the way I would expect in geofi_1.0.9. Nevertheless, many thanks for looking into this matter!

sampoves commented 1 year ago

Hello,

Unfortunately, codes_as_character still does not work in geofi_1.0.10. I am sorry to bring this up again, and as always, thank you geofi maintainers!

> geofi::get_municipalities(codes_as_character = TRUE) %>% 
+   dplyr::select(kunta)

Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2023
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Simple feature collection with 309 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 83747.59 ymin: 6637032 xmax: 732907.7 ymax: 7776431
Projected CRS: ETRS89 / TM35FIN(E,N)
First 10 features:
   kunta                           geom
1      5 MULTIPOLYGON (((366787.9 70...
2      9 MULTIPOLYGON (((382543.4 71...
3     10 MULTIPOLYGON (((343298.2 69...
4     16 MULTIPOLYGON (((436139.7 67...
5     18 MULTIPOLYGON (((426631 6720...
6     19 MULTIPOLYGON (((263938.3 67...
7     20 MULTIPOLYGON (((328844.1 67...
8     35 MULTIPOLYGON (((176190.4 67...
9     43 MULTIPOLYGON (((92735.28 67...
10    46 MULTIPOLYGON (((600317.4 69...
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 

> muns <- geofi::get_municipalities(codes_as_character = TRUE) %>% 
+   dplyr::select(kunta)
Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2023
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
Warning message:
Coercing CRS to epsg:3067 (ETRS89 / TM35FIN) 

> sapply(muns, class)
$kunta
[1] "integer"

$geom
[1] "sfc_MULTIPOLYGON" "sfc"

Current environment is this:

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=Finnish_Finland.utf8  LC_CTYPE=Finnish_Finland.utf8    LC_MONETARY=Finnish_Finland.utf8
[4] LC_NUMERIC=C                     LC_TIME=Finnish_Finland.utf8    

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] geofi_1.0.10   sf_1.0-14      odbc_1.3.5     stringr_1.5.0  stringi_1.7.12 readxl_1.4.3  
[7] dplyr_1.1.2   

loaded via a namespace (and not attached):
 [1] bit_4.0.5          compiler_4.3.1     tidyselect_1.2.0   Rcpp_1.0.11        xml2_1.3.5        
 [6] blob_1.2.4         yaml_2.3.7         R6_2.5.1           generics_0.1.3     curl_5.0.2        
[11] classInt_0.4-10    tibble_3.2.1       units_0.8-3        DBI_1.1.3          pillar_1.9.0      
[16] rlang_1.1.1        utf8_1.2.3         bit64_4.0.5        cli_3.6.1          withr_2.5.0       
[21] magrittr_2.0.3     class_7.3-22       digest_0.6.33      grid_4.3.1         httpcache_1.2.0   
[26] rstudioapi_0.15.0  hms_1.1.3          lifecycle_1.0.3    vctrs_0.6.3        writexl_1.4.2     
[31] KernSmooth_2.23-21 proxy_0.4-27       glue_1.6.2         cellranger_1.1.0   fansi_1.0.4       
[36] e1071_1.7-13       purrr_1.0.2        httr_1.4.7         tools_4.3.1        pkgconfig_2.0.3  

muuankarski commented 1 year ago

Hi, I had to make a quick resubmission as geofi was archived from CRAN. I did not have time to test this properly. I will have a look and can make a resubmission soon.

Best, Markus

pitkant commented 1 year ago

@sampoves Yes, it seems that codes_as_character converts only certain fields that are meant to be characters, not all fields. See PR #39 for a list of which codes are width-3 characters, which are width-2 characters and which are integers. From the commit history you can see that I thought kunta could also be a character, but apparently that breaks some join operations somewhere in the package or in the package vignettes, so it was changed back to an integer in all cases. Maybe @muuankarski can shed light on this.

You could use municipality_code in join operations instead of kunta (but make sure that codes_as_character = TRUE; if codes_as_character = FALSE, municipality_code will also be an integer).
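A small base R sketch of why the key types matter in such a join. The tables, the column name `code` and all values are made up for illustration; only `kunta` and `municipality_code` come from the thread.

```r
# Toy tables; all values are made up for illustration.
muns_toy <- data.frame(
  kunta = c(5L, 9L, 10L),                      # stays integer
  municipality_code = c("005", "009", "010"),  # character, zero-padded
  stringsAsFactors = FALSE
)
stats_toy <- data.frame(
  code = c("005", "010"),
  value = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# Character key against character key: both rows of stats_toy find a match.
joined_ok <- merge(muns_toy, stats_toy, by.x = "municipality_code", by.y = "code")

# Integer kunta against zero-padded character codes: 5 is compared as "5",
# which does not equal "005", so nothing matches.
joined_bad <- merge(muns_toy, stats_toy, by.x = "kunta", by.y = "code")

nrow(joined_ok)
nrow(joined_bad)
```

Note that base `merge()` fails silently here by returning no rows, so it is worth checking the row count after any join on municipality codes.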

Also, the argument documentation could be clarified, or alternatively the function could be changed so that all codes are indeed characters?

sampoves commented 1 year ago

Hello @muuankarski and @pitkant, many thanks for your replies.

We've been through this before, and I have to admit that I simply didn't remember our earlier conversation. Apologies, and thank you for the friendly reminder about the fields. As you said, municipality_code is in fact character when codes_as_character = TRUE.

I do think it is a tiny bit confusing to have a prominent field, kunta, which does not change under any circumstances, while the fields that are in fact affected are buried deep in the wide output data frame. I think clarifying the argument in the documentation would be a helpful step forward.

Many thanks for the swift communication and maintenance of geofi.