New function read_soilmap()

florisvdh commented 5 years ago

Note that more commits need to be added before merging; this PR is opened to enable discussion.

florisvdh commented 4 years ago

A vignette has been added to demonstrate package & data setup (using download_zenodo()), sf object handling & plotting, with read_soilmap() as a case study!

Thanks to @devosbr for his comments on the read_soilmap() documentation and his suggestions for variable names.

Further, note that today the soilmap_simple data source has been published on Zenodo! It is a recommended data source for a reproducible & tidy analytical workflow.

I guess this PR is ready; anyone who would like to test it is more than welcome. Start with this:

```r
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
library(remotes)
install_github("inbo/n2khab@read_soilmap",
               build_vignettes = TRUE,
               upgrade = TRUE)
vignette("v022_example", package = "n2khab")
```

The environment variable R_REMOTES_NO_ERRORS_FROM_WARNINGS is set here to prevent potential errors from the remotes package, which by default turns installation warnings into errors (as was seen for n2khab here; this behaviour was argued against in r-lib/remotes#403, but that issue was closed). Perhaps this should go into the README as well.
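If you prefer not to set the variable for the whole R session, a minimal sketch could scope it to a single installation call. The helper `with_remotes_warnings_allowed()` is hypothetical (not part of remotes or n2khab) and only uses base R: it saves the current value, sets the variable to `"true"` while the wrapped expression is evaluated, and restores the previous state on exit.

```r
# Hypothetical helper (not part of remotes or n2khab): evaluate `expr` with
# R_REMOTES_NO_ERRORS_FROM_WARNINGS set to "true", then restore the old state.
with_remotes_warnings_allowed <- function(expr) {
  old <- Sys.getenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS", unset = NA)
  Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS")
    } else {
      Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = old)
    }
  }, add = TRUE)
  # `expr` is a lazily evaluated argument, so it only runs here,
  # after the environment variable has been set
  force(expr)
}

# Usage (not run here):
# with_remotes_warnings_allowed(
#   remotes::install_github("inbo/n2khab@read_soilmap",
#                           build_vignettes = TRUE,
#                           upgrade = TRUE)
# )
```

Using `on.exit()` means the previous value is restored even if the installation fails with an error.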

Let me know your findings!

florisvdh commented 4 years ago

Thanks @hansvancalster, I'll keep in mind your (fine) ideas to mention the use of knitr::purl() and the factor level translation. The latter applies to both the simple and the raw data source, and would be an option in read_soilmap(). I would rather return extra columns (with the translated levels) than replace the factor levels; what do you think?

I'm curious whether other reviewers will also be as lucky with installation and with running the code :wink:

hansvancalster commented 4 years ago

I would rather return extra columns (with the translated levels) than replace the factor levels; what do you think?

Yes, replacing the factor levels is not the best option. But would adding extra columns inflate the size of the objects in the R environment too much (I don't remember exactly, but I think it already took quite some memory)? A translation table can always be joined after filtering the soil map to only what is needed for the analysis/visualisation.

florisvdh commented 4 years ago

Good reminder about object sizes, @hansvancalster; I actually don't know how large the effect of extra columns would be (in essence it is one integer column per factor). Returning a list with the sf object plus a translation table is indeed a good alternative (that's how read_habitatstreams() works with the source_text = TRUE argument). I may just go for that.
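To illustrate the alternative discussed here (returning the data together with a separate translation table, and joining the labels only after filtering), a base-R sketch on a toy attribute table with the geometry omitted. The codes `t1` to `t3` and their labels are placeholders, not real soil map categories, and the list element names `data` and `translation` are assumptions, not the structure actually returned by read_habitatstreams().

```r
# Toy attribute table standing in for the soil map (geometry omitted);
# codes and labels are placeholders, not real soil map categories.
soil <- data.frame(
  id = 1:6,
  bsm_mo_tex = factor(c("t1", "t2", "t1", "t3", "t2", "t1"))
)

# Translation table: one row per factor level.
tex_translation <- data.frame(
  bsm_mo_tex = c("t1", "t2", "t3"),
  bsm_mo_tex_explan = c("texture class 1", "texture class 2", "texture class 3"),
  stringsAsFactors = FALSE
)

# Return both objects together instead of adding columns up front.
result <- list(data = soil, translation = tex_translation)

# Later: first filter to what is needed, then look up the labels.
# match() keeps the row order of the filtered data (merge() may reorder rows).
subset_tex <- result$data[result$data$bsm_mo_tex != "t3", ]
subset_tex$bsm_mo_tex_explan <- result$translation$bsm_mo_tex_explan[
  match(as.character(subset_tex$bsm_mo_tex), result$translation$bsm_mo_tex)
]
subset_tex
```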

Also the levels in the data source need to be checked against the list of @devosbr (here) in order to use the latter. I'm inclined to go for the Dutch labels only, for now, until English labels are made available as well.

hansvancalster commented 4 years ago

Maybe just check whether adding an extra column increases the size a lot. It might not be too bad: the geometry column will determine most of the object size.

florisvdh commented 4 years ago

the geometry column will determine most of the object size.

Sure, right!

florisvdh commented 4 years ago

It appears that we took the extra-column approach in read_watersurfaces() (argument extended = TRUE), but that's a smaller dataset anyway.

DriesAdriaens commented 4 years ago

I found that with filter(bsm_ge_coastalplain == TRUE) also polygons outside the "coastal plain" are selected. It seems that all polygons where bsm_region equals "Kunstmatige gronden" (artificial soils) are selected as well, spread across Flanders. This is the case for both soilmap and soilmap_simple.

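```r
library(n2khab)
library(dplyr)
library(sf)
library(mapview)
# soilmap_simple is the default data source of read_soilmap()
sm_simple <- read_soilmap()
sm_simple_co <- sm_simple %>% 
  filter(bsm_ge_coastalplain == TRUE)
sm_simple_co %>% 
  mapview(map.types = "OpenStreetMap")
```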

Is that correct?

florisvdh commented 4 years ago

I found that with filter(bsm_ge_coastalplain == TRUE) also polygons outside the "coastal plain" are selected

Well spotted, @DriesAdriaens :+1:! This clearly isn't correct and we'll have to solve it. I'll look into it.

florisvdh commented 4 years ago

Thanks @DriesAdriaens for your review. I'll look into the specific comments.

Just wondered if "coastal plain" is the correct translation of "polders"

Well, it was originally "polders" until I changed it to "coastal plain" :wink: in 8e08faa. I think this was based on the terminology @devosbr used in his document in January; maybe he also told me (I don't remember). Anyhow, he recently reviewed and updated the documentation of read_soilmap() here, so I expect it to be OK.

DriesAdriaens commented 4 years ago

I wasn't aware that the dunes (bsm_region == "Duinen") were included in bsm_ge_coastalplain too. In that sense, "Polders" would indeed be too narrow a name for the area where the standardize_coastalplain procedure is applied. Maybe "coastal area" or "coastal region" would be more appropriate, and less misleading, than "coastal plain". Just my humble opinion.

florisvdh commented 4 years ago

Type_class == "Zeepolders" from the raw data source has been used as the condition to set bsm_ge_coastalplain to TRUE. I probably misinterpreted this field; its explanation is "Type van de bodemclassificatie (‘zeepolders’ of ‘rest van Vlaanderen’)" (type of soil classification: 'zeepolders' or 'rest of Flanders'), which may refer to type categories rather than areas.

Did this exploration (source code: soilmap_zeepolders1.R.zip):

Code and output

```r
library(n2khab)
library(tidyverse)
library(sf)
library(mapview)

sm <- read_soilmap(use_processed = FALSE)
sm2 <- sm %>%
  st_drop_geometry %>%
  # recreating the original type_class variable:
  mutate(type_class = ifelse(bsm_ge_coastalplain,
                             "Zeepolders",
                             "Rest van Vlaanderen") %>% factor)

> # relation between type_class and region variables:
> sm2 %>%
+   count(type_class, bsm_region, bsm_ge_region)
# A tibble: 16 x 4
   type_class          bsm_region                       bsm_ge_region     n
 1 Rest van Vlaanderen Doel                             NA             2870
 2 Rest van Vlaanderen Kempen                           NA            47248
 3 Rest van Vlaanderen Kunstmatige gronden              NA                1
 4 Rest van Vlaanderen Leemstreek                       NA            16029
 5 Rest van Vlaanderen Nieuwland Watervliet             NA             1258
 6 Rest van Vlaanderen Weidestreek                      NA             1980
 7 Rest van Vlaanderen Zandleemstreek                   NA            83327
 8 Rest van Vlaanderen Zandstreek                       NA            61995
 9 Zeepolders          Duinen                           d               780
10 Zeepolders          Historische Polders van Oostende h               426
11 Zeepolders          IJzerestuarium                   n                73
12 Zeepolders          Kunstmatige gronden              NA            45738
13 Zeepolders          Middellandpolders                m              3991
14 Zeepolders          Moeren                           r               366
15 Zeepolders          Oudlandpolders                   o              3959
16 Zeepolders          Zwin                             z               509
Warning message:
Factor `bsm_ge_region` contains implicit NA, consider using `forcats::fct_explicit_na`
>
> # just one region - "Kunstmatige gronden" - is present in both type_classes:
> sm2 %>%
+   distinct(type_class, bsm_region, bsm_ge_region) %>%
+   count(bsm_region, bsm_ge_region) %>%
+   filter(n>1)
# A tibble: 1 x 3
  bsm_region          bsm_ge_region     n
1 Kunstmatige gronden NA                2
Warning message:
Factor `bsm_ge_region` contains implicit NA, consider using `forcats::fct_explicit_na`
>
> # soiltypes do belong to one type_class only:
> sm2 %>%
+   distinct(type_class, bsm_soiltype) %>%
+   count(bsm_soiltype) %>%
+   filter(n>1) %>%
+   nrow == 0
[1] TRUE
>
> # inspecting soiltypes of 'Kunstmatige gronden' shows that all but one are
> # attached to 'Zeepolders':
> sm2 %>%
+   count(type_class, bsm_region, bsm_ge_region, bsm_soiltype, bsm_soiltype_region) %>%
+   filter(bsm_region == "Kunstmatige gronden") %>%
+   print(n = Inf)
# A tibble: 22 x 6
   type_class          bsm_region          bsm_ge_region bsm_soiltype bsm_soiltype_region     n
 1 Rest van Vlaanderen Kunstmatige gronden NA            wPdc         wPdc-ZLS                1
 2 Zeepolders          Kunstmatige gronden NA            OA           OA-KUNST               98
 3 Zeepolders          Kunstmatige gronden NA            OB           OB-KUNST            30543
 4 Zeepolders          Kunstmatige gronden NA            OC           OC-KUNST             1853
 5 Zeepolders          Kunstmatige gronden NA            OE           OE-KUNST             1977
 6 Zeepolders          Kunstmatige gronden NA            OE1          OE1-KUNST               5
 7 Zeepolders          Kunstmatige gronden NA            OG1          OG1-KUNST             279
 8 Zeepolders          Kunstmatige gronden NA            OG2          OG2-KUNST              85
 9 Zeepolders          Kunstmatige gronden NA            OH           OH-KUNST               15
10 Zeepolders          Kunstmatige gronden NA            OL           OL-KUNST                4
11 Zeepolders          Kunstmatige gronden NA            ON           ON-KUNST             2554
12 Zeepolders          Kunstmatige gronden NA            OO           OO-KUNST                8
13 Zeepolders          Kunstmatige gronden NA            OO1          OO1-KUNST               8
14 Zeepolders          Kunstmatige gronden NA            OO2          OO2-KUNST              33
15 Zeepolders          Kunstmatige gronden NA            OO3          OO3-KUNST              14
16 Zeepolders          Kunstmatige gronden NA            OO4          OO4-KUNST               9
17 Zeepolders          Kunstmatige gronden NA            OT           OT-KUNST             6161
18 Zeepolders          Kunstmatige gronden NA            OU1          OU1-KUNST             474
19 Zeepolders          Kunstmatige gronden NA            OU2          OU2-KUNST            1465
20 Zeepolders          Kunstmatige gronden NA            OU3          OU3-KUNST               1
21 Zeepolders          Kunstmatige gronden NA            OW           OW-KUNST                7
22 Zeepolders          Kunstmatige gronden NA            OZ           OZ-KUNST              145
Warning message:
Factor `bsm_ge_region` contains implicit NA, consider using `forcats::fct_explicit_na`
```

```r
# as these types are connected to 'region' 'Kunstmatige gronden'
# the spatial occurrence of this region drives the spatial occurrence of types
# from 'Zeepolders', outside 'typical' coastal plain regions:
sm %>%
  mutate(type_class = ifelse(bsm_ge_coastalplain,
                             "Zeepolders",
                             "Rest van Vlaanderen") %>% factor) %>%
  filter(bsm_region == "Kunstmatige gronden",
         type_class == "Zeepolders") %>%
  mapview(alpha = 0.5, map.types = "OpenStreetMap", alpha.region = 0.5)
```

![Screenshot from 2020-04-02 18-13-32](https://user-images.githubusercontent.com/19164640/78273612-5b016a00-750f-11ea-8b84-e0138e68361d.png)

```r
> # except for 'wPdc' the above listed soiltypes only occur in the region 'Kunstmatige gronden':
> sm2 %>%
+   filter(bsm_region == "Kunstmatige gronden") %>%
+   distinct(bsm_soiltype) %>%
+   semi_join(sm2, ., by = "bsm_soiltype") %>%
+   count(type_class, bsm_region, bsm_ge_region, bsm_soiltype) %>%
+   print(n = Inf)
# A tibble: 26 x 5
   type_class          bsm_region          bsm_ge_region bsm_soiltype     n
 1 Rest van Vlaanderen Kempen              NA            wPdc            12
 2 Rest van Vlaanderen Kunstmatige gronden NA            wPdc             1
 3 Rest van Vlaanderen Leemstreek          NA            wPdc             4
 4 Rest van Vlaanderen Zandleemstreek      NA            wPdc           101
 5 Rest van Vlaanderen Zandstreek          NA            wPdc            24
 6 Zeepolders          Kunstmatige gronden NA            OA              98
 7 Zeepolders          Kunstmatige gronden NA            OB           30543
 8 Zeepolders          Kunstmatige gronden NA            OC            1853
 9 Zeepolders          Kunstmatige gronden NA            OE            1977
10 Zeepolders          Kunstmatige gronden NA            OE1              5
11 Zeepolders          Kunstmatige gronden NA            OG1            279
12 Zeepolders          Kunstmatige gronden NA            OG2             85
13 Zeepolders          Kunstmatige gronden NA            OH              15
14 Zeepolders          Kunstmatige gronden NA            OL               4
15 Zeepolders          Kunstmatige gronden NA            ON            2554
16 Zeepolders          Kunstmatige gronden NA            OO               8
17 Zeepolders          Kunstmatige gronden NA            OO1              8
18 Zeepolders          Kunstmatige gronden NA            OO2             33
19 Zeepolders          Kunstmatige gronden NA            OO3             14
20 Zeepolders          Kunstmatige gronden NA            OO4              9
21 Zeepolders          Kunstmatige gronden NA            OT            6161
22 Zeepolders          Kunstmatige gronden NA            OU1            474
23 Zeepolders          Kunstmatige gronden NA            OU2           1465
24 Zeepolders          Kunstmatige gronden NA            OU3              1
25 Zeepolders          Kunstmatige gronden NA            OW               7
26 Zeepolders          Kunstmatige gronden NA            OZ             145
Warning message:
Factor `bsm_ge_region` contains implicit NA, consider using `forcats::fct_explicit_na`
```

```r
# let's see whether filtering for 'proper' regions from bsm_ge_region makes more sense:
sm %>%
  filter(!is.na(bsm_ge_region)) %>%
  mapview(alpha = 0, alpha.region = 0.5)
```

![Screenshot from 2020-04-02 18-19-17](https://user-images.githubusercontent.com/19164640/78273536-3f965f00-750f-11ea-9351-c0438ead6058.png)

I'll discuss this, and the 'coastal plain' term, with @devosbr and get back to you.

Anyhow, I find the value 'Zeepolders' quite confusing for soiltypes occurring all over Flanders. I prefer not to keep this Type_class information from the raw data source in the R result, but rather to choose a proper approach for setting bsm_ge_coastalplain, which may be the last one above, based on bsm_ge_region.

Whether the spatial occurrence of Type_class == "Zeepolders" should be considered a problem in the raw data source at DOV is another matter.

florisvdh commented 4 years ago

A nice addition would be to have the option to add _explan versions of the selection of variables in soilmap_simple, similar to those of the raw soilmap data source.

Using the _explan variables as labels is indeed the way to go; it was all there already :satisfied:! There is no need for matching with separate explanatory lists, as I was thinking yesterday. In the raw data source, the number of unique values matches between c("bsm_mo_substr", "bsm_mo_tex", "bsm_mo_drain", "bsm_mo_prof", "bsm_mo_parentmat", "bsm_mo_profvar") and their _explan counterparts (by this I mean the original variables behind them). That is why the factor levels were aligned when coding read_soilmap(use_processed = FALSE).
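Matching counts of unique values is a necessary but not a sufficient check: two columns can each have two unique values without pairing up one-to-one. A stricter check compares the number of unique (code, explanation) pairs with both counts; after that, the _explan factor can get its levels in the same order as the code factor. A base-R sketch; `is_one_to_one()` and `align_explan_levels()` are hypothetical helper names and the codes are placeholders.

```r
# TRUE if every code has exactly one explanation and vice versa.
is_one_to_one <- function(code, explan) {
  pairs <- unique(data.frame(code = as.character(code),
                             explan = as.character(explan),
                             stringsAsFactors = FALSE))
  nrow(pairs) == length(unique(pairs$code)) &&
    nrow(pairs) == length(unique(pairs$explan))
}

# Give the explanation factor the same level order as the code factor, so
# both share the same integer codes. Assumes is_one_to_one() is TRUE and
# that every level of `code` occurs in the data.
align_explan_levels <- function(code, explan) {
  code <- as.factor(code)
  explan <- as.character(explan)
  lev <- explan[match(levels(code), as.character(code))]
  factor(explan, levels = lev)
}

code <- factor(c("b", "a", "b", "c"))
explan <- c("beta", "alpha", "beta", "gamma")
is_one_to_one(code, explan)
aligned <- align_explan_levels(code, explan)
levels(aligned)

# Equal counts of unique values (2 and 2), yet not one-to-one:
is_one_to_one(c("a", "a", "b"), c("x", "y", "y"))
```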

I'll check the effect on object size, as discussed with @hansvancalster, to decide how best to (optionally) return the _explan info in the result of read_soilmap(use_processed = TRUE).

As soilmap_simple will probably need an update (because of bsm_ge_coastalplain), the best way forward will be to store the _explan levels in an extra table within the GeoPackage, so that the function can extract them easily.
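Storing the explanations in one extra non-spatial table means they have to be turned back into columns when reading. A base-R sketch under assumptions: the long layout (`variable`, `code`, `explan`) of that table and the helper `add_explan()` are hypothetical, not the actual GeoPackage layout or n2khab code, and the codes are placeholders. In practice the table would first be read from the GeoPackage (for instance with `sf::st_read()` and the table's layer name, which is likewise an assumption here).

```r
# Hypothetical long-format explanation table (layout is an assumption).
explan_tbl <- data.frame(
  variable = c("bsm_mo_tex", "bsm_mo_tex", "bsm_mo_drain", "bsm_mo_drain"),
  code = c("t1", "t2", "d1", "d2"),
  explan = c("texture class 1", "texture class 2",
             "drainage class 1", "drainage class 2"),
  stringsAsFactors = FALSE
)

# Toy attribute table with placeholder codes (geometry omitted).
soil <- data.frame(
  bsm_mo_tex = factor(c("t2", "t1", "t2")),
  bsm_mo_drain = factor(c("d1", "d1", "d2"))
)

# Add a <variable>_explan factor column for every variable in the table.
add_explan <- function(data, explan_tbl) {
  vars <- intersect(unique(explan_tbl$variable), names(data))
  for (v in vars) {
    sub <- explan_tbl[explan_tbl$variable == v, ]
    idx <- match(as.character(data[[v]]), sub$code)
    data[[paste0(v, "_explan")]] <- factor(sub$explan[idx], levels = sub$explan)
  }
  data
}

soil_explan <- add_explan(soil, explan_tbl)
str(soil_explan)
```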

@DriesAdriaens thanks for your most valuable review comments! :pray:

florisvdh commented 4 years ago

@DriesAdriaens, a follow-up on the coastal plain matters, after fruitful discussions today with @devosbr and further analysis:

- terminology 'coastalplain': it will be better defined in the function's documentation, but the name will be kept. 'Coastal region' would be interpreted more narrowly, and 'polders' is too narrow as well: the area comprises coastal estuaries + dunes + polders, but that cannot be expressed concisely in a variable name. 'Coastal plain', just like 'Zeepolders', should simply remind the soil expert of what is actually meant, which we can describe.
- correcting the bsm_ge_coastalplain (logical) variable:
  - as illustrated in the previous comment, all 7 available bsm_ge_regions can be counted in (TRUE);
  - bsm_region == "Kunstmatige gronden" & type_class == "Zeepolders" (same as is.na(bsm_ge_region) & type_class == "Zeepolders") has several subcases:
    - soiltypes for which a texture/drainage translation was made in soil_translation_coastalplain.tsv (used by the standardize_coastalplain argument of read_soilmap()) can be counted in (TRUE). These are: c("OG2", "OO3", "OE1", "OO1", "OU2", "OZ", "OO", "OU1", "OG1", "OO2", "OU3", "OO4"). Apart from a very few exceptional polygons near the coastal plain area (primarily small OZ polygons), they are confined to the coastal plain area (the few aberrations are to be considered mistakes in the raw data source, which we won't solve here).
    - soiltype OL, without such a translation but with all its (4) polygons confined to the coastal plain area, can be counted in as well (TRUE).
    - two soiltypes without such a translation, OW and OH, fall completely outside the coastal plain area: they're not counted in (FALSE).
    - the remaining 6 soiltypes without such a translation, c("OB", "OE", "OT", "ON", "OC", "OA"), occur both within and outside the coastal plain area, hence they will receive bsm_ge_coastalplain <- NA.

That should result in a more acceptable labelling of polygons that originally had type_class == "Zeepolders" in the raw data source.
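The rules above can be summarized as a small decision function. A base-R sketch; the helper name `classify_coastalplain()` is hypothetical, the inputs are assumed to be the bsm_ge_region, type_class and bsm_soiltype columns as obtained from read_soilmap(use_processed = FALSE) (with type_class recreated as in the exploration code), and this is not the code that ended up in n2khab.

```r
# Soiltype lists as given in the comment above.
translated_types <- c("OG2", "OO3", "OE1", "OO1", "OU2", "OZ",
                      "OO", "OU1", "OG1", "OO2", "OU3", "OO4")
outside_types <- c("OW", "OH")
ambiguous_types <- c("OB", "OE", "OT", "ON", "OC", "OA")

# Hypothetical helper: returns TRUE / FALSE / NA per polygon.
classify_coastalplain <- function(bsm_ge_region, type_class, bsm_soiltype) {
  soiltype <- as.character(bsm_soiltype)
  # 'Kunstmatige gronden' polygons of type class 'Zeepolders';
  # %in% keeps NA out of the condition
  zp_no_region <- is.na(bsm_ge_region) & type_class %in% "Zeepolders"
  out <- rep(FALSE, length(soiltype))
  # all polygons within one of the 7 geomorphological regions
  out[!is.na(bsm_ge_region)] <- TRUE
  # translated soiltypes and OL: inside the coastal plain
  out[zp_no_region & soiltype %in% c(translated_types, "OL")] <- TRUE
  # OW and OH: outside (already the default, kept for readability)
  out[zp_no_region & soiltype %in% outside_types] <- FALSE
  # occurring both inside and outside: unknown
  out[zp_no_region & soiltype %in% ambiguous_types] <- NA
  out
}

# Toy input; "placeholder" stands for any soiltype within a region.
classify_coastalplain(
  bsm_ge_region = c("d", NA, NA, NA, NA, NA),
  type_class = c(rep("Zeepolders", 5), "Rest van Vlaanderen"),
  bsm_soiltype = c("placeholder", "OZ", "OL", "OW", "OB", "wPdc")
)
```

Because the result can contain NA, note that `dplyr::filter(x, bsm_ge_coastalplain)` drops the NA rows, whereas base-R subsetting such as `x[x$bsm_ge_coastalplain, ]` returns all-NA rows for them; `x[which(x$bsm_ge_coastalplain), ]` avoids that.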

Selected code and results

(extended version of source code: [soilmap_zeepolders2.R.zip](https://github.com/inbo/n2khab/files/4428668/soilmap_zeepolders2.R.zip))

```r
library(n2khab)
library(tidyverse)
library(sf)
library(mapview)

sm <- read_soilmap(use_processed = FALSE,
                   standardize_coastalplain = TRUE) %>%
  # recreating the original type_class variable:
  mutate(type_class = ifelse(bsm_ge_coastalplain,
                             "Zeepolders",
                             "Rest van Vlaanderen") %>% factor)
flanders <- read_admin_areas()
ge_regions <- sm %>%
  filter(!is.na(bsm_ge_region)) %>%
  st_geometry() %>%
  st_union()

# 'O*' soiltypes vs. mo_tex_translation and ge_regions
sm %>%
  filter(bsm_region == "Kunstmatige gronden",
         type_class == "Zeepolders") %>%
  mutate(texture_translation = !is.na(bsm_mo_tex)) %>%
  ggplot() +
  geom_sf(data = flanders, colour = "grey90", fill = "white") +
  geom_sf(data = ge_regions, colour = NA, fill = "thistle1") +
  geom_sf(colour = NA, fill = "black") +
  facet_wrap(~texture_translation, ncol = 1, labeller = "label_both") +
  theme_bw()
```

![image](https://user-images.githubusercontent.com/19164640/78382576-615b1900-75d7-11ea-9757-8ad168179ce3.png)

```r
# 'O*' soiltypes with mo_tex_translation:
sm %>%
  filter(bsm_region == "Kunstmatige gronden",
         type_class == "Zeepolders") %>%
  mutate(texture_translation = !is.na(bsm_mo_tex)) %>%
  filter(texture_translation) %>%
  mutate(bsm_soiltype = as.character(bsm_soiltype)) %>%
  mapview(zcol = "bsm_soiltype", color = "black", alpha = 0.3, alpha.region = 1)
```

The resulting html is added [here](https://github.com/inbo/n2khab/files/4428584/viewhtml55ae71b916c4.zip) (extract the zip and open index.html).

```r
# 'O*' soiltypes without mo_tex_translation:
> sm %>%
+   st_drop_geometry %>%
+   filter(bsm_region == "Kunstmatige gronden",
+          type_class == "Zeepolders") %>%
+   mutate(texture_translation = !is.na(bsm_mo_tex)) %>%
+   filter(!texture_translation) %>%
+   count(bsm_soiltype)
# A tibble: 9 x 2
  bsm_soiltype     n
1 OA              98
2 OB           30543
3 OC            1853
4 OE            1977
5 OH              15
6 OL               4
7 ON            2554
8 OT            6161
9 OW               7

sm %>%
  filter(bsm_region == "Kunstmatige gronden",
         type_class == "Zeepolders") %>%
  mutate(texture_translation = !is.na(bsm_mo_tex)) %>%
  filter(!texture_translation) %>%
  ggplot() +
  geom_sf(data = flanders, colour = "grey90", fill = "white") +
  geom_sf(data = ge_regions, colour = NA, fill = "thistle1") +
  geom_sf(colour = NA, fill = "black") +
  facet_wrap(~bsm_soiltype, labeller = "label_both") +
  theme_bw()
```

![image](https://user-images.githubusercontent.com/19164640/78383566-e8f55780-75d8-11ea-8423-01bea5dd4775.png)

```r
# comparing with ecodistricts:
ecoregions <- read_ecoregions()
ggplot() +
  geom_sf(data = ge_regions, colour = NA, fill = "thistle1") +
  geom_sf(data = ecoregions %>% filter(str_detect(district_name, "Kust")),
          colour = "black", fill = NA) +
  theme_bw()
```

![image](https://user-images.githubusercontent.com/19164640/78382424-235df500-75d7-11ea-81fa-3cfd2f12b190.png)

We could go further by using a spatial definition, but that does not seem worth the trouble (it would not differ much from the above). A spatial definition would largely boil down to the ecodistricts 'Kustduinendistrict' (coastal dunes district) plus 'Kustpoldersdistrict' (coastal polders district), plus an upstream part of the Yser river valley (see illustrations).

Rather, we wanted to return sensible information that is available in the soilmap, so an update for 'Zeepolders' was needed. What cannot be correctly derived from the soilmap should be corrected in the raw data source in the first place.

I will generate an updated read_soilmap() and soilmap_simple at the beginning of next week.

florisvdh commented 4 years ago

Further note concerning above 'coastal plain' terminology:
- Ameryckx et al. (1995, p. 208) speak of 'Kustvlakte' (coastal plain), not of 'Zeepolders'. This supports the 'coastal plain' term.
Summarizing reflections about the coastalplain soiltypes:
- note about overlap: while the raw data source links all O* soiltypes to 'Zeepolders' (which may indeed be the historical origin of these classes), Van Ranst & Sys (2000) list the O* soiltypes occurring within and outside the coastal plain area separately, which clearly shows the overlap in the soiltypes used.
- does it make sense to have a type_class (type classification) variable in the raw data source 'soilmap'? Apart from historical reasons, probably not. If one's aim is to see which codes are used where (cf. Van Ranst & Sys 2000), one should do so by joining with the spatial contour of the coastal plain area. Apart from the O* soiltypes, the geomorphological regions, and the codes used therein (other than O*), can always be retrieved by filtering for non-missing values of bsm_ge_region.
- does it make sense to return this type_class (type classification) variable in R, when reading the raw data source 'soilmap'? Here I take a conservative view: the purpose of read_soilmap(use_processed = FALSE) is to return all potentially useful information from the raw data source in a streamlined way. Hence I prefer to keep the information as previously returned for as long as the type_class variable occurs in the raw soilmap data source. However, the variable name bsm_ge_coastalplain is better replaced by bsm_ge_typology, with definition:
  
  bsm_ge_typology: Logical. Does the soiltype code follow the geomorphological typology?
- does it make sense to return a variable bsm_ge_typology when reading soilmap_simple into R? Clearly not. Rather than trying to predict as well as possible whether a polygon is inside or outside the coastal plain area (as in the previous comment, in an attempt to approach the original intent of bsm_ge_coastalplain), it seems more useful to add a variable bsm_converted, which is TRUE for all soiltypes with non-missing bsm_ge_region plus c("OG2", "OO3", "OE1", "OO1", "OU2", "OZ", "OO", "OU1", "OG1", "OO2", "OU3", "OO4"), i.e. a variable that says whether the morphogenetic texture and drainage categories result from a conversion of these soiltypes. This conversion to non-missing texture and drainage categories is largely confined to the 'coastal plain' area, while other O* codes still occur within the 'coastal plain' area. Without this information, one cannot easily trace which soiltype records are original and which were converted post hoc (conversions are estimates, not observations). Hence it makes sense to make this distinction in soilmap_simple for analytical purposes. This variable will be defined as:
  
  bsm_converted: Logical. Were morphogenetic texture and drainage variables (bsm_mo_tex and bsm_mo_drain) derived from a conversion table? Value TRUE is largely confined to the 'coastal plain' area. Only returned if standardize_coastalplain = TRUE.
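The proposed definition of bsm_converted translates directly into a vectorized condition. A base-R sketch of the rule as worded above; `derive_bsm_converted()` and `conversion_consistent()` are hypothetical helper names. The output in a later comment of this thread shows 2 'Zwin' polygons with non-missing bsm_ge_region that were not converted, so checking the rule against the actual conversion result (did the soiltype code change?) is a useful safeguard.

```r
converted_types <- c("OG2", "OO3", "OE1", "OO1", "OU2", "OZ",
                     "OO", "OU1", "OG1", "OO2", "OU3", "OO4")

# Rule as proposed: TRUE for polygons with a non-missing bsm_ge_region,
# plus the listed O* soiltypes.
derive_bsm_converted <- function(bsm_ge_region, bsm_soiltype) {
  !is.na(bsm_ge_region) | as.character(bsm_soiltype) %in% converted_types
}

# Safeguard: is bsm_converted TRUE exactly where the conversion changed
# the soiltype code?
conversion_consistent <- function(bsm_converted, soiltype_before, soiltype_after) {
  changed <- as.character(soiltype_before) != as.character(soiltype_after)
  identical(as.logical(bsm_converted), changed)
}

# Toy input with placeholder codes:
derive_bsm_converted(c("o", NA, NA, NA), c("placeholder", "OU1", "OB", "wPdc"))
```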

florisvdh commented 4 years ago

Action taken: the metadata of the current version soilmap_simple_v1 on Zenodo (https://doi.org/10.5281/zenodo.3732904) has been corrected for the bsm_ge_coastalplain variable:

bsm_ge_coastalplain: boolean. Did the original soil type code follow the geomorphological typology? It is TRUE for all polygons inside the coastal plain area, and also for a few soil type codes (starting with letter O) with a wider distribution across Flanders (the latter belong to the soil types for which the typology could not be converted into a morphogenetic one).

florisvdh commented 4 years ago

Note to self: a bit of code & output

Selected code and results

```r
library(n2khab)
library(tidyverse)
library(sf)

sm <- read_soilmap(use_processed = FALSE,
                   standardize_coastalplain = TRUE)
glimpse(sm)

> sm %>%
+   st_drop_geometry() %>%
+   mutate(soiltype_changed = (as.character(bsm_soiltype) !=
+                                as.character(bsm_mo_soilunitype))) %>%
+   count(soiltype_changed, bsm_converted)
# A tibble: 2 x 3
  soiltype_changed bsm_converted      n
1 FALSE            FALSE         257922
2 TRUE             TRUE           12628
> sm %>%
+   st_drop_geometry() %>%
+   count(bsm_ge_typology, bsm_converted)
# A tibble: 3 x 3
  bsm_ge_typology bsm_converted      n
1 FALSE           FALSE         214708
2 TRUE            FALSE          43214
3 TRUE            TRUE           12628
> sm %>%
+   st_drop_geometry() %>%
+   count(bsm_ge_region, bsm_ge_typology, bsm_converted)
# A tibble: 11 x 4
   bsm_ge_region bsm_ge_typology bsm_converted      n
 1 d             TRUE            TRUE             780
 2 h             TRUE            TRUE             426
 3 m             TRUE            TRUE            3991
 4 n             TRUE            TRUE              73
 5 o             TRUE            TRUE            3959
 6 r             TRUE            TRUE             366
 7 z             TRUE            FALSE              2
 8 z             TRUE            TRUE             507
 9 NA            FALSE           FALSE         214708
10 NA            TRUE            FALSE          43212
11 NA            TRUE            TRUE            2526
Warning message:
Factor `bsm_ge_region` contains implicit NA, consider using `forcats::fct_explicit_na`
> sm %>%
+   st_drop_geometry() %>%
+   filter(is.na(bsm_ge_region), bsm_ge_typology, bsm_converted) %>%
+   count(bsm_region, bsm_soiltype)
# A tibble: 12 x 3
   bsm_region          bsm_soiltype    n
 1 Kunstmatige gronden OE1             5
 2 Kunstmatige gronden OG1           279
 3 Kunstmatige gronden OG2            85
 4 Kunstmatige gronden OO              8
 5 Kunstmatige gronden OO1             8
 6 Kunstmatige gronden OO2            33
 7 Kunstmatige gronden OO3            14
 8 Kunstmatige gronden OO4             9
 9 Kunstmatige gronden OU1           474
10 Kunstmatige gronden OU2          1465
11 Kunstmatige gronden OU3             1
12 Kunstmatige gronden OZ            145

# 2 polygons from 'Zwin' with type 'G' have not been converted:
sm %>%
  st_drop_geometry() %>%
  filter(bsm_ge_region == "z", bsm_ge_typology, !bsm_converted) %>%
  count(bsm_region, bsm_soiltype)

> sm %>%
+   st_drop_geometry() %>%
+   filter(is.na(bsm_ge_region), bsm_ge_typology, !bsm_converted) %>%
+   count(bsm_region, bsm_soiltype)
# A tibble: 9 x 3
  bsm_region          bsm_soiltype     n
1 Kunstmatige gronden OA              98
2 Kunstmatige gronden OB           30543
3 Kunstmatige gronden OC            1853
4 Kunstmatige gronden OE            1977
5 Kunstmatige gronden OH              15
6 Kunstmatige gronden OL               4
7 Kunstmatige gronden ON            2554
8 Kunstmatige gronden OT            6161
9 Kunstmatige gronden OW               7

flanders <- read_admin_areas()
ge_regions <- sm %>%
  filter(!is.na(bsm_ge_region)) %>%
  st_geometry() %>%
  st_union()
sm %>%
  filter(bsm_converted) %>%
  ggplot() +
  geom_sf(data = flanders, colour = "grey90", fill = "white") +
  geom_sf(data = ge_regions, colour = NA, fill = "thistle1") +
  geom_sf(colour = NA, fill = "black") +
  theme_bw()
```

![image](https://user-images.githubusercontent.com/19164640/78570619-80e78100-7825-11ea-9101-1b2e0491466a.png)

florisvdh commented 4 years ago

Observations on object size:

```r
library(n2khab)
library(tidyverse)
sm <-
    read_soilmap(use_processed = FALSE, standardize_coastalplain = TRUE)
> object.size(sm) %>% print(units = "MiB")
394.8 MiB
> sm %>% select(-matches(".+_mo_.+_explan")) %>% object.size() %>% print(units = "MiB")
388.6 MiB
> sms <- read_soilmap()
> object.size(sms) %>% print(units = "MiB")
363.7 MiB
```

So simply keeping the _explan variables as columns is best.

florisvdh commented 4 years ago

@hansvancalster, regarding some remaining points:

Add code in the vignette to show how to extract the R-code in case someone wants to try it without having to copy-paste

A slight variant of this was committed for archival purposes (a04cb8c) but reverted immediately (0c242c9). Since most chunks have eval=FALSE, it resulted in most of the R code being commented out with #>, which is inconvenient to run. I therefore considered this addition impractical for the user. Instead, I added some code to copy the Rmd file, in e5bb79c.
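For reference, a minimal base-R sketch of copying a vignette's source file out of an installed package. It assumes that, when a package is installed with its vignettes built, the vignette source ends up in the package's `doc/` directory (as is typical); the helper `copy_vignette_source()` is hypothetical and is not the code from e5bb79c.

```r
# Hypothetical helper: copy the source (.Rmd or .Rnw) of an installed
# vignette to `dest`. Returns TRUE if a file was copied, FALSE otherwise.
copy_vignette_source <- function(name, package, dest = ".") {
  for (ext in c(".Rmd", ".Rnw")) {
    # system.file() returns "" if the package or the file does not exist
    src <- system.file("doc", paste0(name, ext), package = package)
    if (nzchar(src)) {
      return(file.copy(src, dest, overwrite = FALSE))
    }
  }
  FALSE
}

# Usage (not run here), after installing with build_vignettes = TRUE:
# copy_vignette_source("v022_example", package = "n2khab")
# knitr::purl("v022_example.Rmd")  # optionally extract the R code
```

Copying the Rmd keeps the chunks runnable as-is in RStudio, whereas extracted code from eval=FALSE chunks comes out commented.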

when downloading the simple soil map the progress % and bytes were not showing correctly, but the download was done without problems

This was an upstream bug in the curl package (see inbo/inborutils#79). It should be fixed in curl's master branch (you may want to test it; see jeroen/curl#219), but the fix has not been released yet.

the code with ggplotly did not render in the preview of my RStudio (1.1.463)

Dropped ggplotly in 99b5592.

florisvdh commented 4 years ago

@DriesAdriaens @hansvancalster Your suggestion regarding:

A nice addition would be to have the option to add _explan versions of the selection of variables in soilmap_simple, similar to those of the raw soilmap data source.

... has been implemented, both in the new version of the soilmap_simple data source (https://doi.org/10.5281/zenodo.3747496) and by the explan argument of the function (see documentation).

- In the data source, this is implemented as a non-spatial table with the category explanations. This barely increased the file size, whereas storing the explanations as columns would have inflated it by approx. 30 MB.
- read_soilmap(explan = TRUE) adds these as extra _explan variables (resulting in variables effectively identical to those obtained when reading the raw data source). Adding them as columns hardly inflates the object size in R, because they are factors.
- An example of read_soilmap(explan = TRUE) has been included in the vignette.
- Note that the default is explan = FALSE. Consequently, the soilmap_simple object in R remains as 'simple' as possible by default.
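Why factor columns are cheap: a factor stores one 4-byte integer per row plus a single copy of its levels, whereas a character column stores one pointer per row (8 bytes on 64-bit builds). A rough base-R illustration with placeholder labels, followed by a back-of-the-envelope check against the figures reported earlier in this thread: with 257922 + 12628 = 270550 polygons, six _explan factor columns (assuming these are the six _mo_ variables listed earlier) should add about 6 × 270550 × 4 bytes ≈ 6.2 MiB, which is in line with the 394.8 MiB versus 388.6 MiB measured for the raw data source.

```r
set.seed(42)
n <- 1e5
# Placeholder labels; sample() draws one label per row.
labels <- c("texture class 1", "texture class 2", "texture class 3")
as_char <- sample(labels, n, replace = TRUE)
as_fct <- factor(as_char, levels = labels)

size_char <- as.numeric(object.size(as_char))
size_fct <- as.numeric(object.size(as_fct))
print(object.size(as_char), units = "KiB")
print(object.size(as_fct), units = "KiB")

# Back-of-the-envelope estimate for the soil map: 6 factor columns,
# 4 bytes per polygon each (the levels are negligible).
n_polygons <- 257922 + 12628
extra_mib <- 6 * n_polygons * 4 / 2^20
round(extra_mib, 1)
```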

Adding read_soilmap() and vignette to release candidate 0.2.0 now.

inbo / n2khab

New function read_soilmap() #29