By default don't return 7220 in read_habitatmap_terr()

florisvdh commented 4 years ago

@w-jan Do you follow this idea?

The habitatsprings data source, returned by read_habitatsprings(), contains much more accurate information about this partly aquatic, partly terrestrial type (see its version in the rc0.2 branch).

Therefore I suggest to provide an argument drop_7220 in read_habitatmap_terr() to drop occurrences of 7220 and set it TRUE by default.

If the user still wants to query the occurrence of 7220 in terrestrial polygons according to the habitatmap, (s)he can do so by setting the argument to FALSE. But the user will be advised to use read_habitatsprings(); the current version of the data source (a points object) also contains information about terrestrial vs. aquatic.
Moreover, uniquely-7220-polygons were already dropped from habitatmap_terr in the data source, because at that time it was considered as an aquatic type together with the other aquatic types.

I believe this would better fit the purpose of read_habitatmap_terr() to provide a clean and optimally interpreted layer.

w-jan commented 4 years ago

From a theoretically point of view, it's better to store the terrestrial forms in a polygon. In Flanders the current distribution can probably be better described by a point layer. So I can agree with a point layer.

A possible another possible approach is only to allocate the terrestrial forms to habitatmap_terr. Therefore the polygons in habitatmap_terr (considering they weren't dropped) can be checked and specifically labeled.

You can choose which approach to use.

florisvdh commented 4 years ago

Thanks for your ideas @w-jan.

Therefore the polygons in habitatmap_terr (considering they weren't dropped) can be checked and specifically labeled.

I'll check how this solution would look like (how large are those touched polygons compared to the surface area from read_habitatsprings()?). Anyhow, it would imply a specific update of the habitatmap_terr data source in an attempt to accommodate terrestrial 7220 sites as good as possible, while we already have a more reliable point layer for that. To avoid this hassle I was inclined to provide an option to drop 7220 from the result of read_habitatmap_terr(). Once 7220 from habitatsprings is properly included in the raw habitatmap data source, the story for habitatmap_terr will be different anyway.

Planning to turn back to this with more results.

florisvdh commented 4 years ago

Having explored this, I plan to go forward with the idea of dropping 7220 by default from the output of read_habitatmap_terr() and document that best information on this type is to be gained by read_habitatsprings(). It still remains optionally available.

@Patrikoosterlynck: FYI

Main reason is that 7220 is mainly present as tiny patches compared to habitatmap_terr polygon areas, hence the latter data are coarse for this type compared to habitatsprings. In the few cases where phab is > 0 in habitatmap_terr (only 5 from 40 cases), the calculated area of 7220 is still (much) too high compared to the information in habitatsprings.

Both data sources do not completely match for their occurrences of 7220, however last 3 row are potentially explained by the a priori absence of certain polygons in habitatmap_terr (summary of spatially joined result):

# A tibble: 10 x 6
   type_hmt type_hs system_type certain_hmt certain_hs     n
   <fct>    <fct>   <fct>       <lgl>       <lgl>      <int>
 1 7220     7220    mire        TRUE        TRUE           9
 2 7220     7220    rivulet     TRUE        TRUE          19
 3 7220     7220    unknown     FALSE       FALSE          7
 4 7220     7220    unknown     TRUE        FALSE          3
 5 7220     7220    unknown     TRUE        TRUE           1
 6 7220     NA      NA          FALSE       NA             1
 7 7220     NA      NA          TRUE        NA             2
 8 NA       7220    mire        NA          TRUE           1
 9 NA       7220    rivulet     NA          TRUE           2
10 NA       7220    unknown     NA          FALSE         13

(_hmt = habitatmap_terr, _hs = habitatsprings)

More detailed output of surface area information

Relative drop in area (when considering `habitatsprings` compared to `habitatmap_terr`) for the 5 cases with `phab` > 0 (`area_hmt` is phab-corrected): ```r # A tibble: 5 x 7 description system_type certain_hmt certain_hs area_hmt area_hs area_rel 1 95% 9130_end; 5% 7220 unknown TRUE FALSE 2580. 0 -1 2 85% 9130_end; 10% 91E0_va; 5% 7220 unknown TRUE TRUE 456. 20 -0.956 3 85% 91E0_vc; 15% 7220 unknown TRUE FALSE 1461. 0 -1 4 70% gh; 25% 9130_end; 5% 7220,gh unknown FALSE FALSE 844. 0 -1 5 70% 9130_end; 25% 91E0_vc; 5% 7220 rivulet TRUE TRUE 1115. 130 -0.883 ``` Polygon area distribution for _all other cases_ (removing extreme large areas) shows relatively large involved polygon surfaces: ![image](https://user-images.githubusercontent.com/19164640/78342526-b8da9400-7599-11ea-9f22-3ce29645a758.png) Distribution of `area_hs` to polygon area ratio, for case `area_hs_unknown = FALSE` in above plot: ![image](https://user-images.githubusercontent.com/19164640/78342746-09ea8800-759a-11ea-8dbf-ff3ba4d4d799.png)

R code

```r library(n2khab) library(tidyverse) library(sf) library(units) hs <- read_habitatsprings(filter_hab = TRUE) %>% filter(type == "7220") %>% mutate(area_hs = area_m2 %>% set_units("m^2")) %>% select(type_hs = type, certain_hs = certain, area_hs, system_type) hmt <- read_habitatmap_terr() hmt$habitatmap_terr_types %>% filter(type == "7220") %>% mutate(phab_is_zero = phab == 0) %>% count(phab_is_zero) hmt_pol <- hmt$habitatmap_terr_polygons %>% left_join(hmt$habitatmap_terr_types %>% filter(type == "7220"), by = "polygon_id") %>% mutate(phab = ifelse(phab == 0, NA, phab), area_pol = st_area(.), area_hmt = st_area(.) * phab / 100, area_hmt = ifelse(is.na(area_hmt), 0, area_hmt)) %>% select(description, type_hmt = type, certain_hmt = certain, area_pol, area_hmt) hmt_hs <- hmt_pol %>% st_join(hs) %>% mutate(area_hs = ifelse(is.na(area_hs), 0, area_hs), area_rel = ifelse(area_hmt == 0 & area_hs > 0, 1, ifelse(area_hmt == 0 & area_hs == 0, 0, (area_hs - area_hmt) / area_hmt)) ) hmt_hs %>% st_drop_geometry %>% filter(!is.na(type_hmt) | !is.na(type_hs)) %>% count(type_hmt, type_hs, system_type, certain_hmt, certain_hs) hmt_hs %>% st_drop_geometry %>% filter((!is.na(type_hmt) | !is.na(type_hs)), area_rel < 0) %>% select(description, system_type, certain_hmt, certain_hs, area_hmt, area_hs, area_rel) hmt_hs %>% st_drop_geometry %>% filter((!is.na(type_hmt) | !is.na(type_hs)), area_hmt == 0) %>% mutate(area_hs_unknown = area_hs == 0, area_pol_m2 = area_pol %>% drop_units()) %>% ggplot(aes(x = area_pol_m2, colour = area_hs_unknown)) + geom_density() + xlim(0, 25e4) hmt_hs %>% st_drop_geometry %>% filter((!is.na(type_hmt) | !is.na(type_hs)), area_hmt == 0 & area_hs > 0) %>% mutate(area_hs = area_hs %>% set_units("m^2"), area_relpol = (area_hs / area_pol) %>% drop_units()) %>% ggplot(aes(x = area_relpol)) + geom_histogram(colour = "grey60", fill = "white", binwidth = 0.005) ```

sessioninfo::session_info()

```r ─ Session info ─────────────────────────────────────────────────────────────────────────────────────── setting value version R version 3.6.3 (2020-02-29) os Linux Mint 18.1 system x86_64, linux-gnu ui RStudio language (EN) collate nl_BE.UTF-8 ctype nl_BE.UTF-8 tz Europe/Brussels date 2020-04-03 ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────── package * version date lib source assertable 0.2.7 2019-09-21 [1] CRAN (R 3.6.1) assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0) backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1) bit 1.1-15.2 2020-02-10 [1] CRAN (R 3.6.2) bit64 0.9-7 2017-05-08 [1] CRAN (R 3.6.0) blob 1.2.1 2020-01-20 [1] CRAN (R 3.6.2) broom 0.5.5 2020-02-29 [1] CRAN (R 3.6.3) cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0) class 7.3-16 2020-03-25 [4] CRAN (R 3.6.3) classInt 0.4-2 2019-10-17 [1] CRAN (R 3.6.1) cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3) codetools 0.2-16 2018-12-24 [4] CRAN (R 3.6.0) colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0) crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0) crosstalk 1.1.0.1 2020-03-13 [1] CRAN (R 3.6.3) curl 4.3 2019-12-02 [1] CRAN (R 3.6.2) data.table 1.12.8 2019-12-09 [1] CRAN (R 3.6.2) DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2) dbplyr 1.4.2.9000 2020-03-06 [1] Github (florisvdh/dbplyr@6bda105) digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.3) dplyr * 0.8.5 2020-03-07 [1] CRAN (R 3.6.3) drat 0.1.5 2019-03-28 [1] CRAN (R 3.6.0) e1071 1.7-3 2019-11-26 [1] CRAN (R 3.6.2) ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1) fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2) farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.2) forcats * 0.5.0 2020-03-01 [1] CRAN (R 3.6.3) fs 1.3.2 2020-03-05 [1] CRAN (R 3.6.3) generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0) geoaxe 0.1.0 2016-02-19 [1] CRAN (R 3.6.0) ggforce * 0.3.1 2019-08-20 [1] CRAN (R 3.6.1) ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 3.6.3) git2r 0.26.1 2019-06-29 [1] CRAN (R 3.6.0) git2rdata 0.2.1 2020-03-02 [1] CRAN (R 3.6.3) glue 1.3.2 2020-03-12 [1] CRAN (R 3.6.3) gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0) haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.2) hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.2) htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1) htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 3.6.1) httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.1) inborutils 0.1.0.9072 2020-03-09 [1] Github (inbo/inborutils@8948fec) iterators 1.0.12 2019-07-26 [1] CRAN (R 3.6.1) jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.2) KernSmooth 2.23-16 2019-10-15 [4] CRAN (R 3.6.1) labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0) lattice 0.20-40 2020-02-19 [4] CRAN (R 3.6.2) lazyeval 0.2.2 2019-05-08 [1] Github (hadley/lazyeval@c336765) leaflet 2.0.3 2019-11-16 [1] CRAN (R 3.6.2) lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3) lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0) magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0) MASS 7.3-51.5 2019-12-20 [4] CRAN (R 3.6.2) memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0) modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.3) munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0) n2khab * 0.1.2.9000 2020-04-02 [1] local nlme 3.1-145 2020-03-04 [4] CRAN (R 3.6.3) oai 0.3.0 2019-09-07 [1] CRAN (R 3.6.1) odbc 1.2.2 2020-01-10 [1] CRAN (R 3.6.2) packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.0) pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2) pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1) plyr 1.8.6 2020-03-03 [1] CRAN (R 3.6.3) polyclip 1.10-0 2019-03-14 [1] CRAN (R 3.6.0) purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.1) R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2) raster 3.0-12 2020-01-30 [1] CRAN (R 3.6.2) Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.3) readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.2) readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0) reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.1) rgbif 2.2.0 2020-03-09 [1] CRAN (R 3.6.3) rgeos 0.5-2 2019-10-03 [1] CRAN (R 3.6.1) rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.3) rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0) RSQLite 2.2.0 2020-01-07 [1] CRAN (R 3.6.2) rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.3) rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.2) scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.2) sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0) sf * 0.9-0 2020-03-24 [1] CRAN (R 3.6.3) sp 1.4-1 2020-02-28 [1] CRAN (R 3.6.3) stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.3) stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0) tibble * 3.0.0 2020-03-30 [1] CRAN (R 3.6.3) tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.2) tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.2) tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.2) tweenr 1.0.1 2018-12-14 [1] CRAN (R 3.6.1) units * 0.6-6 2020-03-16 [1] CRAN (R 3.6.3) utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0) vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.3) whisker 0.4 2019-08-28 [1] CRAN (R 3.6.1) withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0) xml2 1.2.5 2020-03-11 [1] CRAN (R 3.6.3) yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2) [1] /home/floris/lib/R/library [2] /usr/local/lib/R/site-library [3] /usr/lib/R/site-library [4] /usr/lib/R/library ```

florisvdh commented 4 years ago

the a priori absence of certain polygons in habitatmap_terr

Checks have been done for dropped polygons from habitatmap_stdized (i.e. strictly aquatic, including 7220). None of these have 7220. This means that habitatmap_terr still contains all 7220 occurrences of habitatmap_stdized and we don't have to warn users about 'still missing' 7220 occurrences (compared to habitatmap_stdized) when drop_7220 is set as FALSE.

R-code & output included below.

R code & output

```r library(dplyr) library(n2khab) hms <- read_habitatmap_stdized() hms_occ <- hms$habitatmap_types types <- read_types() %>% select(type, hydr_class) hms_occ %>% inner_join(types, by = "type") %>% mutate(aq_orig = (hydr_class == "HC3" | type == "7220")) %>% group_by(polygon_id) %>% mutate(pol_aq_orig = (n() == sum(aq_orig))) %>% filter(pol_aq_orig) %>% filter(type == "7220") #> A tibble: 0 x 8 # double check this result: hmt <- read_habitatmap_terr() hmt_occ <- hmt$habitatmap_terr_types hmt_occ %>% filter(type == "7220") %>% nrow == (hms_occ %>% filter(type == "7220") %>% nrow) #> [1] TRUE # this also means 7220 never occurs alone in a polygon; frequency of number of # types per polygons is as follows: hmt_occ %>% filter(type == "7220") %>% select(polygon_id) %>% semi_join(hmt_occ, ., by = "polygon_id") %>% count(polygon_id) %>% {table(.$n)} #> 2 3 4 #> 17 20 3 ```

inbo / n2khab

By default don't return 7220 in read_habitatmap_terr() #66