Readme

Purpose

Collect soil quality data and aggregate up to NUTS2 level to join to Rosés-Wolf database on regional GDP. We do this both for NUTS2 and NUTS3 (and for Sweden and Spain separately).

Download data

Download data
	NUTS2 Europe	xls
	NUTS3 Europe	csv
	NUTS3 Spain	csv
	NUTS3 Sweden	csv

What does the data look like?

It looks like this:

id	nuts	region	country	suitability_class_mean	suitability_class_max	suitability_class_min
Extract of soil quality data
1	AT11	Burgenland	Austria	3.957746	10	2
2	AT12+AT13	Lower Austria	Austria	4.123167	8	2
3	AT21	Carinthia	Austria	6.762500	9	2
4	AT22	Styria	Austria	6.494774	9	2
5	AT31	Upper Austria	Austria	4.587379	9	2
6	AT32	Salzburg	Austria	7.786885	9	2
7	AT33	Tyrol	Austria	8.324074	9	5
8	AT34	Vorarlberg	Austria	7.837209	9	5
9	BE10+BE24+BE31	Brabant	Belgium	3.586207	6	2
10	BE21	Antwerp	Belgium	5.074074	8	4

Data

Data on wheat suitability from FAO-GAEZ 4, available at the following link.

The data viewer presents the data like so:

knitr::include_graphics("images/fao_interface.PNG")

The specific extract information is:

Field	Value
FAO-GAEZ 4 metadata
Theme	Suitability and Attainable Yield
Sub-theme	Suitability Class
Variable name	Crop suitability index in classes
Time period	1981-2010
Climate data source	CRUTS32
Crop	Wheat
Water supply	Rainfed
Input level	Low

It is in a raster format - where the earth is broken into tiles.

This is what it looks like on a map:

library(stars)

tif <- read_stars(here::here("data", "siLr_whe_class.tif"))

ggplot() +
  geom_stars(data = tif)

We want to aggregate this to the nuts2 levels used in the Rosés-Wolf database on regional GDP.

NUTS2 shapefile

The NUTS2 shapefiles are from Eurostat

We’re going to work in WGS84 for convenience (EPSG: 4326).

We now need to aggregate up the NUTS2 regions which are joined together in the Rosés-Wolf database on regional GDP.

They are:

NUTS2_regions	Aggregation
Aggregations of NUTS2 regions
AT12+AT13	AT123
DE71+DE72	DE712
DE91+DE92	DE912

The following chunk manipulates the shapefile to make it easier to work with in a WGS84 projection.

library(sf)

# map <- read_sf(here::here("maps", "regions_nuts2.shp")) %>% janitor::clean_names()

# map_tbl <- map %>% as_tibble()

# map_tbl %>%
#   ggplot(aes(geometry = geometry)) +
#   geom_sf() +
#   coord_sf()

## Making the map files smaller
# library(rmapshaper)
# map_simple <- ms_simplify(map, keep = 0.1,
#                                 keep_shapes = FALSE)
# 
# map_tbl_simple <- map_simple %>% as_tibble()
# 
# map_tbl_simple %>% write_rds(here::here("maps", "NUTS2_simple.rds"))

map_simple <- read_rds(here::here("maps", "NUTS2_simple.rds")) %>% 
  st_as_sf()

map_simple <- map_simple %>% 
  st_transform(crs = 4326)

We want to crop to only Europe: this is what the raster file looks like now.

# st_crop removes the raster region outside of europe
tif_bbox <- st_crop(tif, map_simple %>% st_bbox())

ggplot() +
  geom_stars(data = tif_bbox)

Now we want to calculate the averages within each polygon, along with the min and max values within each polygon.

# devtools::install_github("michaeldorman/geobgu")
library(geobgu)

map_simple_avgs <-
  map_simple %>% mutate(
    suitability_class_mean = raster_extract(tif_bbox, map_simple, fun = mean, na.rm = TRUE),
    suitability_class_max = raster_extract(tif_bbox, map_simple, fun = max, na.rm = TRUE),
    suitability_class_min = raster_extract(tif_bbox, map_simple, fun = min, na.rm = TRUE)
  )

This is what the averaging procedure produces:

The distribution of minimum and maximum values is quite erratic. The plot below arranges the NUTS2 regions from lowest mean of suitability class to largest. The ribbon shows the

Here is an example of 10 random regions.

nuts_code	suitability_class_mean	suitability_class_max	suitability_class_min
RO31	3.689165	9	2
NL41	5.515789	6	4
NL21	5.687500	7	5
PL51	4.417344	8	2
ITF2	6.318841	7	4
FR62	4.510094	9	2
NL33	5.532258	7	5
NO07	9.001366	10	7
EL24	6.030973	8	4
AN	8.500000	9	8

Join to Rosés-Wolf database on regional GDP

# read in data file
df <- haven::read_dta(here::here("data", "RosesWolf_Regional_Fahad.dta"))

# keep unique nuts codes
df <- df %>%
  distinct(id, nuts, region, country)

# change label of joined regions
map_simple_avgs <- map_simple_avgs %>%
  st_set_geometry(NULL) %>%
  mutate(nuts = case_when(
    nuts_code == "AT123" ~ "AT12+AT13",
    nuts_code == "DE712" ~ "DE71+DE72",
    nuts_code == "DE912" ~ "DE91+DE92",
    TRUE ~ nuts_code
  ))

# join together
df_out <- df %>%
  left_join(map_simple_avgs) %>%
  select(-nuts_code)

# df_out %>% 
#   write_rds(here::here("data", "soil_suitability.rds"))

# df_out %>% 
#   write.csv(here::here("data", "soil_suitability.csv"))

NUTS3 shapefile

The NUTS3 shapefiles are from Eurostat

We’re going to work in WGS84 for convenience (EPSG: 4326).

We now need to aggregate up the NUTS3 regions which are joined together in the Rosés-Wolf database on regional GDP for Sweden and Spain.

They are:

NUTS2_regions	Aggregation
Aggregations of NUTS2 regions
AT12+AT13	AT123
DE71+DE72	DE712
DE91+DE92	DE912

The following chunk manipulates the shapefile to make it easier to work with in a WGS84 projection.

library(sf)

map <- read_sf(here::here("maps", "NUTS_RG_01M_2021_4326.shp")) %>% janitor::clean_names()

map_tbl <- map %>% as_tibble()

map_tbl %>%
  ggplot(aes(geometry = geometry)) +
  geom_sf() +
  coord_sf()

# Making the map files smaller
library(rmapshaper)
map_simple <- ms_simplify(map, keep = 0.1,
                                keep_shapes = FALSE)

map_tbl_simple <- map_simple %>% as_tibble()

map_tbl_simple %>% write_rds(here::here("maps", "NUTS3_simple.rds"))

map_simple <- read_rds(here::here("maps", "NUTS3_simple.rds")) %>% 
  st_as_sf() %>% 
  filter(levl_code == 3)

map_simple <- map_simple %>% 
  st_transform(crs = 4326)

We want to crop to only Europe: this is what the raster file looks like now.

# st_crop removes the raster region outside of europe
tif_bbox <- st_crop(tif, map_simple %>% st_bbox())

ggplot() +
  geom_stars(data = tif_bbox)

Now we want to calculate the averages within each polygon, along with the min and max values within each polygon.

# devtools::install_github("michaeldorman/geobgu")
library(geobgu)

map_simple_avgs <-
  map_simple %>% mutate(
    suitability_class_mean = raster_extract(tif_bbox, map_simple, fun = mean, na.rm = TRUE),
    suitability_class_max = raster_extract(tif_bbox, map_simple, fun = max, na.rm = TRUE),
    suitability_class_min = raster_extract(tif_bbox, map_simple, fun = min, na.rm = TRUE)
  )

map_simple_avgs <- map_simple %>% 
  bind_cols(map_simple_avgs$suitability_class_mean %>% as_tibble() %>% rename(suitability_class_mean = V1)) %>% 
  bind_cols(map_simple_avgs$suitability_class_max %>% as_tibble() %>% rename(suitability_class_max = V1)) %>%
  bind_cols(map_simple_avgs$suitability_class_min %>% as_tibble() %>% rename(suitability_class_min = V1))

This is what the averaging procedure produces:

The distribution of minimum and maximum values is quite erratic. The plot below arranges the NUTS2 regions from lowest mean of suitability class to largest. The ribbon shows the

Here is an example of 10 random regions.

nuts_id	levl_code	cntr_code	name_latn	nuts_name	mount_type	urbn_type	coast_type	fid	suitability_class_mean	suitability_class_max	suitability_class_min
DE735	3	DE	Schwalm-Eder-Kreis	Schwalm-Eder-Kreis	4	3	3	DE735	3.933333	5	2
EL307	3	EL	Peiraias, Nisoi	Πειραιάς, Νήσοι	2	1	1	EL307	6.454546	7	5
AT125	3	AT	Weinviertel	Weinviertel	4	3	3	AT125	2.500000	4	2
FRC14	3	FR	Yonne	Yonne	4	3	3	FRC14	3.515625	6	2
DE27A	3	DE	Lindau (Bodensee)	Lindau (Bodensee)	3	2	3	DE27A	3.857143	10	2
CZ052	3	CZ	Královéhradecký kraj	Královéhradecký kraj	4	2	3	CZ052	3.600000	7	2
ITC13	3	IT	Biella	Biella	3	2	3	ITC13	5.066667	8	3
FRJ15	3	FR	Pyrénées-Orientales	Pyrénées-Orientales	2	2	1	FRJ15	6.129032	9	3
DE934	3	DE	Lüchow-Dannenberg	Lüchow-Dannenberg	4	3	3	DE934	5.307692	6	4
DEB3C	3	DE	Bad Dürkheim	Bad Dürkheim	4	2	3	DEB3C	4.500000	6	2

map_simple_avgs %>%
  as_tibble() %>% 
  select(-geometry) %>% 
  janitor::clean_names() %>% 
  # filter(cntr_code %in% c("SE", "ES")) %>% 
  write_excel_csv2(here::here("data", "soil_suitability_nuts_3.csv"))

map_simple_avgs %>%
  as_tibble() %>% 
  select(-geometry) %>% 
  janitor::clean_names() %>% 
  filter(cntr_code %in% c("ES")) %>%
  write_excel_csv2(here::here("data", "soil_suitability_nuts_3_ES.csv"))

Swedish map

Apparently we want the older Swedish regions: of which there are 24. See the Issue 1 on Github.

library(histmaps)

se_map <- get_boundaries(1980, "county")

se_map <- se_map %>% 
  st_transform(crs = 4326)

se_map %>% 
  ggplot(aes(fill = name)) +
  geom_sf(alpha = .5)

We want to crop to only Sweden: this is what the raster file looks like now.

# st_crop removes the raster region outside of europe
tif_bbox <- st_crop(tif, se_map %>% st_bbox())

ggplot() +
  geom_stars(data = tif_bbox)

Now we want to calculate the averages within each polygon, along with the min and max values within each polygon.

# devtools::install_github("michaeldorman/geobgu")
library(geobgu)

map_simple_avgs_sweden <-
  se_map %>% mutate(
    suitability_class_mean = raster_extract(tif_bbox, se_map, fun = mean, na.rm = TRUE),
    suitability_class_max = raster_extract(tif_bbox, se_map, fun = max, na.rm = TRUE),
    suitability_class_min = raster_extract(tif_bbox, se_map, fun = min, na.rm = TRUE)
  )

map_simple_avgs_sweden <- se_map %>% 
  bind_cols(map_simple_avgs_sweden$suitability_class_mean %>% as_tibble() %>% rename(suitability_class_mean = V1)) %>% 
  bind_cols(map_simple_avgs_sweden$suitability_class_max %>% as_tibble() %>% rename(suitability_class_max = V1)) %>%
  bind_cols(map_simple_avgs_sweden$suitability_class_min %>% as_tibble() %>% rename(suitability_class_min = V1))

This is what the averaging procedure produces:

The distribution of minimum and maximum values is quite erratic. The plot below arranges the NUTS2 regions from lowest mean of suitability class to largest. The ribbon shows the

Here is an example of 10 random regions.

geom_id	ref_code	name	type	type_id	start	end	suitability_class_mean	suitability_class_max	suitability_class_min
12496	SE/110000000	Kristianstads	County	county	1971	9999	4.279070	6	2
12506	SE/130000000	Hallands	County	county	1974	9999	5.074380	6	5
12502	SE/120000000	Skåne	County	county	1971	9999	2.922330	5	2
12547	SE/010000000	Stockholms	County	county	1971	9999	5.401316	6	5
12540	SE/210000000	Gävleborgs	County	county	1864	9999	7.456612	10	6
12548	SE/230000000	Jämtlands	County	county	1974	9999	8.960954	10	6
12532	SE/180000000	Örebro	County	county	1974	9999	6.202830	10	4
12555	SE/250000000	Norrbottens	County	county	1869	9999	8.643200	10	6
12527	SE/170000000	Värmlands	County	county	1971	9999	6.400891	10	5
12537	SE/200000000	Dalarnas	County	county	0	9999	7.388579	10	6

Export Data

j-jayes / soil-suitability

readme

Readme

Purpose

Download data

What does the data look like?

Data

NUTS2 shapefile

Join to Rosés-Wolf database on regional GDP

NUTS3 shapefile

Swedish map