identity-H encoding German letters

I am trying to extract the table from this file: https://www.dropbox.com/s/pqkbmiq4ulr5gkz/Spielestatistik%202017.pdf?dl=0 Sorry for pointing to this specific file, but since the issue may depend on its particular encoding, I could not quickly come up with a simpler public reproducible example.

Running

library(tabulizer)
library(tidyverse)
setwd("...")
"Spielestatistik 2017.pdf" %>% tabulizer::extract_text() -> rawtxt

leads to problems with the German umlauts (ä, ö, ü) as well as the sharp s (ß).

The file appears to use Identity-H encoding, which, according to a Google search, might be the culprit. I am still filing an issue because

library(pdftools)
library(stringr)
library(dplyr)
library(tidyr)

setwd("...")
Spiele1516 <- pdf_text("Spielestatistik 2017.pdf")
S1516 <- read.delim(textConnection(Spiele1516), strip.white = TRUE)

does work, which suggests that such cases could also be handled here, in the way pdftools handles them.
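For anyone trying to reproduce this, a minimal sketch for comparing the two extractions might look like the following. The helper `count_special` and the variable `pdf_file` are made up for illustration and are not part of either package. The sketch assumes the PDF sits in the working directory and simply counts how often each affected letter appears in the extracted text.

```r
library(stringr)

# Hypothetical helper (not part of tabulizer or pdftools):
# count how often each affected German letter occurs in extracted text.
count_special <- function(txt, chars = c("ä", "ö", "ü", "ß")) {
  txt <- paste(txt, collapse = "\n")   # pdf_text() returns one string per page
  counts <- str_count(txt, chars)      # one count per letter
  setNames(counts, chars)
}

# Assumption: the file from the link above is in the working directory.
pdf_file <- "Spielestatistik 2017.pdf"
if (file.exists(pdf_file)) {
  print(count_special(tabulizer::extract_text(pdf_file)))
  print(count_special(pdftools::pdf_text(pdf_file)))
}
```

If the counts from `extract_text()` are zero or much lower than those from `pdf_text()`, that would be consistent with the letters being lost or replaced during extraction.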
