The pl data contains more than 5,500 instances of the bigram "unresolued link" (funnily enough spelled with a "u"). That is frequent enough to leak into the generated text.
Original text
data/pl/Alanus_de_Insulis.De_VI_alis_cherubim.xml-562-<s>
data/pl/Alanus_de_Insulis.De_VI_alis_cherubim.xml:563:unresolued NAM <unknown>
data/pl/Alanus_de_Insulis.De_VI_alis_cherubim.xml-564-link VBE <unknown>
data/pl/Alanus_de_Insulis.De_VI_alis_cherubim.xml-565-Ad PRE ad
data/pl/Alanus_de_Insulis.De_VI_alis_cherubim.xml-566-explanationem SUB explanatio
data/pl/Alanus_de_Insulis.De_VI_alis_cherubim.xml-567-huius PRO hic3
data/pl/Alanus_de_Insulis.De_VI_alis_cherubim.xml-568-figure SUB figura
Generated
corruerunt, sed cum erunt rex Edom( unresolued link): Sumebis ergo intelligere, nisi Messacebriam<eos>
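One possible way to keep the marker out of training is to drop the two token lines before building the corpus. The following is a minimal sketch, not the project's actual preprocessing. It assumes the files hold one token per line with the surface form as the first whitespace-separated field, as in the grep excerpt above. The names `MARKER`, `first_field` and `strip_marker` are hypothetical.

```python
from typing import Iterable, List, Tuple

# The two surface forms of the marker, spelled with "u" as found in the data.
MARKER = ("unresolued", "link")


def first_field(line: str) -> str:
    """Return the first whitespace-separated field of a line, or '' if blank."""
    parts = line.split()
    return parts[0] if parts else ""


def strip_marker(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Drop every adjacent 'unresolued' + 'link' token pair.

    Returns the remaining lines and the number of pairs removed.
    Assumes one token per line, surface form first (an assumption
    based on the excerpt, not a confirmed format specification).
    """
    rows = list(lines)
    kept: List[str] = []
    removed = 0
    i = 0
    while i < len(rows):
        is_pair = (
            i + 1 < len(rows)
            and first_field(rows[i]) == MARKER[0]
            and first_field(rows[i + 1]) == MARKER[1]
        )
        if is_pair:
            removed += 1
            i += 2  # skip both lines of the marker
            continue
        kept.append(rows[i])
        i += 1
    return kept, removed


sample = [
    "<s>",
    "unresolued NAM <unknown>",
    "link VBE <unknown>",
    "Ad PRE ad",
    "explanationem SUB explanatio",
]
cleaned, count = strip_marker(sample)
print(count)
print(cleaned)
```

Only adjacent pairs are removed, so a stray "link" token elsewhere in the text is left alone. The returned count can be compared against the figure reported above as a sanity check.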