ukwa / webarchive-discovery

WARC and ARC indexing and discovery tools.
https://github.com/ukwa/webarchive-discovery/wiki

Changed LanguageAnalyser to langid #318

Status: Closed (lasztoth closed this 3 months ago)

lasztoth commented 3 months ago

Changed LanguageAnalyser to use langid:

https://github.com/carrotsearch/langid-java

langid-java offers better language detection than the previous implementation and recognises more languages.
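
For reviewers, here is a minimal sketch of the shape of such a change: the analyser delegates to a pluggable detector, so the detection library can be swapped without touching the rest of the indexing code. Everything in it is an assumption made for illustration and is not taken from this PR: the `LanguageDetector` interface, the `Analyser` class, the `content_language` field name, and the toy detector. A real detector would wrap langid-java here; see that library's README for its actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LanguageAnalyserSketch {

    /** Hypothetical abstraction over a language detector (not the project's real interface). */
    interface LanguageDetector {
        /** Returns a language code such as "en", or null if no language could be determined. */
        String detect(String text);
    }

    /** Hypothetical analyser that records the detected language as an index field. */
    static class Analyser {
        // Assumed field name, used here only for illustration.
        static final String FIELD = "content_language";

        private final LanguageDetector detector;

        Analyser(LanguageDetector detector) {
            this.detector = detector;
        }

        /** Returns the fields to add to the index document for the given extracted text. */
        Map<String, String> analyse(String text) {
            Map<String, String> fields = new LinkedHashMap<>();
            // Skip detection entirely for missing or blank text.
            if (text == null || text.trim().isEmpty()) {
                return fields;
            }
            String lang = detector.detect(text);
            // Only record a language when the detector actually produced one.
            if (lang != null) {
                fields.put(FIELD, lang);
            }
            return fields;
        }
    }

    /** Toy stand-in detector; a real one would delegate to a library such as langid-java. */
    static LanguageDetector toyDetector() {
        return text -> text.toLowerCase().contains(" the ") ? "en" : null;
    }

    public static void main(String[] args) {
        Analyser analyser = new Analyser(toyDetector());
        System.out.println(analyser.analyse("Where is the library?"));
    }
}
```

Keeping the detector behind a small interface means a change like this one stays confined to a single class, and the analyser can be tested with a stub instead of the real language models.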