fredrik1984 closed 4 years ago
A zipped file with PoS annotations and the words' base forms can be found in the project's Drive folder DATA/riksdagens_protokoll. The data was generated using the Språkbanken Sparv v4 annotation package.
The source code for the annotation can be found in step.2-annotate_corpus, which makes use of the artifacts created in #87.
The generated files are in TSV format: the words appear in sequence, one per line, each with its PoS code and root form. The annotation is a very long-running task (several days in a best-case scenario).
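For anyone consuming the files, here is a minimal sketch of reading the TSV output in Python. It assumes three tab-separated columns in the order word, PoS code, base form, and no header row; neither is confirmed above, so check against an actual file first. The `Token` type, the `read_tokens` helper, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    """One annotated word (hypothetical type; column order is an assumption)."""
    word: str
    pos: str
    baseform: str


def read_tokens(stream) -> Iterator[Token]:
    """Yield one Token per TSV line, skipping lines with fewer than three fields."""
    # QUOTE_NONE keeps quotation marks in the running text as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # blank or malformed line
        yield Token(row[0], row[1], row[2])


# Invented sample rows, not taken from the real data.
sample = "Herr\tNN\therr\n\ntalman\tNN\ttalman\n"
tokens = list(read_tokens(io.StringIO(sample)))
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_tokens`; because the function is a generator, a large protocol file is processed line by line rather than loaded into memory at once.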
PoS-tag the Riksdag protocols (Riksdagsprotokollen) 1920–2020 using Språkbanken Sparv.