welfare-state-analytics / welfare_state_analytics

Welfare State Analytics

Riksdagsprotokoll 2 – PoS #77

Closed fredrik1984 closed 4 years ago

fredrik1984 commented 4 years ago

PoS-tag the Riksdag protocols (Riksdagsprotokollen) 1920–2020 using Språkbanken Sparv.

roger-mahler commented 4 years ago

A zipped file with PoS annotations and the words' base forms can be found in the project's Drive folder DATA/riksdagens_protokoll. The data was generated using the Språkbanken Sparv v4 annotation package.

The source code for the annotation can be found in step.2-annotate_corpus, which makes use of the artifacts created in #87.

The generated files are in TSV format: each word appears in sequence on its own line, together with its PoS code and root form. The annotation is a very long-running task (several days in the best case).
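
For anyone who wants to read the files, here is a minimal sketch of a reader in Python. It rests on assumptions that the description above does not confirm: that the columns come in the order word, PoS code, base form, and that there is no header row. The `Token` type, the `read_annotated_tsv` function and the sample rows are hypothetical and for illustration only, not taken from the actual data.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    # Assumed column order: word, PoS code, base form (not confirmed by the source).
    word: str
    pos: str
    lemma: str


def read_annotated_tsv(stream: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per row that has at least three tab-separated fields."""
    # QUOTE_NONE keeps quotation marks in the text as ordinary characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            # Skip blank or incomplete lines instead of failing on them.
            continue
        yield Token(row[0], row[1], row[2])


# Made-up sample rows, only to show the assumed layout.
SAMPLE = "talmannen\tNN\ttalman\nöppnade\tVB\töppna\n"

tokens = list(read_annotated_tsv(io.StringIO(SAMPLE)))
nouns = [t.lemma for t in tokens if t.pos == "NN"]
print(nouns)
```

For a real file, an open file handle (for example `open(path, encoding="utf-8", newline="")`) can be passed in place of the `io.StringIO` object. If the actual files have a header row or a different column order, the field mapping in `read_annotated_tsv` needs to be adjusted.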