welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Include curation in the corpus as additional metadata #185

Open MansMeg opened 2 years ago

MansMeg commented 2 years ago

We should wait for #162 to be finished. Then we should start adding previously curated data as additional covariates. The purpose of storing manually annotated, curated data as metadata is twofold:

1) As quality control. No data should be changed in a way that contradicts manually curated (and checked) data.

2) To store training data for ML models used for automated curation. In this way, we can extract "training" data from the corpus.
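
A rough sketch of how both uses could work, assuming each corpus record carries its current value next to an optional manually curated value. The field names (`id`, `text`, `value`, `curated_value`) and both helper functions are hypothetical and not part of the existing corpus code:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: "value" is what the corpus currently holds,
# "curated_value" is the manually checked value (None if never curated).
Record = Dict[str, Any]


def curation_conflicts(records: List[Record]) -> List[Record]:
    """Quality control: return records whose value contradicts the curated one."""
    return [
        r for r in records
        if r.get("curated_value") is not None and r["value"] != r["curated_value"]
    ]


def training_examples(records: List[Record]) -> List[Tuple[str, Any]]:
    """Training data: return (text, curated label) pairs for curated records."""
    return [
        (r["text"], r["curated_value"])
        for r in records
        if r.get("curated_value") is not None
    ]


# Made-up example records, for illustration only.
records = [
    {"id": "a", "text": "Herr Andersson:", "value": "speaker_1", "curated_value": "speaker_1"},
    {"id": "b", "text": "Fru Berg:", "value": "speaker_2", "curated_value": "speaker_3"},
    {"id": "c", "text": "Herr Carlsson:", "value": "speaker_4", "curated_value": None},
]

conflicts = curation_conflicts(records)
examples = training_examples(records)
```

Here `conflicts` would flag record `b`, which a test suite could treat as a failure, while `examples` collects the two curated records as labelled data.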

ninpnin commented 2 years ago

We'll do a manual check after v0.4.6 is done. This will result in manual annotations that will be stored in this format.