welfare-state-analytics / riksdagen-corpus-old

Preprocess the proceedings of the Swedish parliament
https://welfare-state-analytics.github.io/riksdagen-corpus/riksdagen_corpus/

Extract subsets of the corpora in various formats #26

Closed: ninpnin closed 3 years ago

ninpnin commented 3 years ago

An end user might want to extract subsets from the corpora. I suggest we provide the ability to restrict the scope by

It would be natural to provide different output formats for this functionality. At a minimum, I'd suggest we provide Parla-CLARIN and a plaintext format.
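As a rough illustration of what the plaintext output could look like, here is a minimal sketch. It assumes the input is Parla-CLARIN-style TEI XML in which each speech is a `u` element in the TEI namespace. The function name `utterances_to_plaintext` and the sample document are hypothetical and are not taken from the riksdagen-corpus code; no subset filtering is shown.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by Parla-CLARIN documents.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def utterances_to_plaintext(xml_string):
    """Return one line of plain text per <u> (utterance) element.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_string)
    lines = []
    for u in root.iter(TEI_NS + "u"):
        # Collect all nested text and collapse runs of whitespace.
        text = " ".join("".join(u.itertext()).split())
        if text:
            lines.append(text)
    return "\n".join(lines)


# Made-up sample input, not real corpus data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <u who="#speaker_a"><seg>First speech,
      wrapped over two lines.</seg></u>
    <u who="#speaker_b"><seg>Second speech.</seg></u>
  </div></body></text>
</TEI>"""

if __name__ == "__main__":
    print(utterances_to_plaintext(SAMPLE))
```

A Parla-CLARIN export of the same subset could presumably write the selected elements back out as XML instead of flattening them to text.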

MansMeg commented 3 years ago

I think this is a good idea, although maybe not for the 0.1 beta?

ninpnin commented 3 years ago

Fair enough. Good to plan before implementing in any case.