welfare-state-analytics / pyriksprot

Python API for reading riksdagens protokoll as part of Westac Jupyter Pipeline
MIT License

Text corpus extraction fails #36

Closed by roger-mahler 1 year ago

roger-mahler commented 2 years ago

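The failure stems from breaking changes in the code shared by plain-text and tagged-text extraction.

The shared logic contains code specific to tagged corpora: it assumes that the source folder contains a tagged corpus, so it breaks when pointed at a plain-text source.

Resolved by creating a TaggedCorpusSourceItem class derived from CorpusSourceItem, which moves the tagged-corpus assumptions out of the shared base class.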

https://github.com/welfare-state-analytics/pyriksprot/blob/bbb5d9b45a021bbb00eb3d91bbed99e17ef585ae/pyriksprot/corpus/corpus_index.py#L50
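The shape of the fix can be sketched roughly as follows. This is an illustration only, not the code at the link above: the class names come from this issue, but every attribute (path, filename, name, metadata_path) and the idea of a JSON sidecar file are assumptions made up for the example.

```python
import os


class CorpusSourceItem:
    """Base item: holds only what any corpus source file has.

    All attributes here are hypothetical, chosen for illustration.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.filename = os.path.basename(path)
        # File name without its extension, e.g. "prot-1920"
        self.name = os.path.splitext(self.filename)[0]


class TaggedCorpusSourceItem(CorpusSourceItem):
    """Tagged item: adds the assumptions that hold only for tagged corpora.

    The sidecar metadata file is an invented example of such an assumption.
    """

    @property
    def metadata_path(self) -> str:
        # Same location and base name as the source file, with a .json extension
        return os.path.splitext(self.path)[0] + ".json"


# Shared extraction code can now rely on the base class alone,
# while tagged extraction asks for the derived class explicitly.
plain = CorpusSourceItem("/data/text/prot-1920.txt")
tagged = TaggedCorpusSourceItem("/data/tagged/prot-1920.zip")
```

With this split, code that handles plain text never touches metadata_path, so it no longer depends on the source folder holding a tagged corpus.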