Scripts and code related to collecting and curating the UNESCO Courier corpus.
python courier/elements/export_tagged_issues.py
python courier/cli/tagged2article.py
Usage: tagged2article.py [OPTIONS] SOURCE TARGET_FOLDER [ARTICLE_INDEX]
Options:
  --editorials / --no-editorials
  --supplements / --no-supplements
  --unindexed / --no-unindexed
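The paired on/off flags above can be illustrated with a minimal sketch. It is not the actual source of tagged2article.py: it uses the standard-library argparse module (Python 3.9+) to mirror the usage line, and the default value of each flag (enabled) as well as the example paths `tagged_issues` and `articles` are assumptions made for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the usage line of tagged2article.py.

    Illustrative only: the real script may be implemented differently.
    """
    parser = argparse.ArgumentParser(prog="tagged2article.py")
    parser.add_argument("source")
    parser.add_argument("target_folder")
    # ARTICLE_INDEX is shown in brackets in the usage line, so it is optional.
    parser.add_argument("article_index", nargs="?", default=None)
    for name in ("editorials", "supplements", "unindexed"):
        # BooleanOptionalAction generates both --<name> and --no-<name>.
        # default=True is an assumption; the README does not state the defaults.
        parser.add_argument(
            f"--{name}", action=argparse.BooleanOptionalAction, default=True
        )
    return parser


# Hypothetical invocation: skip supplements, keep the other two flags as-is.
args = build_parser().parse_args(["tagged_issues", "articles", "--no-supplements"])
print(args.supplements, args.editorials, args.unindexed)
```

Each flag toggles one category independently, so passing `--no-supplements` leaves `--editorials` and `--unindexed` at whatever their defaults are.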
python courier/scripts/corpus_report.py [OUTPUT_FOLDER]
python courier/scripts/extract_raw_corpora.py
find_double_pages.sh <dir>