Closed sandervh14 closed 1 month ago
It's not perfect, but none of the files are currently failing. Are there any specific problems that we should address within the scope of this issue?
I created this issue when the HTML extractor didn't exist yet. As we both know, that one works better, because it suffers from fewer PDF processing artefacts.
I'll have a look in the coming days to see whether there are still unexpected results.
Note to self: check the following:
WARNING:root:vote count (12) does not match voters ['Bury Katleen', 'Creyelman Steven', 'De Spiegeleer Pieter', 'Depoortere Ortwin', 'Dewulf Nathalie', 'Dillen Marijke', 'Gilissen Erik', 'Pas Barbara', 'Ponthier Annick', 'Ravyts Kurt', 'Samyn Ellen', 'Sneppe Dominiek', 'Troosters Frank', 'Van Grieken Tom', 'Van Langenhove Dries', 'Van Lommel Reccino', 'Vermeersch Wouter', 'Verreyt Hans']
logging.warning("Failed to process %s")
logged?
I'll close this issue. We have identified, and will continue to identify, reports that weren't processed the way we expected, and we have turned them (or will turn them) into additional unit tests. See test_extraction.py. So this is ongoing, but we don't need a separate issue for it anymore. We're on it as part of our other work.
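For reference, a minimal sketch of the kind of consistency check behind the "vote count does not match voters" warning quoted above, in a form that could be asserted on in a unit test. The function name vote_count_matches is hypothetical and not taken from the repository; only the message format mirrors the warning. It also shows passing the values as logging arguments: as far as I know, a format string with no arguments is logged with the "%s" left in literally.

```python
from __future__ import annotations

import logging


def vote_count_matches(vote_count: int, voters: list[str]) -> bool:
    """Return True when the reported count equals the number of extracted names.

    Hypothetical helper for illustration only; not the project's actual code.
    """
    if vote_count != len(voters):
        # Pass the values as arguments so logging fills in the placeholders.
        # Without arguments, the "%s" would appear unformatted in the output.
        logging.warning(
            "vote count (%d) does not match voters %s", vote_count, voters
        )
        return False
    return True
```

A test could then feed it a count and a voter list from a report that is known to be problematic and assert on the result, for example that 12 reported votes against 18 extracted names returns False.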
Find out why some are skipped / not properly processed.