hasadna / Open-Knesset

A project aimed at making the Israeli Knesset more transparent. Based on Python and Django.
http://oknesset.org/
BSD 3-Clause "New" or "Revised" License

Better exception handling in knesset plenum parsing and antiword #778

Open: alonisser opened this issue 7 years ago

alonisser commented 7 years ago
  1. Fix the strange retry mechanism, and log a specific exception only once per session.
  2. Investigate this:

antiword failure with file: /oknesset_data/oknesset/Open-Knesset/data/plenum_protocols/2015_5_4_20_ptm_305541.doc

plenum/management/commands/parse_plenum_protocols_subcommands/download.py in _antiword

    def _antiword(filename):
        try:
            return antiword(filename, logger)
        except:
            logger.exception(u'antiword failure with file: %s' % filename)

simple/management/utils.py in antiword

    logger = local_logger
    cmd = 'antiword -x db ' + filename + ' > ' + filename + '.awdb.xml'
    logger.debug(cmd)
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=True)
    logger.debug(output)
    with open(filename + '.awdb.xml', 'r') as f:

OriHoch commented 7 years ago

hasadna/knesset-data#128 handles moving the plenum scraping to knesset-data

After it's done, we should re-review this bug and check whether it reproduces on the new architecture.