impresso / impresso-pycommons

Python module with bits of code (objects, functions) highly reusable within impresso.
http://impresso-pycommons.rtfd.io/
GNU Affero General Public License v3.0

[rebuild] solr rebuild of luxwort fails #39

Closed by mromanello 5 years ago

mromanello commented 5 years ago

Stack trace:

Processing batch 1/4 [{'luxwort': [1906, 1920]}]
Processing year 1906
Retrieving issues...
/scratch/matteo/impresso/solr-rebuilt/luxwort-1906
Fleshing out articles by issue...
Number of partitions: 108
Traceback (most recent call last):
  File "impresso_commons/text/rebuilder.py", line 623, in <module>
    main()
  File "impresso_commons/text/rebuilder.py", line 607, in main
    format=output_format
  File "impresso_commons/text/rebuilder.py", line 501, in rebuild_issues
    .to_textfiles('{}/*.json'.format(issue_dir))
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/dask/bag/core.py", line 720, in to_textfiles
    last_endline=last_endline, **kwargs)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/dask/bag/core.py", line 213, in to_textfiles
    out.compute(**kwargs)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/distributed/client.py", line 2545, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/distributed/client.py", line 1800, in gather
    asynchronous=asynchronous,
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/distributed/client.py", line 739, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/distributed/utils.py", line 331, in sync
    six.reraise(*error[0])
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/distributed/utils.py", line 316, in f
    result[0] = yield future
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/distributed/client.py", line 1631, in _gather
    six.reraise(type(exception), exception, traceback)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/dask/bag/core.py", line 132, in _to_textfiles_chunk
    for d in data:
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/dask/bag/core.py", line 1750, in map_chunk
    for a in zip(*args):
  File "/home/romanell/.local/share/virtualenvs/impresso-pycommons-r-mvz05T/lib/python3.6/site-packages/dask/bag/core.py", line 1750, in map_chunk
    for a in zip(*args):
  File "/home/romanell/impresso_code/impresso-pycommons/impresso_commons/text/helpers.py", line 91, in rejoin_articles
    ][0]
IndexError: list index out of range
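
The last frame ends in a `][0]` inside `rejoin_articles`, which suggests that a filtered list came back empty for at least one article of luxwort 1906 and was then indexed. The body of `helpers.py` is not shown here, so the following is only a minimal sketch of that failure mode and one defensive alternative. The names `pages`, `first_match`, `lookup_strict` and `lookup_safe` are hypothetical, and the records are made-up data, not the real impresso schema.

```python
def first_match(items, predicate, default=None):
    """Return the first item satisfying predicate, or default if none does."""
    return next((item for item in items if predicate(item)), default)


# Hypothetical page records; the "id" key is an assumption for illustration.
pages = [{"id": "p0001"}, {"id": "p0002"}]


def lookup_strict(page_id):
    # Same shape as the failing expression: indexing a possibly empty list.
    return [p for p in pages if p["id"] == page_id][0]


def lookup_safe(page_id):
    # Returns None instead of raising when no page matches.
    return first_match(pages, lambda p: p["id"] == page_id)


try:
    lookup_strict("p0003")
    strict_failed = False
except IndexError:
    # Reproduces "IndexError: list index out of range" on a missing id.
    strict_failed = True
```

With a lookup like `lookup_safe`, the caller can log the offending article and skip it rather than letting a single missing match abort the whole Dask computation. Whether skipping is acceptable depends on how the rebuilt data is used downstream.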