Uploading 8 rebuilt bz2files to canonical-rebuilt-testing
Processing batch 9/11 [{'oecaen': [1912, 1943]}]% Completed | 22.3s
Processing year 1912
Retrieving issues...
Fleshing out articles by issue...
Number of partitions: 97
Skipped articles: []
done.
Processing year 1913
Retrieving issues...
Fleshing out articles by issue...
Number of partitions: 117
Skipped articles: []
done.
Processing year 1914
Retrieving issues...
Fleshing out articles by issue...
Number of partitions: 117
File "impresso_commons/text/rebuilder.py", line 703, in main
filter_language=languages
File "impresso_commons/text/rebuilder.py", line 541, in rebuild_issues
.pluck('id')\
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/dask/base.py", line 175, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/dask/base.py", line 446, in compute
results = schedule(dsk, keys, **kwargs)
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/distributed/client.py", line 2510, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/distributed/client.py", line 1812, in gather
asynchronous=asynchronous,
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/distributed/client.py", line 753, in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/distributed/utils.py", line 337, in sync
six.reraise(*error[0])
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/distributed/utils.py", line 322, in f
result[0] = yield future
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/distributed/client.py", line 1668, in _gather
six.reraise(type(exception), exception, traceback)
File "/home/romanell/.pyenv/versions/impresso-pycommons/lib/python3.6/site-packages/six.py", line 692, in reraise
raise value.with_traceback(tb)
File "/home/romanell/.pyenv/versions/3.6.0/envs/impresso-pycommons/lib/python3.6/site-packages/impresso_commons/text/helpers.py", line 70, in read_issue_pages
for page in alternative_read_text(filename, IMPRESSO_STORAGEOPT)
File "/home/romanell/.pyenv/versions/3.6.0/envs/impresso-pycommons/lib/python3.6/site-packages/impresso_commons/utils/s3.py", line 443, in alternative_read_text
with s_open(s3_key, 'r', transport_params=transport_params) as infile:
File "/home/romanell/.pyenv/versions/3.6.0/envs/impresso-pycommons/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 348, in open
binary, filename = _open_binary_stream(uri, binary_mode, transport_params)
File "/home/romanell/.pyenv/versions/3.6.0/envs/impresso-pycommons/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 556, in _open_binary_stream
return _s3_open_uri(parsed_uri, mode, transport_params), filename
File "/home/romanell/.pyenv/versions/3.6.0/envs/impresso-pycommons/lib/python3.6/site-packages/smart_open/smart_open_lib.py", line 628, in _s3_open_uri
return smart_open_s3.open(parsed_uri.bucket_id, parsed_uri.key_id, mode, **kwargs)
File "/home/romanell/.pyenv/versions/3.6.0/envs/impresso-pycommons/lib/python3.6/site-packages/smart_open/s3.py", line 117, in open
resource_kwargs=resource_kwargs,
File "/home/romanell/.pyenv/versions/3.6.0/envs/impresso-pycommons/lib/python3.6/site-packages/smart_open/s3.py", line 345, in __init__
'or is forbidden for access' % (key, bucket)
'oecaen/pages/oecaen-1914/oecaen-1914-12-02-a-pages.jsonl.bz2' does not exist in the bucket 'original-canonical-staging', or is forbidden for access
Example: oecaen-1914-12-02-a from BNF data.
Extent: ~18 issues of oecaen (as of 01-09-2020).
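
The key in the error message above looks like it is derived directly from the issue ID (`oecaen-1914-12-02-a` gives `oecaen/pages/oecaen-1914/oecaen-1914-12-02-a-pages.jsonl.bz2`). Below is a minimal sketch, assuming this layout holds for every issue, of how the affected issues could be found before launching a rebuild. The names `pages_key_for_issue` and `find_missing_pages` are hypothetical and not part of impresso_commons. The set of existing keys is assumed to come from a separate listing of the bucket.

```python
from typing import Iterable, List, Set


def pages_key_for_issue(issue_id: str) -> str:
    """Build the pages-file key for an issue ID such as 'oecaen-1914-12-02-a'.

    Assumption: the key layout matches the one seen in the error message,
    i.e. <newspaper>/pages/<newspaper>-<year>/<issue_id>-pages.jsonl.bz2.
    """
    # Split from the right so that only the last four fields
    # (year, month, day, edition) are separated from the newspaper ID.
    newspaper, year, _month, _day, _edition = issue_id.rsplit("-", 4)
    return f"{newspaper}/pages/{newspaper}-{year}/{issue_id}-pages.jsonl.bz2"


def find_missing_pages(issue_ids: Iterable[str], existing_keys: Set[str]) -> List[str]:
    """Return the issue IDs whose pages file is absent from existing_keys."""
    return [i for i in issue_ids if pages_key_for_issue(i) not in existing_keys]


if __name__ == "__main__":
    # Hypothetical input: one issue ID taken from the log, and an empty
    # key listing standing in for a bucket that lacks the pages file.
    print(find_missing_pages(["oecaen-1914-12-02-a"], set()))
```

Issues reported this way could then be excluded from the rebuild or flagged for re-ingestion, instead of failing the whole year at compute time.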