Closed (kaplun closed 4 years ago)
Sometimes the tarballs with the record contents include hidden metadata files created by macOS (names starting with "._") that have the same extension as the files they store metadata for. The plotextractor tries to parse those too and fails, because they are not actually images:
Traceback (most recent call last):
  File "/opt/inspire/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/opt/inspire/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/inspire/lib/python2.7/site-packages/invenio_base/helpers.py", line 49, in decorated_func
    result = f(*args, **kwargs)
  File "/opt/inspire/lib/python2.7/site-packages/invenio_workflows/workers/worker_celery.py", line 49, in celery_run
    return run_worker(workflow_name, data, **kwargs).uuid
  File "/opt/inspire/lib/python2.7/site-packages/invenio_workflows/worker_engine.py", line 47, in run_worker
    run_workflow(wfe=engine, data=objects, **kwargs)
  File "/opt/inspire/lib/python2.7/site-packages/invenio_workflows/client.py", line 103, in run_workflow
    raise exception_triggered
WorkflowError: WorkflowError(Error: CorruptImageError('Not a JPEG file: starts with 0x00 0x05 /afs/.../workflows/storage/8d2da946-64e0-11e6-8cee-02163e010841/1608.04885.tar.gz_files/pics/._distance_matrix.jpg @ error/jpeg.c/JPEGErrorHandler/297',)
Traceback (most recent call last):
  File "/opt/inspire/lib/python2.7/site-packages/invenio_workflows/engine.py", line 429, in processing_factory
    self.run_callbacks(callbacks, objects, obj)
  File "/opt/inspire/lib/python2.7/site-packages/workflow/engine.py", line 422, in run_callbacks
    self.execute_callback(f, obj)
  File "/opt/inspire/lib/python2.7/site-packages/invenio_workflows/engine.py", line 512, in execute_callback
    callback(obj, self)
  File "/opt/inspire/src/inspire/inspirehep/modules/oaiharvester/tasks/arxiv.py", line 169, in arxiv_plot_extract
    plots = process_tarball(tarball)
  File "/opt/inspire/lib/python2.7/site-packages/plotextractor/api.py", line 77, in process_tarball
    converted_image_mapping = convert_images(image_list)
  File "/opt/inspire/lib/python2.7/site-packages/plotextractor/converter.py", line 160, in convert_images
    convert_image(image_file, converted_image_file, image_format)
  File "/opt/inspire/lib/python2.7/site-packages/plotextractor/converter.py", line 172, in convert_image
An example from Sentry: https://sentry.cern.ch/inspire-sentry/inspire-labs/group/821464/
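One possible way to avoid this, as a minimal sketch under assumptions: drop macOS metadata entries (base names starting with "._", or anything under a __MACOSX directory) from the image list before it reaches convert_images. The helper names is_macos_metadata and filter_image_list are hypothetical and not part of plotextractor, and this is not necessarily how the problem was actually fixed.

```python
import os


def is_macos_metadata(path):
    """Return True if the path looks like a macOS metadata entry.

    Hypothetical helper (not part of plotextractor). It only inspects the
    name: AppleDouble sidecar files are called "._<original name>", and some
    archivers place them under a "__MACOSX" directory.
    """
    parts = path.replace("\\", "/").split("/")
    if "__MACOSX" in parts:
        return True
    return os.path.basename(path).startswith("._")


def filter_image_list(image_list):
    """Return the paths from image_list that are not macOS metadata files."""
    return [path for path in image_list if not is_macos_metadata(path)]


candidates = [
    "pics/distance_matrix.jpg",
    "pics/._distance_matrix.jpg",
    "__MACOSX/pics/other.png",
]
kept = filter_image_list(candidates)
print(kept)
```

Filtering by name keeps the check cheap and does not require opening the files; a stricter variant could also inspect the file header, since the "starts with 0x00 0x05" part of the error above is consistent with an AppleDouble sidecar rather than a JPEG.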
Fixed in #17.
See: ahem. @david-caro ?