inspirehep / plotextractor

Extract images and captions from TeX files in a tar archive.
GNU General Public License v2.0

paths can be unicode #21

Closed by michamos 4 years ago

michamos commented 7 years ago

Directory and file names are user-generated, so they may contain non-ASCII characters, and the code should account for this. See https://sentry.cern.ch/inspire-sentry/inspire-labs/group/824581/, which is due to tex_file = '/opt/inspire/var/data/workflows/files/6a/d2/e888-d822-40e3-a8cb-3982a2c8cc7c/data_files/carpeta sin título/three-methods.tex'.

david-caro commented 6 years ago

This also happens with image names in plotextractor/output_utils.py, in get_image_location at line 227, for example with image = u'轨道力学_p37.png'.
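
A minimal sketch of one way to avoid this class of failure, assuming the error comes from mixing byte strings and unicode strings when building paths. The helpers to_text and join_text are hypothetical and not part of plotextractor, and this is not necessarily how the problem was eventually fixed:

```python
import os


def to_text(path, encoding="utf-8"):
    """Return path as a text (unicode) string, decoding bytes if needed.

    Hypothetical helper; the UTF-8 default is an assumption about how
    the file names on disk are encoded.
    """
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


def join_text(*parts):
    """Join path components after normalising each one to text."""
    return os.path.join(*[to_text(part) for part in parts])


# Mix a byte-string directory name with a unicode image name,
# using the names reported in this issue.
directory = u"carpeta sin título".encode("utf-8")
image = u"轨道力学_p37.png"
image_path = join_text(directory, image)
```

Normalising every component to text before joining keeps the byte/unicode boundary in one place, so os.path.join never has to combine the two types implicitly.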

michamos commented 4 years ago

Fixed in #29.