Open tsgit opened 7 years ago
Another example:
$ tar tvf 1712.05415.tar
-rw-rw-r-- root/root 31304 2017-12-14 08:45 B2basic.pdf
-rw-rw-r-- root/root 51146 2017-12-14 08:45 B2chambers.pdf
-rw-rw-r-- root/root 38320 2017-12-14 08:45 B2unitarity.pdf
drwxrwxr-x root/root 0 2017-12-14 08:45 BGG%20for%20Lie(1)/
-rw-rw-r-- root/root 7929 2017-12-14 08:45 BGG%20for%20Lie(1)/main.bbl
-rw-rw-r-- root/root 51146 2017-12-14 08:45 BGG%20for%20Lie(1)/B2chambers.pdf
-rw-rw-r-- root/root 38320 2017-12-14 08:45 BGG%20for%20Lie(1)/B2unitarity.pdf
-rw-rw-r-- root/root 31304 2017-12-14 08:45 BGG%20for%20Lie(1)/B2basic.pdf
-rw-rw-r-- root/root 11618 2017-12-14 08:45 BGG%20for%20Lie(1)/jheppub.sty
-rw-rw-r-- root/root 19446 2017-12-14 08:45 BGG%20for%20Lie(1)/JHEP.bst
-rw-rw-r-- root/root 140166 2017-12-14 08:45 BGG%20for%20Lie(1)/main.tex
-rw-rw-r-- root/root 19446 2017-12-14 08:45 JHEP.bst
-rw-rw-r-- root/root 11618 2017-12-14 08:45 jheppub.sty
-rw-rw-r-- root/root 7929 2017-12-14 08:45 main.bbl
-rw-rw-r-- root/root 140166 2017-12-14 08:45 main.tex
This leads to the following error:
2017-12-18 05:38:18 --> Stage 2 failed: ERROR: while elaborating FFT tags: fft '([('a', '/opt/cds-invenio/var/tmp/oaiharvest_96159_1_20171218040005_material/2017/12/arXiv:1712.05415/arXiv:1712.05415_plots/BGG%20for%20Lie(1)/B2chambers.png'), ('t', 'Plot'), ('d', '00000 The $B_2$ (shifted) Weyl chambers, their associated Weyl group element in terms of simple reflections $s_i$, their Bruhat order, the simple roots $\\alpha_i$ and the integral weight lattice. The red lines correspond to singular weights, and delimitate the shifted Weyl chambers. The intersections of gray lines correspond to integral weights.'), ('n', 'BGG%20for%20Lie(1)_B2chambers')], ' ', ' ', '', 23)' specifies in $a a location ('/opt/cds-invenio/var/tmp/oaiharvest_96159_1_20171218040005_material/2017/12/arXiv:1712.05415/arXiv:1712.05415_plots/BGG%20for%20Lie(1)/B2chambers.png') with problems: /opt/cds-invenio/var/tmp/oaiharvest_96159_1_20171218040005_material/2017/12/arXiv:1712.05415/arXiv:1712.05415_plots/BGG%20for%20Lie(1)/B2chambers.png is not a correct url: [Errno 2] No such file or directory: '/opt/cds-invenio/var/tmp/oaiharvest_96159_1_20171218040005_material/2017/12/arXiv:1712.05415/arXiv:1712.05415_plots/BGG for Lie(1)/B2chambers.png'
2017-12-18 05:38:18 --> <record>
<controlfield tag="001">1643671</controlfield>
<controlfield tag="005">20171218053818.0</controlfield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="9">arXiv</subfield>
<subfield code="a">oai:arXiv.org:1712.05415</subfield>
</datafield>
I think the problem is in legacy Invenio, in bibdocfile.py:
https://github.com/inspirehep/invenio/blob/prod/modules/bibdocfile/lib/bibdocfile.py#L3765-L3767
try:
    if is_url_a_local_file(url):
        path = urllib2.urlparse.urlsplit(urllib.unquote(url))[2]
Why does a local file path need urllib.unquote at all?
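A minimal reproduction of the failure, as a sketch: it uses Python 3's urllib.parse.unquote and urlsplit in place of the Python 2 urllib.unquote and urllib2.urlparse.urlsplit quoted above, and assumes a POSIX filesystem. It creates a file under a directory literally named BGG%20for%20Lie(1) and runs the same unquote-then-split expression on its path.

```python
import os
import tempfile
from urllib.parse import unquote, urlsplit

with tempfile.TemporaryDirectory() as base:
    # A real directory whose name contains a literal "%20",
    # mirroring "BGG%20for%20Lie(1)" in the arXiv tarball above.
    real_dir = os.path.join(base, "BGG%20for%20Lie(1)")
    os.mkdir(real_dir)
    real_file = os.path.join(real_dir, "B2chambers.png")
    open(real_file, "wb").close()

    # Same pattern as the legacy code: unquote first, then take the path part.
    path = urlsplit(unquote(real_file))[2]

    found_original = os.path.exists(real_file)  # the file is really there
    found_unquoted = os.path.exists(path)       # "%20" became " ", lookup misses

print(found_original, found_unquoted)  # True False
print(path.endswith("BGG for Lie(1)/B2chambers.png"))  # True
```

The decoded path points into "BGG for Lie(1)", which does not exist, matching the [Errno 2] in the log above.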
This is part of check_valid_url(url), which is called here:
https://github.com/inspirehep/invenio/blob/prod/modules/bibupload/lib/bibupload.py#L1838-L1843
if url:
    url = url[0]
    try:
        check_valid_url(url)
    except StandardError, e:
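One possible fix, sketched under assumptions: percent-decode only when the value really is a file:// URL, and take bare absolute paths literally. local_path_from_url is a hypothetical helper, not part of Invenio, and I have not checked which forms is_url_a_local_file accepts in legacy, so this illustrates the idea rather than being a patch (Python 3 syntax).

```python
import os
from urllib.parse import unquote, urlsplit


def local_path_from_url(url):
    """Hypothetical helper: map a local-file reference to a filesystem path.

    Only a real file:// URL is percent-decoded; a bare absolute path is
    returned untouched, so a literal "%20" in a directory name survives.
    """
    parts = urlsplit(url)
    if parts.scheme == "file":
        # A genuine URL: its escapes are meant to be decoded.
        return unquote(parts.path)
    if os.path.isabs(url):
        # A plain path: every character, including "%", is literal.
        return url
    raise ValueError("not a local file reference: %r" % url)


print(local_path_from_url("/tmp/BGG%20for%20Lie(1)/B2chambers.png"))
# /tmp/BGG%20for%20Lie(1)/B2chambers.png
print(local_path_from_url("file:///tmp/BGG%2520for%2520Lie(1)/B2chambers.png"))
# /tmp/BGG%20for%20Lie(1)/B2chambers.png
```

The trade-off is that any caller currently passing a bare path with intentional %XX escapes would change behaviour; such callers would have to send a file:// URL instead.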
There is a paper on arXiv that has a literal %20 in a directory name; this causes problems when the %20 is converted to a space (see /opt/cds-invenio/var/log/bibsched/102/bibsched_task_1028482.log).
In this case the subdirectory
Neutral%20Impurity-N-type%20(jinst)
contains a copy of all the files in the top level. This is just bad packaging by the author; however, in general, directory and file names might contain sequences that should not be interpreted as URL escapes.
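Alternatively, the harvester side could make the ambiguity explicit. The sketch below (Python 3, hypothetical helper names ambiguous_names and to_file_url) flags archive member names that unquote would alter, and builds a file:// location by quoting the path first, so that the legacy unquote-then-split expression gives the original name back. Whether legacy is_url_a_local_file treats file:// URLs as local files is an assumption I have not verified.

```python
from urllib.parse import quote, unquote, urlsplit


def ambiguous_names(names):
    """Return the names that percent-decoding would silently change."""
    return [name for name in names if unquote(name) != name]


def to_file_url(path):
    """Hypothetical helper: build a file:// URL whose decoding is the path itself.

    quote() turns a literal "%" into "%25", so "%20" in a directory name is
    stored as "%2520" and decodes back to "%20" rather than to a space.
    """
    return "file://" + quote(path, safe="/")


members = ["main.tex", "JHEP.bst", "BGG%20for%20Lie(1)/main.tex"]
print(ambiguous_names(members))  # ['BGG%20for%20Lie(1)/main.tex']

path = "/tmp/BGG%20for%20Lie(1)/B2chambers.png"
url = to_file_url(path)
# Same decoding order as the legacy code: unquote, then take the path part.
print(urlsplit(unquote(url))[2] == path)  # True
```

Quoting before storing means a literal % can no longer be mistaken for the start of an escape, which is exactly the ambiguity the tarballs above trigger.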