inspirehep / plotextractor

Extract images and captions from TeX files in a tar archive.
GNU General Public License v2.0
3 stars 9 forks

file path normalization is missing #16

Closed: tsgit closed this issue 6 years ago

tsgit commented 7 years ago

I see a bibsched error in FFT for figures with // in their path:

is not a correct url: /opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots/figures//v12_particles_3.png is not a normalized path (would be /opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots/figures/v12_particles_3.png)
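The wording of the message suggests that the FFT check compares each path with its normalized form. A minimal sketch of such a check, assuming it is equivalent to comparing the path with posixpath.normpath() (normalization_problem is a hypothetical helper, not bibsched's actual code, and the example path is modeled on the ones in this thread):

```python
import posixpath


def normalization_problem(path):
    """Return None if path is already normalized, else a message like the FFT error."""
    # Assumption: "normalized" means unchanged by a purely lexical normpath().
    normalized = posixpath.normpath(path)
    if path == normalized:
        return None
    return "%s is not a normalized path (would be %s)" % (path, normalized)


print(normalization_problem("/tmp/1701.04453_files/figures//v12_particles_3.png"))
print(normalization_problem("/tmp/1701.04453_files/figures/v12_particles_3.png"))  # None
```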

Looking at the unpacked TeX source file:

$ pwd
/opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots
$ grep figures *tex
\includegraphics[width=\columnwidth]{figures//v12_particles_3.png}
\includegraphics[width=\columnwidth]{figures//v12_mbin_2.png}
\includegraphics[width=\columnwidth]{figures//xi_mbin_2.png}
\includegraphics[width=\columnwidth]{figures/sp_mbin_2.png}
\includegraphics[width=\columnwidth]{figures/s12_mbin_2.png}
\includegraphics[width=1.05\textwidth]{figures/allthree_sigma8_newlayout.png}
\includegraphics[width=\textwidth]{figures/codecs_allthre_lines_2.png}

The \includegraphics command has the argument figures//v12_particles_3.png, which the graphics package handles properly.

So I gather that plotextractor should call os.path.normpath() on the detected figure inclusions somewhere.
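A minimal sketch of that idea, assuming the inclusions are pulled out of the TeX source with a regular expression (INCLUDEGRAPHICS_RE and normalized_inclusions are hypothetical names, not plotextractor's actual parser). It uses posixpath rather than os.path so that the result keeps forward slashes on every platform:

```python
import posixpath
import re

# Hypothetical pattern: \includegraphics, an optional [options] group, then {path}.
INCLUDEGRAPHICS_RE = re.compile(r"\\includegraphics\s*(?:\[[^\]]*\])?\s*\{([^}]+)\}")


def normalized_inclusions(tex_source):
    """Return every \\includegraphics path in tex_source, normalized lexically."""
    return [posixpath.normpath(m.group(1).strip())
            for m in INCLUDEGRAPHICS_RE.finditer(tex_source)]


tex = r"""
\includegraphics[width=\columnwidth]{figures//v12_particles_3.png}
\includegraphics[width=\columnwidth]{figures/sp_mbin_2.png}
"""
print(normalized_inclusions(tex))
# ['figures/v12_particles_3.png', 'figures/sp_mbin_2.png']
```

One caveat: normpath works purely on the string, so it also collapses a/../b to b, which can differ from what the filesystem resolves when a is a symlink. For the doubled-slash case reported here that does not matter.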

Full FFT message:

2017-02-16 06:30:41 -->    Stage 2 failed: ERROR: while elaborating FFT tags: fft '([('a', '/opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots/figures//v12_particles_3.png'), ('t', 'Plot'), ('d', '00000 The mean inward radial pairwise velocity, $v_{12}$, as a function of the physical separation $r$. This is measured using a representative sample of dark matter particles from our simulation suite. The dashed line is the Hubble velocity given by $v_{Hubble} = -Hr$, where $H$ is the Hubble constant.'), ('n', 'figures__v12_particles_3')], ' ', ' ', '', 189)' specifies in $a a location ('/opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots/figures//v12_particles_3.png') with problems: /opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots/figures//v12_particles_3.png is not a correct url: /opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots/figures//v12_particles_3.png is not a normalized path (would be /opt/cds-invenio/var/tmp/oaiharvest_96159_1_20170216040034_material/2017/01/arXiv:1701.04453/arXiv:1701.04453_plots/figures/v12_particles_3.png).

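The 'n' field in this message, figures__v12_particles_3, contains a double underscore, while plotextractor reports names such as figures_sp_mbin_2 for clean paths like figures/sp_mbin_2.png. That pattern suggests the name is built by stripping the extension and replacing each / with _, in which case normalizing the path first would fix the name as well. A sketch under that assumption (figure_name is a hypothetical helper, not plotextractor's code):

```python
import posixpath


def figure_name(include_path):
    """Hypothetical reconstruction: drop the extension, then replace '/' with '_'."""
    stem, _ext = posixpath.splitext(include_path)
    return stem.replace("/", "_")


raw = "figures//v12_particles_3.png"
print(figure_name(raw))                      # figures__v12_particles_3
print(figure_name(posixpath.normpath(raw)))  # figures_v12_particles_3
```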
kaplun commented 6 years ago

Actually, I just tried it, and the output was:

[{'captions': [u'In the top row: the ratio of the pairwise velocity, $v_{12}$, between the R-FLRW simulations and the main $\\Lambda$CDM run at fixed separation $r = 5$ Mpc/h for four different redshifts. In the second and third row: the same ratio calculated for $\\sigma_{||}$ and $\\sigma_{12}$ respectively. The error bars show the 1-$\\sigma$ uncertainty calculated using a bootstrap technique. The same ratios are also calculated for two additional $\\Lambda$CDM runs with the same $\\sigma_8$ value at redshift $z=0$ of each R-FLRW simulation, represented as colored bands enclosing the 1-$\\sigma$ region around each measurement.'],
  'label': u'fig:s8compar',
  'name': 'figures_allthree_sigma8_newlayout',
  'original_url': '/tmp/1701.04453_files/figures/allthree_sigma8_newlayout.png',
  'url': '/tmp/1701.04453_files/figures/allthree_sigma8_newlayout.png'},
 {'captions': [u'Clockwise from the top left: The mean inward radial pairwise velocity, $v_{12}$, the correlation function, $\\xi(r)$, the mean dispersion in the radial pairwise velocity, $\\sigma_{||}$ and the line of sight dispersion, $\\sigma_{12}$, all as a function of halo mass for two physical separations, $r = 1$ Mpc/h and $r = 5$ Mpc/h, for all the simulations in our suite. \\refresponsestart{}The mass is the average mass in the five mass bins in which we split the halo catalog. Pairs are restricted to halos within the same halo mass bin, as described in Section~\\ref{sec:dmhalosvel}.\\refresponseend{}~ In the bottom panels we plot the residuals with respect to the reference $\\Lambda$CDM simulation.'],
  'label': u'fig:fourplots',
  'name': 'figures_sp_mbin_2',
  'original_url': '/tmp/1701.04453_files/figures/sp_mbin_2.png',
  'url': '/tmp/1701.04453_files/figures/sp_mbin_2.png'},
 {'captions': [u'Clockwise from the top left: The mean inward radial pairwise velocity, $v_{12}$, the correlation function, $\\xi(r)$, the mean dispersion in the radial pairwise velocity, $\\sigma_{||}$ and the line of sight dispersion, $\\sigma_{12}$, all as a function of halo mass for two physical separations, $r = 1$ Mpc/h and $r = 5$ Mpc/h, for all the simulations in our suite. \\refresponsestart{}The mass is the average mass in the five mass bins in which we split the halo catalog. Pairs are restricted to halos within the same halo mass bin, as described in Section~\\ref{sec:dmhalosvel}.\\refresponseend{}~ In the bottom panels we plot the residuals with respect to the reference $\\Lambda$CDM simulation.'],
  'label': u'fig:fourplots',
  'name': 'figures_s12_mbin_2',
  'original_url': '/tmp/1701.04453_files/figures/s12_mbin_2.png',
  'url': '/tmp/1701.04453_files/figures/s12_mbin_2.png'},
 {'captions': [u'The ratio of the pairwise velocity, and the dispersions, $\\sigma_{||}$ and $\\sigma_{12}$ , between the CoDECS simulations and the main $\\Lambda$CDM run at fixed separation $r = 5$ Mpc/h at redshift $z=0$. The error bars show the 1-$\\sigma$ uncertainty calculated using a bootstrap technique. Each statistic is also calculated for three additional $\\Lambda$CDM runs with the same $\\sigma_8$ value at redshift $z=0$ as the CoDECS scenarios EXP002, EXP003 and EXP008e3, represented as colored bands enclosing the 1-$\\sigma$ region around each measurement.'],
  'label': u'fig:codecs',
  'name': 'figures_codecs_allthre_lines_2',
  'original_url': '/tmp/1701.04453_files/figures/codecs_allthre_lines_2.png',
  'url': '/tmp/1701.04453_files/figures/codecs_allthre_lines_2.png'}]

So the three figures with // in their path (v12_particles_3, v12_mbin_2 and xi_mbin_2) were skipped entirely. This is not good.
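One plausible cause, an assumption not confirmed in this thread, is that inclusion paths are matched against the extracted files by exact string comparison, so figures//v12_particles_3.png never matches figures/v12_particles_3.png. A sketch of matching on normalized paths instead (match_inclusions is a hypothetical helper, not plotextractor's API):

```python
import posixpath


def match_inclusions(includes, extracted_files):
    """Map each \\includegraphics path to an extracted file, or to None if none matches."""
    # Normalize both sides so that 'figures//x.png' and 'figures/x.png' compare equal.
    available = {posixpath.normpath(f) for f in extracted_files}
    matches = {}
    for inc in includes:
        key = posixpath.normpath(inc)
        matches[inc] = key if key in available else None
    return matches


includes = ["figures//v12_particles_3.png", "figures/sp_mbin_2.png"]
extracted = ["figures/v12_particles_3.png", "figures/sp_mbin_2.png"]

# Exact string comparison silently drops the doubled-slash inclusion.
naive_matches = [inc for inc in includes if inc in extracted]
print(naive_matches)  # ['figures/sp_mbin_2.png']

# Normalized comparison finds both figures.
print(match_inclusions(includes, extracted))
```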