inspirehep / plotextractor

Extract images and captions from TeX files in a tar archive.
GNU General Public License v2.0

plotextractor misses plot files in an eprint #24

Open kaplun opened 6 years ago

kaplun commented 6 years ago

@hoc3426 commented on Wed Aug 03 2016

downloading http://export.arxiv.org/e-print/1404.6988 to /opt/cds-invenio/var/tmp-shared/2014/04/arXiv:1404.6988/arXiv:1404.6988
converted 2 of 2 images found for arXiv:1404.6988
No plots detected in 1292765

but there are two figures in the paper. @tsgit


@tsgit commented on Thu Aug 04 2016

https://github.com/inspirehep/invenio/blob/prod/modules/miscutil/lib/plotextractor.py#L283-L295

shows that this means

extract_captions https://github.com/inspirehep/invenio/blob/prod/modules/miscutil/lib/plotextractor.py#L553

fails. Why? Well, this joker uses neither a figure environment nor the \caption command; instead the TeX source has

 \begin{center}
            \includegraphics[scale=0.4]{FIG2.png} \\
            {\small{} FIG. 2  The black solid line is the prediction of $\phi^2$ chaotic inflation. The blue dotted line is the prediction of the cosine model. For  given values of $r$ and $n_s$, we can compare the 2 models with the same $\phi=\phi_0$ value (the red dashed line), or with the same $n_s$ value (the red solid line). In the latter case, $\phi_0$ in the cosine model now corresponds to $\phi_n$ in the chaotic inflation model.  The $16\Delta$ shown in the figure is about the maximum value one can get within the cosine model.}
 \end{center}

I'm tempted to tell the author to use proper figure commands. @hoc3426
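
To make the mismatch concrete, here is a minimal sketch of a caption matcher that only looks inside figure environments containing a \caption command. It is not the actual plotextractor code; the function name `find_captioned_figures` and the regular expressions are assumptions made up for illustration. Fed a center block like the one above, it finds nothing, even though an image is clearly included.

```python
import re

# Hypothetical, simplified matcher. NOT the real plotextractor logic.
FIGURE_ENV = re.compile(r"\\begin\{figure\*?\}(.*?)\\end\{figure\*?\}", re.DOTALL)
INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]*)\}")
CAPTION = re.compile(r"\\caption\{")


def find_captioned_figures(tex):
    """Return image names found in figure environments that also hold a caption."""
    found = []
    for env in FIGURE_ENV.finditer(tex):
        body = env.group(1)
        # Only count images when the environment carries a caption command.
        if CAPTION.search(body):
            found.extend(INCLUDEGRAPHICS.findall(body))
    return found


# A conventional figure (made-up example input).
conventional = r"""
\begin{figure}
  \includegraphics[scale=0.4]{FIG1.png}
  \caption{A conventional figure.}
\end{figure}
"""

# Same shape as the snippet from the eprint, with the caption text shortened.
unconventional = r"""
\begin{center}
  \includegraphics[scale=0.4]{FIG2.png} \\
  {\small{} FIG. 2  The black solid line is the prediction of chaotic inflation.}
\end{center}
"""

print(find_captioned_figures(conventional))
print(find_captioned_figures(unconventional))
```

The second call comes back empty because neither the environment name nor the caption command matches, which is the same kind of miss reported in the log.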

T


@tsgit commented on Thu Aug 04 2016

plotextractor makes some reasonable assumptions about figures and captions in TeX source files.

TeX sources which don't match these assumptions will not have (all of the) figures attached.

There are some gymnastics in plotextractor to second-guess pathological users, but there are many cases it doesn't handle properly. Fixing plotextractor really requires a full rewrite.

This is not going to happen in legacy.
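
For illustration only, one kind of second-guessing for the case in this issue could be a fallback that pairs an \includegraphics command with a brace group that directly follows it. This is a rough sketch under stated assumptions, not something plotextractor is known to do: `read_brace_group` and `guess_captions` are hypothetical names, and the heuristic ignores escaped braces, comments, and nested environments.

```python
import re

INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]*)\}")
# Leading font-size switches such as \small{} are dropped from the guessed caption.
SIZE_PREFIX = re.compile(r"^\\(?:small|footnotesize|scriptsize)\b(?:\{\})?\s*")


def read_brace_group(text, start):
    """Return the contents of the balanced {...} group opening at text[start], or None."""
    if start >= len(text) or text[start] != "{":
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
    return None  # unbalanced braces


def guess_captions(tex):
    """Pair each included image with a brace group that directly follows it."""
    pairs = []
    for match in INCLUDEGRAPHICS.finditer(tex):
        pos = match.end()
        # Skip whitespace and TeX line breaks (\\) between the image and the text.
        while pos < len(tex) and (tex[pos].isspace() or tex.startswith("\\\\", pos)):
            pos += 2 if tex.startswith("\\\\", pos) else 1
        group = read_brace_group(tex, pos)
        if group is not None:
            caption = SIZE_PREFIX.sub("", group.strip())
            # Collapse runs of whitespace so the caption is a single line.
            pairs.append((match.group(1), " ".join(caption.split())))
    return pairs


# Shortened version of the snippet quoted earlier in this thread.
sample = r"""
\begin{center}
  \includegraphics[scale=0.4]{FIG2.png} \\
  {\small{} FIG. 2  The black solid line is the prediction of $\phi^2$ chaotic inflation.}
\end{center}
"""

print(guess_captions(sample))
```

A heuristic like this would also pick up brace groups that are not captions at all, which is part of why piling on more special cases is a poor substitute for a rewrite.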


@hoc3426 commented on Thu Aug 04 2016

Yes, I think it's perfectly reasonable that we missed that one.