inspirehep / plotextractor

Extract images and captions from TeX files in a tar archive.
GNU General Public License v2.0

wrong rotation of plot from math in caption #22

Closed (tsgit closed this issue 4 years ago)

tsgit commented 7 years ago

http://inspirehep.net/record/1614652/plots#0

the caption for the plot in the TeX source file has multiplicity ($\langle dN_{g}/dy\rangle=26$, upper

which I believe is matched by https://github.com/inspirehep/plotextractor/blob/master/plotextractor/converter.py#L203

203: degrees = re.findall('(angle=[-\\d]+|rotate=[-\\d]+)', line)

and, because the text angle=26 occurs inside \rangle=26, leads to a 26-degree rotation of the plot
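A minimal sketch reproducing the false match, assuming only the pattern quoted above and the (truncated) caption fragment from this report. The variable name `caption_line` is made up for illustration and is not taken from converter.py.

```python
import re

# Caption fragment as quoted in this issue (truncated in the report).
caption_line = r"multiplicity ($\langle dN_{g}/dy\rangle=26$, upper"

# Pattern as quoted from converter.py line 203.
degrees = re.findall('(angle=[-\\d]+|rotate=[-\\d]+)', caption_line)

# "\langle" is followed by a space, so it does not match, but the tail of
# "\rangle=26" looks exactly like an "angle=26" option to this pattern.
print(degrees)  # ['angle=26']
```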

tsgit commented 7 years ago

corresponding ticket: https://rt.inspirehep.net/Ticket/Display.html?id=738896

Also, I'm manually correcting the plot and attaching the rotated version here for the record: plots_v2_pt_times_lowhigh

michamos commented 7 years ago

It's not clear to me why this feature is present in the first place. Is it that common to rotate pictures to weird angles? I would assume the main purpose is to rotate the figure by 90/270 degrees to make a large plot fit on the page, which we wouldn't want to reproduce.

tsgit commented 7 years ago

I have also seen sideways plots that should be upright -- and are so in the paper. I would guess that it is mostly rotation in 90 degree increments -- but if we want to make a decision on supporting this feature or not, we should get some data on what fraction of plots should/should not be rotated. It seems that simply anchoring the regexp at a word boundary, re.findall('\b(angle=[-\\d]+|rotate=[-\\d]+)', line), should take care of this case and cause no harm?

michamos commented 7 years ago

It should work (if you escape it as \\b; in a non-raw Python string, '\b' is a backspace character rather than a word boundary).
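A rough sketch of the anchored pattern discussed here, not a tested patch for converter.py. The `\includegraphics` line and all variable names are hypothetical examples added for illustration; only the caption fragment and the two patterns come from this thread.

```python
import re

# Caption fragment quoted in this issue (truncated in the report).
caption = r"multiplicity ($\langle dN_{g}/dy\rangle=26$, upper"
# Hypothetical graphics line with a genuine rotation option.
graphics = r"\includegraphics[angle=-90,width=\textwidth]{plot.eps}"

# Proposed pattern, with the word boundary escaped as \\b.
anchored = '\\b(angle=[-\\d]+|rotate=[-\\d]+)'
# Same pattern with an unescaped '\b': Python turns it into a backspace.
unescaped = '\b(angle=[-\\d]+|rotate=[-\\d]+)'

# "r" and "a" in "\rangle" are both word characters, so there is no
# word boundary between them and the caption no longer matches.
caption_hits = re.findall(anchored, caption)

# "[" is not a word character, so the real option is still found.
graphics_hits = re.findall(anchored, graphics)

# The unescaped variant requires a literal backspace and finds nothing.
unescaped_hits = re.findall(unescaped, graphics)

print(caption_hits, graphics_hits, unescaped_hits)
```

Writing the pattern as a raw string, r'\b(angle=[-\d]+|rotate=[-\d]+)', would be an equivalent way to avoid the backspace pitfall.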