u-fischer / newpax

Problem with parentheses in URLs #9

Closed vbeffara closed 2 years ago

vbeffara commented 2 years ago

I have a PDF generated by MS Word (which I believe I cannot post here, but could send by PM if needed to reproduce the issue) with links to URLs that contain parentheses, such as https://doi.org/10.1016/s0169-5347(02)00045-9. Running newpax on it to generate a .pax file for use with pdfLaTeX, I get paragraphs like this:

\[{annot}{34}{Link}{231.42 550.85 444.01 563.67}{URI}{
  BS={<</W 0>>},
  URI={(https://doi.org/10.1016/s0169-5347\(02\)00045-9)},
}\\

with a strange way of escaping the parentheses in the URI, and then reading the file with pax.sty fails with the error:

Runaway argument?
{\PAX@call {annot}{Link}{}}{390.26 470.7 525.3 483.51}{URI}{BS={<</W \ETC.
! File ended while scanning use of \PAX@stop.

Removing just the paragraphs in the .pax file containing such parentheses gives a correct compilation with working links.

vbeffara commented 2 years ago

Trying to make a MWE now, can I do anything more to help?

vbeffara commented 2 years ago

Here is an MWE (which reproduces the issue when compiled with pdfLaTeX on an up-to-date TeX Live 2021 on macOS):

File bug1.tex:

\documentclass{article}
\usepackage{hyperref,url}
\begin{document}
    \url{http://example.com/bla(bli)blo.html}
\end{document}

Generated bug1.pax:

\[{pax}{0.1l}\\
\[{file}{(./bug1.pdf)}{
  Size={21008},
  Date={D:20211119104226+01'00'},
}\\
\[{pagenum}{1}\\
\[{page}{1}{0.0 0.0 612.0 792.0}{}\\
\[{annot}{1}{Link}{147.716 654.025 332.77 665.15}{URI}{
  C={[0 1 1 ]},
  H={/I},
  Border={[0 0 1 ]},
  URI={(http://example.com/bla\(bli\)blo.html)},
}\\

File bug2.tex:

\documentclass{article}
\usepackage{pdfpages,pax}
\begin{document}
    \includepdf{bug1}
\end{document}

vbeffara commented 2 years ago

The error generated by the MWE is actually a bit different from the one I quoted above:

! Undefined control sequence.
\GenericError  ...
                                                    #4  \errhelp \@err@     ...
l.13 }
      \\

u-fischer commented 2 years ago

Escaping parentheses like this is allowed in PDF. The bug is that the escaped \( is then expanded as the LaTeX command \( when the annotation is reincluded.
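To illustrate the escaping mentioned above: in a PDF literal string, a backslash before a parenthesis (or before another backslash) only marks that character as literal, so the escaped and unescaped forms denote the same URL. The following is a minimal sketch of undoing those escapes. The function name `unescape_pdf_literal` is hypothetical, and it is not the code newpax uses. It handles only `\(`, `\)` and `\\`, and leaves the other escapes PDF defines (such as `\n` or octal codes) untouched.

```python
def unescape_pdf_literal(body: str) -> str:
    """Undo the backslash escapes for ( ) and \\ in the body of a PDF literal string.

    Illustrative only: other PDF escapes (\\n, \\r, octal codes, ...) are left as-is.
    """
    escapable = {"(", ")", "\\"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # A backslash followed by an escapable character stands for that character.
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in escapable:
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The URI string from the MWE's .pax file, without the enclosing parentheses.
escaped = r"http://example.com/bla\(bli\)blo.html"
print(unescape_pdf_literal(escaped))  # http://example.com/bla(bli)blo.html
```

The point of the example is that the backslashes belong to the PDF string syntax, not to the URL itself, which is why passing them unchanged to TeX goes wrong.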

I fixed it for newpax and will make an update. But I don't maintain pax and can't (and don't want to either) fix it, so if you want to benefit from it you will have to use newpax.

vbeffara commented 2 years ago

Fair enough :-)

How about just replacing \(\) with %28%29 in the generated .pax file? I take it from your comment that what newpax does is just re-exporting what is in the .pdf file as-is, so it might be a significant change ...

On the other hand, using newpax requires a much more recent system than most users will have, so adopting it is unfortunately not really an option for me.

u-fischer commented 2 years ago

Sorry, no. I will not spend time implementing workarounds for pax (which has a few more bugs that I corrected in newpax). One goal of packages like newpax is to demonstrate the use of the new code for pdfmanagement and tagging we are writing. So users who want to benefit from the effort I put into this will have to make an effort too and update. Or fork the project and adapt it to your own needs.

vbeffara commented 2 years ago

Don't be sorry, it is not your job to maintain pax. Thanks for your work on newpax!

I am willing to update for myself, but not yet for our production system, so in the end I am just doing this on the .pax created by newpax:

sed -e 's/\\(/%28/g' -e 's/\\)/%29/g' -e 's/%/\\%/g' -i .bak "$fn.pax"

(because pax is also troubled by the presence of % signs in URLs). It works for me so far, and I am posting it here just in case it is useful to someone landing on this page following a search for the same bug, but obviously it is far from being a good solution. Thanks again!
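For anyone who prefers not to rewrite every `%` in the file, here is a rough Python sketch of the same workaround restricted to the URI entries. It assumes each URI entry sits on a single line of the form `URI={(...)},` as in the excerpts above. The names `patch_uri_line` and `patch_pax_text` are hypothetical, and this has not been checked against pax beyond mirroring the substitutions of the sed command.

```python
import re

# Assumed layout, taken from the .pax excerpts in this thread:
#   <indent>URI={(<url>)},
URI_LINE = re.compile(r"^(\s*URI=\{\()(.*)(\)\},\s*)$")


def patch_uri_line(line: str) -> str:
    """Apply the sed substitutions to one line, but only if it is a URI entry."""
    match = URI_LINE.match(line)
    if match is None:
        return line  # leave every other line untouched
    head, uri, tail = match.groups()
    # Same order as the sed command: percent-encode the escaped parentheses,
    # then put a backslash in front of every % sign.
    uri = uri.replace("\\(", "%28").replace("\\)", "%29")
    uri = uri.replace("%", "\\%")
    return head + uri + tail


def patch_pax_text(text: str) -> str:
    """Patch all URI entries in the contents of a .pax file."""
    return "".join(patch_uri_line(line) for line in text.splitlines(keepends=True))


sample = "  URI={(http://example.com/bla\\(bli\\)blo.html)},\n"
print(patch_pax_text(sample), end="")
# prints:   URI={(http://example.com/bla\%28bli\%29blo.html)},
```

Unlike the sed one-liner, this leaves lines that are not URI entries exactly as they were. If newpax ever writes a URI entry across several lines, the pattern above will not match it.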

CSchoel commented 2 years ago

Unfortunately, the fix in 4746c9a4ed37ac7537d8f87af4ee6ced11667a66 did not work for me. I suggested a way to fix this in the Lua code in #12.

Btw.: Big thanks for newpax. It will make the electronic version of my cumulative PhD thesis much more user friendly. :smile:

u-fischer commented 2 years ago

A new version has been uploaded.