ho-tex / doi

The small doi package for linking to doi.org

DOI breaking and hyphenation #6

Open bgvoisin opened 7 months ago

bgvoisin commented 7 months ago

Just now I encountered a situation where a DOI, entered with \doi, was cut across lines and a hyphen inserted at the break, turning

10.1016/j.ijheatfluidflow.2008.07.001

into

10.1016/j.ijheatfluid- flow.2008.07.001

This is because the doi package uses \href from hyperref, which typesets its second argument as normal text, so TeX may hyphenate it. Clearly, adding a hyphen inside a URL is a bad idea.

The url package works differently: it does not add hyphens and instead defines possible break points. Heiko Oberdiek pointed out that the best of both worlds is obtained with \nolinkurl, which (if I understood correctly) typesets the second argument of \href the way \url does, without making it a link of its own.

The attached example (to be typeset with LuaLaTeX for the above hyphenation to occur) compares the outputs of

\doi{<doi>}
\href{https://doi.org/<doi>}{https://doi.org/<doi>}
\url{https://doi.org/<doi>}
\href{https://doi.org/<doi>}{\nolinkurl{https://doi.org/<doi>}}
\href{https://doi.org/<doi>}{\nolinkurl{doi:<doi>}}

The last of these gives the same output as \doi, without the unwanted hyphen. The only difference is the doi: prefix, which \doi puts outside the link and \href+\nolinkurl puts inside it (I prefer the latter).
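For illustration, the last variant could be wrapped in a small user macro. This is only a sketch: the name \doilink is hypothetical and not part of the doi package, and because the DOI passes through an ordinary macro argument, characters that are special to TeX (such as # or %) or that need percent-encoding in a URL (such as < and >) are not handled.

```latex
\documentclass{article}
\usepackage{hyperref}% provides \href and \nolinkurl

% Hypothetical helper, not part of the doi package: links to https://doi.org/<doi>
% and shows "doi:<doi>" inside the link. \nolinkurl typesets the text like \url,
% so it may break at url-style break points (such as / and .) but is not hyphenated.
\newcommand*{\doilink}[1]{\href{https://doi.org/#1}{\nolinkurl{doi:#1}}}

\begin{document}
\doilink{10.1016/j.ijheatfluidflow.2008.07.001}
\end{document}
```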

The whole doi package was introduced to deal with DOIs containing nonsensical characters like

10.1175/1520-0469(1983)040<0396:SWALW>2.0.CO;2

At the time, \href couldn't deal with these characters. Now it seems it can, see the second example in the attached file.

Hence the two questions:

doi-hyphen-example.zip

u-fischer commented 7 months ago

At the time, \href couldn't deal with these characters. Now it seems it can, see the second example in the attached file.

Well, you get two different links in the PDF:

https://doi.org/10.1175/1520-0469\(1983\)040%3C0396:SWALW%3E2.0.CO;2
https://doi.org/10.1175/1520-0469\(1983\)040<0396:SWALW>2.0.CO;2

Acrobat Reader seems not to care and opens both, but Acrobat Pro refuses to follow the second. You can of course use \href or \url to input the link, but neither tries to encode or otherwise change the URL, or to add a prefix or a protocol, so whatever you give them must be a complete, correct URL. They only escape characters such as parentheses as needed inside a PDF.
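As a sketch of what a complete, correct URL looks like for the second DOI: the link target given to \href percent-encodes < and > as %3C and %3E (as in the first link above), while \nolinkurl displays the DOI as written. This assumes \href is used at the top level of the document, not inside another command's argument, so that hyperref can read the % characters in the URL literally.

```latex
\documentclass{article}
\usepackage{hyperref}% provides \href and \nolinkurl

\begin{document}
% Link target: a complete URL, with < and > percent-encoded as %3C and %3E.
% Displayed text: the DOI as written, typeset by \nolinkurl (breakable, no hyphens).
\href{https://doi.org/10.1175/1520-0469(1983)040%3C0396:SWALW%3E2.0.CO;2}{\nolinkurl{doi:10.1175/1520-0469(1983)040<0396:SWALW>2.0.CO;2}}
\end{document}
```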

bgvoisin commented 7 months ago

Acrobat Reader seems not to care and opens both, but Acrobat Pro refuses to follow the second.

I can confirm this on the Mac: TeXShop and Preview open the link fine, but Acrobat Pro and Skim don't (nor does a locally compiled mupdf-x11). I had always thought Acrobat Pro to be the laxest PDF viewer.

Redefining \@doi as follows (the main change is that \toks0, which holds the displayed text, now wraps the DOI in \nolinkurl) seems to achieve the desired result:

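\makeatletter % \@doi is a doi.sty internal: use in the preamble after \usepackage{doi}
\def\@doi#1{%
  \let\#\relax
  \let\_\relax
  \let\textless\relax
  \let\textgreater\relax
%  \edef\x{\toks0={{#1}}}%
  % Displayed text (\toks0): expand to the literal characters for \nolinkurl
  \edef\#{\string##}% a literal (catcode 12) #
  \edef\_{_}%
  \edef\textless{<}%
  \edef\textgreater{>}%
  \edef\x{\toks0={{\noexpand\nolinkurl{#1}}}}%
  \x
  % Link target (\toks2): percent-encode # < > for the URL
  \edef\#{\@percentchar23}%
  \edef\_{_}%
  \edef\textless{\@percentchar3C}% instead of {\string<} for Apple
  \edef\textgreater{\@percentchar3E}% instead of {\string>} for Apple
  \edef\x{\toks2={\noexpand\href{\doiurl#1}}}%
  \x
  \edef\x{\endgroup\doitext\the\toks2 \the\toks0}%
  \x
}
\makeatother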

See the attached file.

It's very possible the above code makes no sense. I can't pretend to actually understand the \doi macro; I just tried many things for hours, until suddenly one seemed to work.

I couldn't find an example of a DOI containing # to test.

doi-hyphen-example-v3.zip