plk / biber

Backend processor for BibLaTeX
Artistic License 2.0

handling of already escaped í (as in García) seems broken #94

Closed joernhees closed 1 year ago

joernhees commented 8 years ago

There seems to be a problem with biber when it encounters an already escaped Unicode compound character such as Garc{\'{\i}}a in the .bib file. Biber seems to write this to the .bbl file as a dotless ı followed by a combining acute accent (Garcı́a), which leads to an error like Package inputenc Error: Unicode char ́ (U+301) [\end]. If I instead use backend=bibtex (which I'd rather not), it works as expected...

biber version: 2.2, on a current mactex

Below is a minimal test-case:

Bibliography.bib:

@inproceedings{Mendes2011DBpediaSpotlight,
address = {Graz, Austria},
author = {Mendes, Pablo N. and Jakob, Max and Garc{\'{\i}}a-Silva, Andr{\'{e}}s and Bizer, Christian},
booktitle = {Proc. of the I-SEMANTICS},
isbn = {9781450306218},
keywords = {dbpedia,linked data,named entity,text annotation},
publisher = {ACM},
title = {{DBpedia Spotlight: Shedding Light on the Web of Documents}},
year = {2011}
}

foo.tex:

\documentclass{article}
\usepackage[utf8]{inputenc}
\PassOptionsToPackage{%
    backend=biber,%
    % backend=bibtex8,bibencoding=utf8,%
    maxbibnames=5, % default: 3
}{biblatex}
\usepackage{biblatex}
\addbibresource{Bibliography.bib}
\begin{document}
\cite{Mendes2011DBpediaSpotlight}
\printbibliography
\end{document}

Removing the \usepackage[utf8]{inputenc} line makes the problem disappear. Alternatively, switching to backend=bibtex,bibencoding=ascii or to backend=bibtex8 (as in the commented-out line above) also seems to work for this case, but I couldn't get it running with my full library.

If I replace all occurrences of {\'{\i}} with {\'{i}} in the .bib file, it seems to work with biber. Sadly, my library.bib is auto-generated by Mendeley, so this doesn't seem like a viable option either :(
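For an auto-generated .bib file, the replacement described above could be scripted as a post-processing step. The following is only a minimal sketch: the function name fix_dotless_i is hypothetical, and it assumes that the accent macros in the file follow the exact \'{\i} pattern (accent, brace, \i, brace) seen in the entry above.

```python
import re

# Matches an accent macro applied to the dotless i, e.g. \'{\i} or \^{\i}.
# Assumption: the export always uses this exact brace pattern.
DOTLESS_I = re.compile(r"""\\(['`^"~])\{\\i\}""")


def fix_dotless_i(bib_text: str) -> str:
    """Rewrite accent-on-dotless-i macros to accent-on-plain-i (hypothetical helper)."""
    # Keep the accent character and swap \i for a plain i.
    return DOTLESS_I.sub(lambda m: "\\" + m.group(1) + "{i}", bib_text)


if __name__ == "__main__":
    sample = r"author = {Garc{\'{\i}}a-Silva, Andr{\'{e}}s}"
    print(fix_dotless_i(sample))
```

Other accent macros, such as the {\'{e}} in Andr{\'{e}}s, are left untouched. The script would have to be re-run every time the file is regenerated.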

Any idea how this could be fixed?

plk commented 8 years ago

This is a common issue with utf8 and pdflatex. Usually you should use the biber option --output-safechars with utf8 and pdflatex since inputenc doesn't support a full set of unicode encoded chars. It's best to ask this on tex.stackexchange.com as you will get immediate help there. Also see the biber documentation - the issue of utf8 and pdftex etc. is covered in there.

moewew commented 8 years ago

On TeX.SX there is "Input encoding error after upgrading from Biber 1.9 to Biber 2.1". There is also https://github.com/plk/biber/issues/65

joernhees commented 8 years ago

uhm, i'm just running this with latexmk -pdf foo.tex... maybe if this "breaks by default", it's a bad default?

thanks for the other references though.

mforbes commented 1 year ago

Joseph Wright describes the issue on Stack Exchange in "How to put an acute on an i using Biber: issues with "\'{\i}"". Can we reopen this to fix the conversion so that it produces the appropriate single codepoint í? Then people wouldn't keep running into this issue and wasting hours trying to find the non-obvious and non-default fix of adding --output-safechars.

Perhaps at a minimum, the documentation could include an appropriate hint in the Unicode section. It currently offers only the following, which, to my eyes, does not provide any useful information about these types of issues:

Biber uses NFD UTF-8 internally. All data is converted to NFD UTF-8 when read. If UTF-8 output is requested (to .bbl for example), the UTF-8 will always be NFC.
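To illustrate why NFC output does not help here, the sketch below uses Python's unicodedata module. It assumes that \i is decoded to the dotless ı (U+0131), which is consistent with the Garcı́a output and the U+301 error reported above but is not confirmed by the biber documentation. Unicode defines no precomposed character for a dotless ı with an acute accent, so NFC leaves the pair as two codepoints, while a plain i with the same accent composes to í (U+00ED).

```python
import unicodedata

# Assumption: \'{\i} is decoded to dotless i (U+0131) + combining acute (U+0301).
dotless = "\u0131\u0301"
# What \'{i} would presumably give: plain i + combining acute.
dotted = "i\u0301"

nfc_dotless = unicodedata.normalize("NFC", dotless)
nfc_dotted = unicodedata.normalize("NFC", dotted)

# The dotless pair stays two codepoints; the plain-i pair becomes one.
print([hex(ord(c)) for c in nfc_dotless])  # ['0x131', '0x301']
print([hex(ord(c)) for c in nfc_dotted])   # ['0xed']
```

The leftover U+0301 is the character that inputenc reports as not set up.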

See also the following small collection of related issues on Stack Exchange (there are many more):

plk commented 1 year ago

Alright, I've given in and made this a special case in the 2.19 development version on SF. The "combining diacritic" output is perfectly valid UTF-8, in both NFD and NFC, but it should now be a single grapheme on output, as this is seemingly a common issue.

mforbes commented 1 year ago

@plk Thanks! The --output-safechars workaround ended up working, but because this lies at the intersection of a bunch of tools, it is very tricky to figure out what is actually going wrong. Having the default work as expected will be helpful.