Closed moewew closed 2 years ago
I had an idea about this I've not tried before - now implemented in DEV.
All of 'my' cases and the ADS export (cf. #297) work brilliantly!
I don't know if it's related, but https://github.com/plk/biblatex/issues/727 (https://github.com/plk/biber/issues/216) seems to be an issue again.
\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}
\usepackage[style=authoryear, backend=biber]{biblatex}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@online{fontconfig,
author ={{\texttt{freedesktop.org}}},
sortname = {freedesktop},
title = {Fontconfig},
subtitle = {A library for configuring and customizing font access},
date = {2016-06-15},
urldate={2017-03-18},
url = {https://www.freedesktop.org/wiki/Software/fontconfig/}
}
@online{wikipedia,
author = {{\WikipediA}},
sortlabel = {Wikipedia},
sortname = {Wikipedia},
title = {Lucida},
date = {2016-10-19},
urldate = {2017-04-03},
url = {https://en.wikipedia.org/wiki/Lucida},
}
@online{features,
author = {{\WikipediA}},
sortlabel = {Wikipedia},
sortname = {Wikipedia},
title = {List of typographic features},
date = {2017-02-21},
urldate = {2017-03-24},
url = {https://en.wikipedia.org/wiki/List_of_typographic_features},
}
\end{filecontents}
\def\WikipediA{wikipedia}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
gives bad familyi's
\name{author}{1}{}{%
{{un=0,uniquepart=base,hash=f54ec09db02860f10fd50e2ce18d24db}{%
family={{\texttt{freedesktop.org}}},
familyi={f\bibinitperiod}}}%
}
...
\name{author}{1}{}{%
{{un=0,uniquepart=base,hash=95387a9e1a6bf37286493c821a0b17da}{%
family={{\WikipediA}},
familyi={}\bibinitperiod}}}%
}
...
\name{author}{1}{}{%
{{un=0,uniquepart=base,hash=95387a9e1a6bf37286493c821a0b17da}{%
family={{\WikipediA}},
familyi={}\bibinitperiod}}}%
}
Please try 2.15 now - this was really something in the initials generation for edge cases.
The errors are gone!
I'm a bit concerned about the change from
\field{title}{Signs of W$\frac{o}{a}$nder}
to
\field{title}{Signs of W$\frac{o}a$nder}
but the versions are equivalent (even though the former is much nicer).
That is a bit ugly but only happens with single characters in braces, which should be equivalent. Multi-argument macros are a huge pain to deal with here but are rare enough.
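A minimal sketch of why the two title forms are equivalent, assuming only standard TeX argument parsing: an undelimited macro argument is either a braced group or a single token, so both lines below should typeset the same fraction.
\documentclass{article}
\begin{document}
% Second argument of \frac given as a braced group
Signs of W$\frac{o}{a}$nder

% Second argument of \frac given as a single unbraced token
Signs of W$\frac{o}a$nder
\end{document}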
I think one of the consequences of your recent changes is that you remove the curly brackets around all single characters (tested with the current 2.15 dev). This is not a good idea, e.g.
@BOOK{xxx,
AUTHOR = {X, Y},
DATE = {2020},
TITLE = {Part {I}},
}
will be turned into
@BOOK{xxx,
AUTHOR = {X, Y},
DATE = {2020},
TITLE = {Part I},
}
by biber --quiet --tool --outfile test2.bib test.bib. As a result, "I" may become "i" in some citation styles, although the braces should have prevented this in the first place.
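The effect on case protection can be checked with a small test document. This is only a sketch: it assumes a sentence-case title format set up with biblatex's \MakeSentenceCase*, and the entry keys and titles are made up for illustration. With a Biber version that keeps the braces, the first title should retain its capital "I", while the unprotected second one may be lowercased.
\documentclass{article}
\usepackage[english]{babel}
\usepackage{csquotes}
\usepackage[style=authoryear, backend=biber]{biblatex}
% Assumption: the style converts titles to sentence case
\DeclareFieldFormat{titlecase}{\MakeSentenceCase*{#1}}
\begin{filecontents}[force]{\jobname.bib}
@book{protected,
  author = {X, Y},
  date   = {2020},
  title  = {Notes on Part {I}},
}
@book{unprotected,
  author = {X, Y},
  date   = {2021},
  title  = {Notes on Part I},
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}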
This is a constant problem I'm afraid - the latex decoding in biber is syntactical and it's impossible to cover all cases. Generally, things like part numbers should be in fields like volume or number. Another way around this specific example is to use double-braces: {{I}}. This edge-case only occurs with single glyphs in braces and doesn't affect, for example, {II}.
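For the entry under discussion, the double-brace workaround mentioned above might look like the following. This is a sketch based on the description in this thread, not a guarantee for every Biber version.
@BOOK{xxx,
  AUTHOR = {X, Y},
  DATE   = {2020},
  % Double braces around the single glyph, as suggested as a workaround
  TITLE  = {Part {{I}}},
}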
I know this is a pain, but is there absolutely no chance this might work?
I think the rule here is that brace stripping should only happen if the braced contents start with a backslash, or more precisely with a macro that encodes a non-ASCII char.
I can write up some more systematic tests if you like.
I am not sure what your interpretation of "constant" is. The problem was not present in previous versions (let's say a few days ago?) and it breaks compatibility with bibtex. I have several books, journals and papers in my bibtex-file that have single capital letters in their respective titles, either due to numbering or units, e.g. "T" for Tesla. I guess removing brackets only when they are preceded by a backslash should be safe (since titlecase does not matter there anyway).
Ah, one other case in which we probably want braces removed is when an empty group follows a macro that Biber encoded into a UTF-8 character (it appears possible to capture the empty group in the code that searches for these macros and replaces them with UTF-8 chars).
The problem is that fixing one edge case breaks others fairly reliably in this area because we are doing this with (very complex) regexps as there is no other way short of having a full TeX parser in there. It looks simple to differentiate between \'{I} and {I} but since we have no semantics here, there is no way of knowing if a preceding macro takes any arguments or how many arguments it takes. This is always going to be a hack - so far it has been a reasonably good one but the edge cases mount and the new capitalisation code meant we had to shift the problem elsewhere. I don't like it either and will have to think about it more.
I'd have thought that it might be possible to get the few cases right that we need to get right. If we ignore initial generation, which is problematic with macros anyway, for the moment, we don't need brace stripping for all macros, we only need it for diacritic-macros like \'. Biber already knows that \' does something to the next char and can already ignore the braces around it.
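To make the distinction concrete, here is a small sketch on the typesetting side only (it says nothing about Biber's internals). The four accent spellings should all print the same letter, whereas the braces around a plain letter have no typographic effect and matter only for BibTeX-style case protection.
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Four spellings of the same accented letter
\'I \quad \'{I} \quad {\'I} \quad {\'{I}}

% A braced plain letter: prints like an unbraced I
Part {I} \quad Part I
\end{document}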
I'll have a look - it's just a little unsatisfactory having to break out into special cases when I've tried to keep this module as general as possible but I don't think there is much choice even though it makes things harder to maintain ...
Here are a few test cases I compiled. I will probably add to this if I find more, but I wanted to post it now in case I forgot about it.
\documentclass[british]{article}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\usepackage{csquotes}
\usepackage[style=authoryear, backend=biber]{biblatex}
\renewbibmacro*{finentry}{%
\setunit{\newline}\printfield{verba}%
\setunit{\newline}\printfield{verbb}%
\finentry}
\begin{filecontents}[force]{\jobname.bib}
@book{appleby,
author = {Humphrey Appleby},
title = {Harmless uses {I} {II} {Humphrey Appleby} {H}umphrey {A}ppleby},
verba = {Raw: {I} {II} {Humphrey Appleby} {H}umphrey {A}ppleby},
verbb = {Expected: {I} {II} {Humphrey Appleby} {H}umphrey {A}ppleby},
date = {1981},
}
@book{bppleby:b,
author = {Humphrey Bppleby},
title = {Single letter (BibTeX style)
{\"a} {\"{o}} {\v C} {\v{Z}}},
verba = {Raw: {\"a} {\"{o}} {\v C} {\v{Z}}},
verbb = {Expected: ä ö Č Ž},
date = {1982},
}
@book{bppleby:l,
author = {Humphrey Bppleby},
title = {Single letter (LaTeX style)
\"a \"{o} \v C \v{Z}},
verba = {Raw: \"a \"{o} \v C \v{Z}},
verbb = {Expected: ä ö Č Ž},
date = {1982},
}
@book{cppleby:b,
author = {Humphrey Cppleby},
title = {Protected single letter (BibTeX style)
{{\"a}} {{\"{o}}} {{\v C}} {{\v{Z}}}},
verba = {Raw: {{\"a}} {{\"{o}}} {{\v C}} {{\v{Z}}}},
verbb = {Expected: {ä} {ö} {Č} {Ž}},
date = {1983},
}
@book{cppleby:l,
author = {Humphrey Cppleby},
title = {Protected single letter (LaTeX style)
-- doesn't exist},
date = {1983},
}
@book{dppleby:b,
author = {Humphrey Dppleby},
title = {Words (BibTeX) {\"a}s{\"a}n {\"{o}}l{\"{o}}n
{\v C}e{\v c}en {\v{Z}}e{\v{z}}en},
verba = {Raw: {\"a}s{\"a}n {\"{o}}l{\"{o}}n
{\v C}e{\v c}en {\v{Z}}e{\v{z}}en},
verbb = {Expected: äsän ölön Čečen Žežen},
date = {1984},
}
@book{dppleby:l,
author = {Humphrey Dppleby},
title = {Words (LaTeX) \"as\"an \"{o}l\"{o}n \v Ce\v cen \v{Z}e\v{z}en},
verba = {Raw: \"as\"an \"{o}l\"{o}n \v Ce\v cen \v{Z}e\v{z}en},
verbb = {Expected: äsän ölön Čečen Žežen},
date = {1984},
}
@book{eppleby:b,
author = {Humphrey Eppleby},
title = {Protected Words (BibTeX) {{\"a}s{\"a}n} {{\"{o}}l{\"{o}}n}
{{\v C}e{\v c}en} {{\v{Z}}e{\v{z}}en}},
verba = {Raw: {{\"a}s{\"a}n} {{\"{o}}l{\"{o}}n}
{{\v C}e{\v c}en} {{\v{Z}}e{\v{z}}en}},
verbb = {Expected: {äsän} {ölön} {Čečen} {Žežen}},
date = {1985},
}
@book{eppleby:l,
author = {Humphrey Eppleby},
title = {Protected Words (LaTeX) {\"as\"an} {\"{o}l\"{o}n}
{\v Ce\v cen} {\v{Z}e\v{z}en}},
verba = {Raw: {\"as\"an} {\"{o}l\"{o}n}
{\v Ce\v cen} {\v{Z}e\v{z}en}},
verbb = {Expected: {äsän} {ölön} {Čečen} {Žežen}},
date = {1985},
}
@book{fppleby,
author = {Humphrey Fppleby},
title = {Macros \emph{Hullo} $\frac{a}{b}$},
verba = {Raw: \emph{Hullo} $\frac{a}{b}$},
verbb = {Expected: \emph{Hullo} $\frac{a}{b}$},
date = {1986},
}
@book{gppleby:b,
author = {Humphrey Gppleby},
title = {Macros (BibTeX) \emph{H{\"u}llo} \emph{H{\"{e}}llo}
\emph{{\v C}e{\v c}en} \emph{{\v{Z}}e{\v{Z}}en}},
verba = {Raw: \emph{H{\"u}llo} \emph{H{\"{e}}llo}
\emph{{\v C}e{\v c}en} \emph{{\v{Z}}e{\v{Z}}en}},
verbb = {Expected: \emph{Hüllo} and \emph{Hëllo} \emph{Čečen} \emph{Žežen}},
date = {1987},
}
@book{gppleby:l,
author = {Humphrey Gppleby},
title = {Macros (LaTeX) \emph{H\"ullo} \emph{H\"{e}llo}
\emph{\v Ce\v cen} \emph{\v{Z}e\v{Z}en}},
verba = {Raw: \emph{H\"ullo} \emph{H\"{e}llo}
\emph{\v Ce\v cen} \emph{\v{Z}e\v{Z}en}},
verbb = {Expected: \emph{Hüllo} and \emph{Hëllo} \emph{Čečen} \emph{Žežen}},
date = {1987},
}
@book{hppleby:b,
author = {Humphrey Hppleby},
title = {Protected macros (BibTeX) {\emph{H{\"u}llo}}{\emph{H{\"{e}}llo}}
{\emph{{\v C}e{\v c}en}} {\emph{{\v{Z}}e{\v{Z}}en}}},
verba = {Raw: {\emph{H{\"u}llo}} {\emph{H{\"{e}}llo}}
{\emph{{\v C}e{\v c}en}} {\emph{{\v{Z}}e{\v{Z}}en}}},
verbb = {Expected: {\emph{Hüllo}} and {\emph{Hëllo}} {\emph{Čečen}} {\emph{Žežen}}},
date = {1988},
}
@book{hppleby:l,
author = {Humphrey Hppleby},
title = {Protected macros (LaTeX) {\emph{H\"ullo}} {\emph{H\"{e}llo}}
{\emph{\v Ce\v cen}} {\emph{\v{Z}e\v{Z}en}}},
verba = {Raw: {\emph{H\"ullo}} {\emph{H\"{e}llo}}
{\emph{\v Ce\v cen}} {\emph{\v{Z}e\v{Z}en}}},
verbb = {Expected: {\emph{Hüllo}} and {\emph{Hëllo}} {\emph{Čečen}} {\emph{Žežen}}},
date = {1988},
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\raggedright
\printbibliography
\end{document}
Please try DEV now - I put a fix in which addresses single-char diacritic macros which fixes the Part {I} example.
Thanks a lot! I checked the new binary with my rather large biblatex collection (50k lines) and there is only one (minor) issue with {IEEE} {CG}\&{A} (short title of the IEEE Computer Graphics and Applications journal): it becomes {IEEE} {CG}\&A. Not sure if it makes sense to hunt this particular issue down... personally, I can work around it, e.g. {IEEE} {CG\&A}.
Hmm, that's a strange edge case indeed - there really should be a space there but that wouldn't help with something legitimate like \&\;{A}. Please try 2.15 now - I think this should also be resolved now.
It is astonishing how many edge cases come out of the woodwork here. All my tests look great (but it is becoming more and more obvious my tests don't even scratch the surface of what people have in their .bib files).
Just for the record, A\&B is valid TeX and differs from A\& B in output. Space is only removed after control words consisting of (TeX) letters (characters of catcode 11) and after the control space (a backslash followed by a space). Control sequences consisting of a single non-letter character do not skip the following space.
\documentclass{article}
\begin{document}
A\&B
A\& B
A\,B
A\, B
\end{document}
Yes, perfect! I confirm that the new version does not strip any intended braces. By the way, I have been using 2.15dev for quite some time now and I am very happy with it. Thanks!
Using biber 2.15, I'm still facing the problem of #297 with double names such as Franz{-}Josef. Unfortunately, these braces are used by default in *.bib files exported from DBLP.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{csquotes}
\usepackage[backend=biber]{biblatex}
\begin{filecontents}[force]{\jobname.bib}
@book{DBLP:books/daglib/0033267,
editor = {J{\"{u}}rgen Gausemeier and
Franz{-}Josef Rammig and
Wilhelm Sch{\"{a}}fer},
title = {Design Methodology for Intelligent Technical Systems, Develop Intelligent
Technical Systems of the Future},
publisher = {Springer},
year = {2014},
url = {https://doi.org/10.1007/978-3-642-45435-6},
doi = {10.1007/978-3-642-45435-6},
isbn = {978-3-642-45434-9},
timestamp = {Tue, 16 May 2017 14:01:41 +0200},
biburl = {https://dblp.org/rec/books/daglib/0033267.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{DBLP:books/daglib/0033267}
\printbibliography
\end{document}
@gerking Please see https://github.com/plk/biber/issues/329. But really I think these braces are excessive when they are added to all entries with hyphenated given names.
Thanks, and sorry for duplicating.
I know it is a painful subject, but I'd like to bring it up for hopefully one last time (see also https://github.com/plk/biber/issues/297#issuecomment-583756230), especially since we now have expl3 case change in biblatex (https://github.com/plk/biblatex/pull/1005).
Biber already correctly strips the outer braces in constructs such as
so that they appear in the .bbl only as
But currently the additional inner/argument braces in the equivalent version
are not stripped, leaving us with
This has negative effects for case protection. With classical BibTeX both forms are not case protected, but since Biber does not strip the braces, Biber will accidentally brace protect the latter form.
Compare
with (run with biblatex 3.15 dev for the expl3 case changer; the latex2e implementation has some quirks with non-ASCII chars in pdfLaTeX; alternatively use a Unicode engine)
Note how the D cases are protected against case change with Biber, while they are not case protected with BibTeX. If Biber were to strip the additional braces here, we would get the same result as in BibTeX (compare the C cases).