Open stefanbschneider opened 3 years ago
Essentially biblatex (the same holds for classical BibTeX) assumes that your titles in the .bib file are in title case, since they only implement functions to convert to sentence case. Apart from the TeX.SX link you found this is also discussed at length in https://tex.stackexchange.com/q/439440/35864 and https://tex.stackexchange.com/q/166616/35864 as well as linked posts.
Given how difficult string processing is in LaTeX, I think it is quite the task to implement a real "title casing" macro that deserves the name, is customisable enough and can deal with all the content users may want to throw at it. The sentence casing code is complex enough and that only needs to lowercase everything after the first letter, it does not need to decide what a word is and which words need to be capitalised. (I don't doubt that you can come up with some ad hoc solutions that work well enough in specific cases.)
I think your best bet at the moment is an external tool written in a programming language that can deal with strings more naturally than TeX and that has a Title Case function. I guess in theory it might be possible to add something like that to Biber so that it pre-processes your titles and converts them to title case, but I don't know if there is a good Perl module for title casing and there are some subtleties here.
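To make the external-tool idea concrete, here is a minimal Python sketch. It assumes a deliberately naive rule set: split at spaces, capitalise the first letter of each word, and lowercase a short list of small words unless they come first. The name `title_case` and the contents of `SMALL_WORDS` are illustrative assumptions, not part of biblatex or Biber.

```python
# Hypothetical helper for illustration; not part of biblatex or Biber.
# The word list is an assumption and would need tuning for a real style guide.
SMALL_WORDS = frozenset({
    "a", "an", "and", "as", "at", "but", "by", "for",
    "in", "nor", "of", "on", "or", "the", "to",
})


def title_case(text, small_words=SMALL_WORDS):
    """Return text with the first letter of each space-separated word capitalised.

    Words listed in small_words are lowercased unless they are the first word.
    Only the first character of other words is changed, so 'IEEE' stays 'IEEE'.
    """
    result = []
    for index, word in enumerate(text.split(" ")):
        if index > 0 and word.lower() in small_words:
            result.append(word.lower())
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(title_case("the arrival of the queen of sheba"))
```

Because only the first character of each word is touched, acronyms survive, but mixed-case input such as "iPhone" would become "IPhone". This is one of the subtleties mentioned above.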
If your only issue with converting everything to sentence case is conference titles, then you could try a biblatex-ext style, which gives you slightly more control over the exact fields that get sentence casing. You could for example exempt the booktitles of @inproceedings entries.
\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}
\usepackage[backend=biber, style=ext-authoryear]{biblatex}
% applies only to 'title' field, not to booktitle
\DeclareFieldFormat{titlecase:title}{\MakeSentenceCase{#1}}
\addbibresource{biblatex-examples.bib}
\begin{document}
Lorem \autocite{sigfridsson,moraux}
\printbibliography
\end{document}
Thanks for the help!
Yes, limiting the sentence case to just the title of my bib entries would also solve my problem. All other fields are capitalized fairly consistently.
Unfortunately, my LaTeX run crashes when I set the option to style=ext-authoryear or ext-numeric. Do I have to install and import some other package? There doesn't seem to be a biblatex-ext package, and you also didn't import it.
If this doesn't work, I think I'll resort to your second tip and write a Python script to do the conversion. But the built-in sentence case option limited to just the entries' title would be easier.
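A rough sketch of what such a Python script could look like is below. It only rewrites the `title` field and leaves `booktitle`, `subtitle` and all other fields alone. The names `TITLE_FIELD` and `convert_titles` are hypothetical, and the regular expression assumes that every title is brace-delimited and sits on a single line; multi-line or quote-delimited titles are silently skipped.

```python
import re

# Matches a brace-delimited 'title' field that sits on one line, e.g.
#   title = {Some title},
# Assumption: one field per line. 'booktitle' and 'subtitle' do not match,
# because only whitespace may precede the word 'title'.
TITLE_FIELD = re.compile(
    r"^([ \t]*title[ \t]*=[ \t]*\{)(.*)(\}[ \t]*,?[ \t]*)$",
    re.IGNORECASE | re.MULTILINE,
)


def convert_titles(bib_text, transform):
    """Apply transform (a str -> str function) to every matched title field."""
    return TITLE_FIELD.sub(
        lambda match: match.group(1) + transform(match.group(2)) + match.group(3),
        bib_text,
    )


example = (
    "@inproceedings{key,\n"
    "  title = {a study of {IEEE} things},\n"
    "  booktitle = {IEEE Conference on XY},\n"
    "}\n"
)
# str.upper is only a stand-in for a real casing function.
print(convert_titles(example, str.upper))
```

Any casing function can be passed as `transform`. For anything beyond simple files, a proper .bib parser would be more robust than a regular expression.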
For style=ext-numeric or style=ext-authoryear you need to install the biblatex-ext bundle (https://ctan.org/pkg/biblatex-ext, which is available in both current MiKTeX and TeX Live).
If you are stuck with an old TeX Live (for example because you are using one that comes with your OS), biblatex-ext might simply not be available. In that case you could try to copy the relevant bits from ext-standard.bbx into your preamble (depending on how old your biblatex is, some other code might be needed):
\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}
\usepackage[backend=biber, style=authoryear]{biblatex}
\providecommand*{\titleaddonpunct}{\newunitpunct}
\DeclareFieldAlias{titlecase:title}{titlecase}
\renewbibmacro*{title}{%
\ifboolexpr{
test {\iffieldundef{title}}
and
test {\iffieldundef{subtitle}}
}
{}
{\printtext[title]{%
\printfield[titlecase:title]{title}%
\setunit{\subtitlepunct}%
\printfield[titlecase:title]{subtitle}}%
\setunit{\titleaddonpunct}}%
\printfield{titleaddon}}
\DeclareFieldAlias{titlecase:booktitle}{titlecase}
\renewbibmacro*{booktitle}{%
\ifboolexpr{
test {\iffieldundef{booktitle}}
and
test {\iffieldundef{booksubtitle}}
}
{}
{\printtext[booktitle]{%
\printfield[titlecase:booktitle]{booktitle}%
\setunit{\subtitlepunct}%
\printfield[titlecase:booktitle]{booksubtitle}}%
\setunit{\titleaddonpunct}}%
\printfield{booktitleaddon}}
\DeclareFieldAlias{titlecase:maintitle}{titlecase}
\renewbibmacro*{maintitle}{%
\ifboolexpr{
test {\iffieldundef{maintitle}}
and
test {\iffieldundef{mainsubtitle}}
}
{}
{\printtext[maintitle]{%
\printfield[titlecase:maintitle]{maintitle}%
\setunit{\subtitlepunct}%
\printfield[titlecase:maintitle]{mainsubtitle}}%
\setunit{\titleaddonpunct}}%
\printfield{maintitleaddon}}
\DeclareFieldAlias{titlecase:journaltitle}{titlecase}
\renewbibmacro*{journal}{%
\ifboolexpr{
test {\iffieldundef{journaltitle}}
and
test {\iffieldundef{journalsubtitle}}
}
{}
{\printtext[journaltitle]{%
\printfield[titlecase:journaltitle]{journaltitle}%
\setunit{\subtitlepunct}%
\printfield[titlecase:journaltitle]{journalsubtitle}}%
\setunit{\titleaddonpunct}}%
\iffieldundef{journaltitleaddon}
{}
{\printfield{journaltitleaddon}}}
\renewbibmacro*{periodical}{%
\ifboolexpr{
test {\iffieldundef{title}}
and
test {\iffieldundef{subtitle}}
}
{}
{\printtext[title]{%
\printfield[titlecase:title]{title}%
\setunit{\subtitlepunct}%
\printfield[titlecase:title]{subtitle}}%
\setunit{\titleaddonpunct}}%
\iffieldundef{titleaddon}
{}
{\printfield{titleaddon}}}
\DeclareFieldAlias{titlecase:issuetitle}{titlecase}
\renewbibmacro*{issue}{%
\ifboolexpr{
test {\iffieldundef{issuetitle}}
and
test {\iffieldundef{issuesubtitle}}
}
{}
{\printtext[issuetitle]{%
\printfield[titlecase:issuetitle]{issuetitle}%
\setunit{\subtitlepunct}%
\printfield[titlecase:issuetitle]{issuesubtitle}}%
\setunit{\titleaddonpunct}}%
\printfield{issuetitleaddon}}
% applies only to 'title' field, not to booktitle
\DeclareFieldFormat{titlecase:title}{\MakeSentenceCase{#1}}
\addbibresource{biblatex-examples.bib}
\begin{document}
Lorem \autocite{sigfridsson,moraux}
\printbibliography
\end{document}
Given some restriction on the nature of 'text', it is possible to build a wrapper around \text_titlecase:n that does this: basically, one divides up the input at spaces and then checks for special cases before applying the case-changing code. The issue always is what counts as 'text'.
There are Unicode-aware casing modules for Perl, but not for title casing, since title casing differs from other casings in being partly semantic rather than purely syntactic. I doubt there is (or ever will be) a fully general, Unicode-aware title casing function, as there are too many edge cases. However, if we can formulate a set of rules, we can implement this in biber and write a title-cased version of the title fields to the .bbl.
For Perl there is https://metacpan.org/pod/Lingua::EN::Titlecase. Not sure how good it is. Generally I'd expect that users would want to be able to customise the title casing functions by giving a list of exceptions.
As for the LaTeX side: How easy is it to reliably split at spaces and treat each 'word' separately? Plus we'd have to implement a list of words that should not be capitalised in Title case. How would one best compare our separate words to that list? RegExp?
As for the LaTeX side: How easy is it to reliably split at spaces and treat each 'word' separately.
If we assume 'text' has been processed such that we have macros expanded and spaces between words, all we need is a second list to deal with 'special cases'.
Plus we'd have to implement a list of words that should not be capitalised in Title case. How would one best compare our separate words to that list? RegExp?
I was thinking of a simple comma list, but we could use regex at the cost of expandability and performance.
I think simply having a list of exceptions (words that should be lower-case) is the best way to go. This is what https://titlecase.com/ seems to do. And, of course, text in curly brackets, e.g., for acronyms, should not be affected by the title case. Other than that, I don't see any more exceptions or border cases.
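As a sketch of that proposal (an exception list plus brace protection), the following Python code splits a title at spaces that lie outside `{...}` groups and leaves any word containing a brace untouched. The function names and the short word list are assumptions made for illustration only; this is not how biblatex or Biber handle case protection.

```python
def split_outside_braces(text):
    """Split text at spaces that are not inside {...} groups."""
    words, current, depth = [], [], 0
    for char in text:
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        if char == " " and depth == 0:
            words.append("".join(current))
            current = []
        else:
            current.append(char)
    words.append("".join(current))
    return words


def title_case_protected(text, small_words=frozenset({"a", "an", "of", "the", "in"})):
    """Title-case text, leaving words that contain a brace group unchanged."""
    result = []
    for index, word in enumerate(split_outside_braces(text)):
        if "{" in word:
            result.append(word)  # protected: copied verbatim
        elif index > 0 and word.lower() in small_words:
            result.append(word.lower())
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(title_case_protected("argonauts of the {western pacific}"))
```

Treating every brace group as protected is the simplest rule, but braces are also used for ordinary macro arguments, so a real implementation would have to tell the two uses apart.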
And, of course, text in curly brackets, e.g., for acronyms, should not be affected by the title case. Other than that, I don't see any more exceptions or border cases.
I'd avoid just 'curly braces': we have a \NoCaseChange-like setup that avoids the potential ambiguity.
OK, here is a start, but I'm still skeptical that this will lead to something that is safe enough for people to use. Depending on what "title case" means to you, a simple solution with just a list of exceptions simply is not good enough.
There is also the question about what we should assume of the input. Should we assume it is all-lowercase? Should we assume that people only ever want us to manipulate the first letter of each word? Is our definition of word (split at a space) right? (What about ~ for example?)
\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}
\ExplSyntaxOn
\tl_put_right:Nn \l_text_expand_exclude_tl { \NoCaseChange }
\tl_put_right:Nn \l_text_case_exclude_arg_tl { \NoCaseChange }
\seq_new:N \g_biblatex_titlecase_lowerexceptions_seq
\seq_gset_from_clist:Nn \g_biblatex_titlecase_lowerexceptions_seq
{ and, but, for, or, nor, the, a, an, to, as, of, in, at, by }
\cs_generate_variant:Nn \seq_set_split:Nnn { Nnx }
\cs_new_protected:Npn \biblatex_titlecase:n #1
{
\seq_set_split:Nnx \l_tmpa_seq { ~ } { \text_expand:n {#1} }
\seq_map_indexed_function:NN \l_tmpa_seq \__biblatex_titlecase_iter:nn
}
\cs_new_protected:Npn \__biblatex_titlecase_iter:nn #1 #2
{
\int_compare:nNnTF {#1} > {1}
{
\seq_if_in:NxTF \g_biblatex_titlecase_lowerexceptions_seq {\text_lowercase:n {#2}}
{~\text_lowercase:n {#2}}
{~\text_titlecase_first:n {#2}}
}
{
\text_titlecase_first:n {#2}
}
}
\NewDocumentCommand \testtitlecase {m} { \biblatex_titlecase:n {#1} }
\ExplSyntaxOff
\makeatletter
\ifundef\NoCaseChange
{\let\NoCaseChange\@firstofone}
{}
\makeatother
\newcommand\goo{goo hoo}
\begin{document}
\testtitlecase{Hello You}
\testtitlecase{The {\TeX book \goo} \goo}
\testtitlecase{The \NoCaseChange{\TeX book \goo} \goo}
\testtitlecase{The \NoCaseChange{hey} \goo}
\testtitlecase{The arrival of the queen of sheeba}
\testtitlecase{The Arrival Of The Queen Of Sheeba}
\testtitlecase{The Story of \NoCaseChange{HMS} \emph{Erebus}
in \emph{Really} Strong Wind}
\testtitlecase{The story of \NoCaseChange{HMS} \emph{Erebus}
in \emph{really} strong wind}
\testtitlecase{Argonauts of the \NoCaseChange{Western Pacific}}
\testtitlecase{Proceedings of the IEEE}
\end{document}
I wonder if we should rather look and see if CLDR has something that will work for all Unicode. Hacking this syntactically is probably just too hard.
http://cldr.unicode.org/development/development-process/design-proposals/consistent-casing
There is now work on title casing in the LaTeX kernel: https://github.com/latex3/latex3/pull/1240
I'm using biblatex with biber and have hundreds of entries in my .bib file with inconsistent capitalization of their titles. I'd like biblatex to convert all titles into Title Case. It seems like this is not possible.
Converting to sentence case is possible with
\DeclareFieldFormat{titlecase}{\MakeSentenceCase*{#1}}
but this also converts the conference names into sentence case. For example, IEEE Conference on XY is converted to Ieee conference on xy, which is unacceptable.
It seems like I either have to spend hours manually changing all titles to Title Case or, if I go for automatic sentence case, spend hours manually escaping all conference names. Is there no better way?