plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Support Auto. Title Case Formatting #1104

Open stefanbschneider opened 3 years ago

stefanbschneider commented 3 years ago

I'm using biblatex with biber and have hundreds of entries in my .bib file with inconsistent capitalization of their titles. I'd like biblatex to convert all titles into Title Case. It seems like this is not possible.

Converting to sentence case is possible with \DeclareFieldFormat{titlecase}{\MakeSentenceCase*{#1}} but also converts the conference names into sentence case. For example, IEEE Conference on XY is converted to Ieee conference on xy, which is unacceptable.

It seems I have two options: spend hours manually changing all titles to Title Case, or use automatic sentence case and spend hours manually escaping all conference names. Is there no better way?
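For reference, the manual escaping mentioned above usually means wrapping the protected words in an extra pair of braces in the .bib file. A minimal sketch follows; the entry `demo` and the file name `titlecase-demo.bib` are made up for illustration, and the exact output may depend on the biblatex version:

```latex
\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[backend=biber, style=authoryear]{biblatex}

% sentence casing for every field that uses the 'titlecase' format
\DeclareFieldFormat{titlecase}{\MakeSentenceCase*{#1}}

% hypothetical entry, written out only for this demonstration
\begin{filecontents}[overwrite]{titlecase-demo.bib}
@inproceedings{demo,
  author    = {Doe, Jane},
  title     = {A Study of Title Casing},
  booktitle = {{IEEE} Conference on {XY}},
  year      = {2020},
}
\end{filecontents}
\addbibresource{titlecase-demo.bib}

\begin{document}
Lorem \autocite{demo}

\printbibliography
\end{document}
```

With the braces in place, `IEEE` and `XY` should keep their capitalisation, while `Conference` is still lowercased. Doing this by hand for hundreds of entries is exactly the effort the question is trying to avoid.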

moewew commented 3 years ago

Essentially biblatex (the same holds for classical BibTeX) assumes that your titles in the .bib file are in title case, since they only implement functions to convert to sentence case. Apart from the TeX.SX link you found this is also discussed at length in https://tex.stackexchange.com/q/439440/35864 and https://tex.stackexchange.com/q/166616/35864 as well as linked posts.

Given how difficult string processing is in LaTeX, I think it is quite the task to implement a real "title casing" macro that deserves the name, is customisable enough and can deal with all the content users may want to throw at it. The sentence casing code is complex enough and that only needs to lowercase everything after the first letter, it does not need to decide what a word is and which words need to be capitalised. (I don't doubt that you can come up with some ad hoc solutions that work well enough in specific cases.)

I think your best bet at the moment is an external tool written in a programming language that can deal with strings more naturally than TeX and that has a Title Case function. I guess in theory it might be possible to add something like that to Biber so that it pre-processes your titles and converts them to title case, but I don't know if there is a good Perl module for title casing and there are some subtleties here.


If your only issue with converting everything to sentence case is conference titles, then you could try a biblatex-ext style, which gives you slightly more control over exactly which fields get sentence casing. You could, for example, exempt the booktitles of @inproceedings entries.

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[backend=biber, style=ext-authoryear]{biblatex}

% applies only to 'title' field, not to booktitle
\DeclareFieldFormat{titlecase:title}{\MakeSentenceCase{#1}}

\addbibresource{biblatex-examples.bib}

\begin{document}
Lorem \autocite{sigfridsson,moraux}

\printbibliography
\end{document}
stefanbschneider commented 3 years ago

Thanks for the help!

Yes, limiting the sentence case to just the title of my bib entries would also solve my problem. All other fields are capitalized fairly consistently.

Unfortunately, my LaTeX run crashes when I set style=ext-authoryear or style=ext-numeric. Do I have to install and import some other package? There doesn't seem to be a biblatex-ext package and you also didn't import it.

If this doesn't work, I think I'll resort to your second tip and write a Python script to do the conversion. But the built-in sentence case option limited to just the entries' title would be easier.

moewew commented 3 years ago

For style=ext-numeric or style=ext-authoryear, you need to install the biblatex-ext bundle (https://ctan.org/pkg/biblatex-ext, which is available in both current MiKTeX and TeX Live).
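If it is unclear whether the bundle is installed, one option is to test for the style file before loading it. This is only a sketch: it assumes that finding `ext-authoryear.bbx` on the TeX search path means a usable biblatex-ext is present, and the warning label `demo` is an arbitrary placeholder.

```latex
\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}

% Load the ext- style only if its .bbx file can be found;
% otherwise fall back to the standard authoryear style.
\IfFileExists{ext-authoryear.bbx}
  {\usepackage[backend=biber, style=ext-authoryear]{biblatex}}
  {\PackageWarningNoLine{demo}{biblatex-ext not found, using authoryear}%
   \usepackage[backend=biber, style=authoryear]{biblatex}}

% Takes effect only with the ext- style; with the fallback style the
% format is defined but never used.
\DeclareFieldFormat{titlecase:title}{\MakeSentenceCase{#1}}

\addbibresource{biblatex-examples.bib}

\begin{document}
Lorem \autocite{sigfridsson,moraux}

\printbibliography
\end{document}
```

In the fallback case the document still compiles, but titles are printed as they appear in the .bib file.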

If you are stuck with an old TeX Live (for example because you are using one that comes with your OS), biblatex-ext might simply not be available. In that case you could try to copy the relevant bits from ext-standard.bbx into your preamble (depending on how old your biblatex is, some other code might be needed):

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}

\usepackage[backend=biber, style=authoryear]{biblatex}

\providecommand*{\titleaddonpunct}{\newunitpunct}

\DeclareFieldAlias{titlecase:title}{titlecase}
\renewbibmacro*{title}{%
  \ifboolexpr{
    test {\iffieldundef{title}}
    and
    test {\iffieldundef{subtitle}}
  }
    {}
    {\printtext[title]{%
       \printfield[titlecase:title]{title}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase:title]{subtitle}}%
     \setunit{\titleaddonpunct}}%
  \printfield{titleaddon}}

\DeclareFieldAlias{titlecase:booktitle}{titlecase}
\renewbibmacro*{booktitle}{%
  \ifboolexpr{
    test {\iffieldundef{booktitle}}
    and
    test {\iffieldundef{booksubtitle}}
  }
    {}
    {\printtext[booktitle]{%
       \printfield[titlecase:booktitle]{booktitle}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase:booktitle]{booksubtitle}}%
     \setunit{\titleaddonpunct}}%
  \printfield{booktitleaddon}}

\DeclareFieldAlias{titlecase:maintitle}{titlecase}
\renewbibmacro*{maintitle}{%
  \ifboolexpr{
    test {\iffieldundef{maintitle}}
    and
    test {\iffieldundef{mainsubtitle}}
  }
    {}
    {\printtext[maintitle]{%
       \printfield[titlecase:maintitle]{maintitle}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase:maintitle]{mainsubtitle}}%
     \setunit{\titleaddonpunct}}%
  \printfield{maintitleaddon}}

\DeclareFieldAlias{titlecase:journaltitle}{titlecase}
\renewbibmacro*{journal}{%
  \ifboolexpr{
    test {\iffieldundef{journaltitle}}
    and
    test {\iffieldundef{journalsubtitle}}
  }
    {}
    {\printtext[journaltitle]{%
       \printfield[titlecase:journaltitle]{journaltitle}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase:journaltitle]{journalsubtitle}}%
     \setunit{\titleaddonpunct}}%
  \iffieldundef{journaltitleaddon}
    {}
    {\printfield{journaltitleaddon}}}

\renewbibmacro*{periodical}{%
  \ifboolexpr{
    test {\iffieldundef{title}}
    and
    test {\iffieldundef{subtitle}}
  }
    {}
    {\printtext[title]{%
       \printfield[titlecase:title]{title}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase:title]{subtitle}}%
     \setunit{\titleaddonpunct}}%
  \iffieldundef{titleaddon}
    {}
    {\printfield{titleaddon}}}

\DeclareFieldAlias{titlecase:issuetitle}{titlecase}
\renewbibmacro*{issue}{%
  \ifboolexpr{
    test {\iffieldundef{issuetitle}}
    and
    test {\iffieldundef{issuesubtitle}}
  }
    {}
    {\printtext[issuetitle]{%
       \printfield[titlecase:issuetitle]{issuetitle}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase:issuetitle]{issuesubtitle}}%
     \setunit{\titleaddonpunct}}%
  \printfield{issuetitleaddon}}

% applies only to 'title' field, not to booktitle
\DeclareFieldFormat{titlecase:title}{\MakeSentenceCase{#1}}

\addbibresource{biblatex-examples.bib}

\begin{document}
Lorem \autocite{sigfridsson,moraux}

\printbibliography
\end{document}
josephwright commented 3 years ago

Given some restriction on the nature of 'text', it is possible to build a wrapper around \text_titlecase:n that does this - basically, one divides up the input at spaces and then checks for special cases before applying the case-changing code. The issue always is what counts as 'text'.

plk commented 3 years ago

There are Unicode-aware casing modules for Perl, but not for title casing, since it differs from other casings in being partly semantic rather than purely syntactic. I doubt there is (or ever will be) a fully general, Unicode-aware title casing function, as there are too many edge cases. However, if we can formulate a set of rules, we can implement this in biber and write a title-cased version of the title fields to the .bbl.

moewew commented 3 years ago

For Perl there is https://metacpan.org/pod/Lingua::EN::Titlecase. Not sure how good it is. Generally I'd expect that users would want to be able to customise the title casing functions by giving a list of exceptions.

As for the LaTeX side: How easy is it to reliably split at spaces and treat each 'word' separately? Plus we'd have to implement a list of words that should not be capitalised in Title case. How would one best compare our separate words to that list? RegExp?

josephwright commented 3 years ago

> As for the LaTeX side: How easy is it to reliably split at spaces and treat each 'word' separately?

If we assume 'text' has been processed such that we have macros expanded and spaces between words, all we need is a second list to deal with 'special cases'.

> Plus we'd have to implement a list of words that should not be capitalised in Title case. How would one best compare our separate words to that list? RegExp?

I was thinking of a simple comma list, but we could use regex at the cost of expandability and speed.
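As an illustration of an exception lookup that stays expandable, here is a minimal sketch. It uses `\str_case:nnTF` rather than the comma list mentioned above, it assumes the word has already been lowercased, and the names `\demo_if_smallword:nTF` and `\demosmallword` are hypothetical.

```latex
\documentclass{article}

\ExplSyntaxOn
% Expandable test: is #1 one of the words that should stay lowercase?
% The comparison is a case-sensitive string comparison.
\prg_new_conditional:Npnn \demo_if_smallword:n #1 { T, F, TF }
  {
    \str_case:nnTF {#1}
      {
        { and } { } { but } { } { for } { } { or } { } { nor } { }
        { the } { } { a }   { } { an }  { } { to } { } { as }  { }
        { of }  { } { in }  { } { at }  { } { by } { }
      }
      { \prg_return_true: }
      { \prg_return_false: }
  }

% Document-level wrapper, only for trying the test out
\NewExpandableDocumentCommand \demosmallword { m }
  { \demo_if_smallword:nTF {#1} { small } { regular } }
\ExplSyntaxOff

\begin{document}
\demosmallword{of}, \demosmallword{Queen}
\end{document}
```

This prints "small, regular". A regex-based test would allow patterns instead of fixed words, but it cannot be used in expansion-only contexts.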

stefanbschneider commented 3 years ago

I think simply having a list of exceptions (words that should be lower-case) is the best way to go. This is what https://titlecase.com/ seems to do. And, of course, text in curly brackets, e.g., for acronyms, should not be affected by the title case. Other than that, I don't see any more exceptions or border cases.

josephwright commented 3 years ago

> And, of course, text in curly brackets, eg, for acronyms should not be affected by the title case. Other than that, I don't see any more exceptions or border cases

I'd avoid just 'curly braces': we have a \NoCaseChange-like setup that avoids the potential ambiguity.
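A minimal sketch of what explicit protection with \NoCaseChange looks like with the expl3 case changer. The command `\demolower` is a hypothetical wrapper, and the two registration lines may be redundant on recent kernels, where \NoCaseChange is expected to be set up already:

```latex
\documentclass{article}

% Define \NoCaseChange only if the kernel does not provide it
\providecommand\NoCaseChange[1]{#1}

\ExplSyntaxOn
% Make sure \NoCaseChange is neither expanded nor case-changed
\tl_put_right:Nn \l_text_expand_exclude_tl   { \NoCaseChange }
\tl_put_right:Nn \l_text_case_exclude_arg_tl { \NoCaseChange }

% Hypothetical wrapper around the expl3 lowercasing function
\NewDocumentCommand \demolower { m } { \text_lowercase:n {#1} }
\ExplSyntaxOff

\begin{document}
\demolower{Proceedings of the \NoCaseChange{IEEE}}
\end{document}
```

The argument of \NoCaseChange should come through unchanged while the rest is lowercased. Unlike bare braces, the command states the intent unambiguously, since braces are also needed for ordinary grouping and macro arguments.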

moewew commented 3 years ago

OK, here is a start, but I'm still skeptical that this will lead to something that is safe enough for people to use. Depending on what "title case" means to you, a simple solution with just a list of exceptions simply is not good enough.

There is also the question about what we should assume of the input. Should we assume it is all-lowercase? Should we assume that people only ever want us to manipulate the first letter of each word? Is our definition of word (split at a space) right? (What about ~ for example?)

\documentclass[british]{article}
\usepackage[T1]{fontenc}
\usepackage{babel}
\usepackage{csquotes}

\ExplSyntaxOn
\tl_put_right:Nn \l_text_expand_exclude_tl { \NoCaseChange }

\tl_put_right:Nn \l_text_case_exclude_arg_tl { \NoCaseChange }

\seq_new:N \g_biblatex_titlecase_lowerexceptions_seq

% global sequence, so use the global (gset) variant
\seq_gset_from_clist:Nn \g_biblatex_titlecase_lowerexceptions_seq
 { and, but, for, or, nor, the, a, an, to, as, of, in, at, by }

\cs_generate_variant:Nn \seq_set_split:Nnn { Nnx }

% not expandable (it sets a sequence), hence protected
\cs_new_protected:Npn \biblatex_titlecase:n #1
  {
    \seq_set_split:Nnx \l_tmpa_seq { ~ } { \text_expand:n {#1} }
    \seq_map_indexed_function:NN \l_tmpa_seq \__biblatex_titlecase_iter:nn
  }

% not expandable (\seq_if_in:NxTF is not), hence protected
\cs_new_protected:Npn \__biblatex_titlecase_iter:nn #1 #2
  {
    \int_compare:nNnTF {#1} > {1}
      {
        \seq_if_in:NxTF \g_biblatex_titlecase_lowerexceptions_seq {\text_lowercase:n {#2}}
          {~\text_lowercase:n {#2}}
          {~\text_titlecase_first:n {#2}}
      }
      {
        \text_titlecase_first:n {#2}
      }
  }

\NewDocumentCommand \testtitlecase {m} { \biblatex_titlecase:n {#1} }
\ExplSyntaxOff

\makeatletter
\ifundef\NoCaseChange
  {\let\NoCaseChange\@firstofone}
  {}
\makeatother

\newcommand\goo{goo hoo}

\begin{document}
\testtitlecase{Hello You}

\testtitlecase{The {\TeX book \goo} \goo}

\testtitlecase{The \NoCaseChange{\TeX book \goo} \goo}

\testtitlecase{The \NoCaseChange{hey} \goo}

\testtitlecase{The arrival of the queen of sheeba}

\testtitlecase{The Arrival Of The Queen Of Sheeba}

\testtitlecase{The Story of \NoCaseChange{HMS} \emph{Erebus}
in \emph{Really} Strong Wind}

\testtitlecase{The story of \NoCaseChange{HMS} \emph{Erebus}
in \emph{really} strong wind}

\testtitlecase{Argonauts of the \NoCaseChange{Western Pacific}}

\testtitlecase{Proceedings of the IEEE}
\end{document}
plk commented 3 years ago

I wonder if we should rather look and see if CLDR has something that will work for all of Unicode. Hacking this syntactically is probably just too hard.

http://cldr.unicode.org/development/development-process/design-proposals/consistent-casing

moewew commented 1 year ago

There is now work on title casing in the LaTeX kernel: https://github.com/latex3/latex3/pull/1240