Closed retorquere closed 8 years ago
Wait, you are saying DeclareCaseLangs is not fixed. Should I stick those languages in a preference so the translator can match?
I think that’s unnecessary. English is the only language that has case conversion. I’d be very surprised if anyone actually ever redefined \DeclareCaseLangs.
So, on the \emph, what is the preference?
\emph{Homo sapiens}
or, if this is not feasible: {{\emph{Homo sapiens}}}
– NOT {\emph{Homo sapiens}}
More complex cases such as
<span class="nocase"><i>Sambucus nigra</i> subsp. <i>canadensis</i></span>
would have to be mapped to either \emph{Sambucus nigra} {subsp.} \emph{canadensis}
or {{\emph{Sambucus nigra} subsp. \emph{canadensis}}}
Or do you see any other options?
title = {From {Bell}’S {Theorem} to {Secure Quantum Key Distribution}},
… how to prevent this?
Not sure. Treat ’ as part of the word? Also, how do the citeproc-js routines for converting to title-case work (they must have some solution for this), and could BBT possibly borrow these?
I've looked into that, but they appear to assume a substantial amount of internal state. I'm building a word parser from this right now.
WRT the \emph{} case, the reverse case is indeed more complex. It would require lookahead, and that's not a level of complexity I'm looking forward to. Adding a double brace to protect would be the easiest way by far.
Argh, and then there is o'neal, where you do want to capitalize both.
Crazy.
I managed to tap into the CSL titlecaser (I think!), so we'll see how that goes. Not super enthusiastic about changing people's capitalization, so the title casing will definitely be behind an off-by-default preference.
It's starting to look fairly decent. Only titles are cased right now; I'd appreciate a list of fields that need this treatment (and help getting the test files updated for the new behavior). The title casing is a little fragile, as the CSL titlecaser (sensibly) expects to be handed whole sentences, and I'm handing it fragments as I deal with the embedded HTML.
… I'm handing it fragments …
But since Zotero fields may contain embedded <span class="nocase">…</span>, <i>…</i>, <b>…</b>, etc. anyway, I would have expected the CSL titlecaser to be able to handle this.
… a list of fields that need this treatment …
title, container-title (except in Journal Article, Magazine Article, Newspaper Article), volume-title; not collection-title.
… getting the test files updated …
I’m afraid right now I’m still too busy with testing edge cases in biblatex (see e.g. https://tex.stackexchange.com/questions/276943/biblatex-how-to-to-emphasize-but-not-caps-protect; documentation is a complete mess).
The titlecaser doesn't uppercase anything inside HTML tags. If that's OK, I'm fine with that (it would simplify things), but it doesn't seem right.
You phrase applicability in CSL-JSON terms -- that follows the mapping behavior already present? volume-title doesn't always map to the same bibtex field, and the fields it maps to can be generated by other means. Can you specify this behavior further?
I’m afraid right now I’m still too busy with testing edge cases in biblatex (see e.g. https://tex.stackexchange.com/questions/276943/biblatex-how-to-to-emphasize-but-not-caps-protect; documentation is a complete mess).
which is a much more valuable way to spend your time. Forget about the test cases.
The titlecaser doesn't uppercase anything inside HTML tags.
Well, it seems not to uppercase the first word; see bug report at https://bitbucket.org/fbennett/citeproc-js/issues/187/title-case-formatter-does-not-title-case
I bet that's more breakage from it starting a fresh state after an HTML tag. I work around that with some success.
Can you specify this behavior further?
In biblatex terms: title, shorttitle, origtitle, booktitle, maintitle; not journaltitle, series, and eventtitle.
Since BBT currently does not output any subtitles, titleaddons, reprinttitle, issuetitle or indextitle, none of these are relevant.
Ah sweet, that makes things a lot clearer.
Done, tests are running.
When we're done, Zotero is going to have the best damn BibTeX support short of JabRef. And that includes the commercial offerings Zotero is usually compared against.
You bet!
Should journaltitle be caps-preserved? Or do caps preservation and titlecasing always and exclusively co-occur on the same set of fields?
Should journaltitle be caps-preserved?
No. Traditionally, journal titles are in title case and never change – and, more important for BBT, styles don’t try to change them.
Or do caps preservation and titlecasing always and exclusively co-occur on the same set of fields?
Yes.
OK, so I can just collapse those two behaviors and exclusively apply them to title, shorttitle, origtitle, booktitle, maintitle; no more, no less.
(this is important to know for sure as I'm about to dive in and start adjusting test cases)
None of the CSL or biblatex styles I ever came across fiddles with the case of journal names or series – so, yes.
(If you don’t want to just take my word on this – adamsmith, Oct 2015: “I've never seen sentence cased journal titles in any citation …”, here.)
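The field list agreed on above can be captured as a small lookup. This is only a minimal sketch, not BBT's actual code: the names `CASE_HANDLED_FIELDS` and `needs_case_handling` are hypothetical, and the set contains exactly the biblatex fields named in the thread.

```python
# Fields that get both title casing and caps protection, per the thread.
# The constant and function names are hypothetical, for illustration only.
CASE_HANDLED_FIELDS = frozenset(
    {"title", "shorttitle", "origtitle", "booktitle", "maintitle"}
)


def needs_case_handling(field):
    """Return True if a biblatex field should be title-cased and caps-protected."""
    return field.lower() in CASE_HANDLED_FIELDS
```

Keeping one set for both behaviors reflects the answer that caps preservation and title casing always apply to the same fields; journaltitle, series and eventtitle stay untouched.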
Note to self: don't break bibvar preservation
@nickbart1980 So we're just going with always double-brace for nocase?
I’m afraid so – not pretty, but apparently the only format that works regardless of what’s inside the braces.
Alternative: use double-braces only if ‘argument’ starts with \.
Working on adjusting the tests.
Does this mean BTW that {\emph{...}} should be preferred over \emph{...} for <i>...</i>?
Good idea – counterintuitive, but I guess this should work for emphasized but non-caps-protected strings.
See for yourself:
\documentclass[american]{article}
\usepackage[american]{babel}
\usepackage[autostyle]{csquotes}%
\usepackage[backend=biber, style=apa]{biblatex}
\DeclareLanguageMapping{american}{american-apa}
\usepackage{fontspec}
\setmainfont{Linux Libertine}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@article{a,
author = {Doe, John},
year = {2015},
title = "Any {Foo} that appears uppercase is protected –
\emph{Foo}, {\emph{Foo}}, {{\emph{Foo}}}"
}
\end{filecontents}
%
\addbibresource{\jobname.bib}
\begin{document}
\cite{a}
\printbibliography
\end{document}
Output:
Doe, J. (2015). Any Foo that appears uppercase is protected – Foo, foo, Foo.
wait, {\emph{Foo}} doesn't exactly do what I had expected here. Why would we want that form?
Ah, got it -- we want <i>...</i> not to trigger protection, so {\emph{...}} is the right behavior. Does this also go for textbf, textsuperscript, textsubscript, enquote and textsc?
wait, {\emph{Foo}} doesn't exactly do what I had expected here. Why would we want that form?
For emphasized but non-caps-protected strings, i.e., <i>...</i>.
See my enquiry at https://tex.stackexchange.com/a/277170/22851 whether this is indeed a general solution.
Does this also go for textbf, textsuperscript, textsubscript, enquote and textsc?
\textbf: I tested this, and: yes. – Will try the others …
Ok, \textbf, \textsuperscript, \textsubscript, \enquote and \textsc all show the same behaviour. See for yourself: Try any of these commands in my MWE above.
BTW: The citeproc-js issue “Title case formatter does not title-case first word inside markup” (here) has been resolved now, so possibly the workaround you mentioned earlier is no longer required.
Thanks for the heads-up -- workaround removed.
(mental note -- un-chunk titlecaser)
Got a response from a biblatex developer: {\emph{Foo}} being parsed as an emphasized but non-caps-protected string, and \emph{Foo} and {{\emph{Foo}}} as emphasized and caps-protected strings, apparently is the expected behavior:
“This is just one of the small differences between bibtex the program and the btparse library used by biber.” (https://github.com/plk/biblatex/issues/357#issuecomment-154819866)
alright, then the current approach is the proper one.
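To make the agreed mapping concrete, here is a rough sketch of how it could look: <i>...</i> becomes {\emph{...}} (emphasized, not protected), and <span class="nocase">...</span> becomes {{...}} (protected), with plain \emph{...} inside an already protected span. It is an illustration under assumptions, not BBT's converter: the class and function names are hypothetical, the tag-to-command table beyond <i> and <b> is a guess, and no LaTeX escaping of the text is attempted.

```python
from html.parser import HTMLParser

# Assumed tag-to-command mapping; only <i> and <b> are named in the thread.
COMMANDS = {"i": "emph", "b": "textbf", "sup": "textsuperscript", "sub": "textsubscript"}


class MarkupToBiblatex(HTMLParser):
    """Hypothetical converter for Zotero-style rich text in title fields."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # (tag, closing string, opened a nocase span?)
        self.nocase = 0      # > 0 while inside a nocase span

    def handle_starttag(self, tag, attrs):
        is_nocase = tag == "span" and dict(attrs).get("class") == "nocase"
        closing = ""
        if is_nocase:
            if self.nocase == 0:  # only the outermost span adds braces
                self.out.append("{{")
                closing = "}}"
            self.nocase += 1
        elif tag in COMMANDS:
            if self.nocase:
                # Already at brace depth two: the bare command is enough.
                self.out.append("\\" + COMMANDS[tag] + "{")
                closing = "}"
            else:
                # One extra brace pair: formatted, but not caps-protected.
                self.out.append("{\\" + COMMANDS[tag] + "{")
                closing = "}}"
        self.open_tags.append((tag, closing, is_nocase))

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1][0] == tag:
            _, closing, is_nocase = self.open_tags.pop()
            self.out.append(closing)
            if is_nocase:
                self.nocase -= 1

    def handle_data(self, data):
        self.out.append(data)


def to_biblatex(html):
    parser = MarkupToBiblatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With this sketch, the Sambucus nigra example from earlier in the thread comes out in the second of the two proposed forms, {{\emph{Sambucus nigra} subsp. \emph{canadensis}}}.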
FWIW, the btparse library biber/biblatex are using is described in
Ward, Gregory P. 1998. “btOOL: An Object-Oriented Library for Processing BibTeX-Style Text Databases.” Montreal: McGill University, School of Computer Science. https://gerg.ca/software/btOOL/btOOL.ps.gz.
In particular, see pp. 58–9 on bt_change_case(): “The right solution (and this applies to any title with a TeX command that becomes actual text) is to bury the control sequence at brace-depth two:
A Guide to {{\LaTeXe}}: Document Preparation ...”
Cool. The current converter only does exactly this - at deeper levels it knows they've already been applied and doesn't do it again.
The main hurdle right now is title casing. If I can get that right, I think it's good to go.
Damn... most of it is working now (barring a final issue in the title caser) but it is slow. Like 250% slower. I'll need to look where that happened, because this really isn't OK.
OK, all tests green (yay!) but performance is unacceptable. Still looking at possible hotspots, but caps preservation is a major contributor. Haven't yet tried with titleCasing on, as that currently errors out in the performance test.
(for clarity: the current slowness isn't attributable to the titlecaser, it's my caps preservation)
OK, managed to whittle performance back to where it was. Still looking at the title casing -- the CSL title caser doesn't yet handle everything gracefully. Does the following sum up titlecasing correctly, assuming the source is in sentence case?
(^|\s)(<punctuation>*)
:<space>*
(each time? first time?), downcase the first letter
I’m afraid I can’t give a definitive answer, partly because I’m not sure I fully understand your notation. All in all, I’d say citeproc-js’s titlecaser does a good job – if it does something unexpected, it’d be interesting to see the example.
Two things to keep in mind: (1) AFAICT, the titlecaser never actively downcases anything.
(2) The first letter of a subtitle is capitalised in some styles but not in others. citeproc-js has a heuristic to identify the subtitle, which is, AFAIR, to compare title and short title, and, if the initial part of the title matches the short title, assume that the rest of title, i.e., title minus short title, is the subtitle. citeproc-js then creates “virtual” variables title-main and title-sub (see http://sourceforge.net/p/xbiblio/mailman/message/32056473/). In addition to that, there is a citeproc-js processor plugin (https://juris-m.github.io/downloads/, look for “Propachi Upper”) that controls whether the first letter of a subtitle should be capitalised. BBT, I guess, could again borrow this, and output biblatex title and subtitle fields. BBT might also offer an option for capitalising the first letter of a subtitle (the Zotero/CSL recommendation is to enter subtitles starting with a lowercase letter).
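The title-minus-short-title heuristic described in (2) could be sketched roughly as follows. This is a guess at the idea only, not citeproc-js's implementation: the function name is hypothetical, and the rule that the remainder must start with a colon or whitespace is an added assumption to avoid splitting in the middle of a word.

```python
def split_subtitle(title, short_title):
    """Return (main, sub); sub is None when no subtitle can be inferred.

    Hypothetical helper: treats the short title as the main title when the
    full title starts with it, and the remainder as the subtitle.
    """
    if not short_title or not title.startswith(short_title):
        return title, None
    rest = title[len(short_title):]
    # Assumption: a real subtitle is set off by a colon or whitespace.
    if not rest or not (rest[0] == ":" or rest[0].isspace()):
        return title, None
    sub = rest.lstrip(": \t")
    if not sub:
        return title, None
    return short_title, sub
```

For a made-up entry with title "Main title: the subtitle" and short title "Main title", this yields the pair ("Main title", "the subtitle"), which would correspond to the biblatex title and subtitle fields.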
The notation is sort-of-regex; in normal English:
<space>XYZ's, <space>"XYZ's" would consider the Xs to be word-starts, but not the s, since the s doesn't have a preceding <space>.
:<space>
I've submitted a sample in an issue report for citeproc-js, as it currently errors out if I feed it that.
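The word-start rule in that sort-of-regex, (^|\s)(<punctuation>*), can be tried out with a real regular expression. A small sketch under assumptions: `word_starts` is a hypothetical name, "punctuation" is approximated as any character that is neither a word character nor whitespace, and only the word-start part of the notation is covered.

```python
import re

# Start of string or whitespace, then optional punctuation, then the
# first word character, which is captured as the word start.
WORD_START = re.compile(r"(?:^|\s)[^\w\s]*(\w)")


def word_starts(text):
    """Return the indices of characters that count as word starts."""
    return [m.start(1) for m in WORD_START.finditer(text)]
```

For the string XYZ's "XYZ's" this reports the two X positions only; the s after each apostrophe has no preceding space and is therefore not a word start, which is what keeps Bell’s from turning into Bell’S.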
WRT (2), I can't use title matching currently since I'm not feeding it the whole reference, just the field. And I can't assume any particular style in play; I'm targeting what Bib(La)TeX wants to see, which must be before any decision is made on render style.
I'm more than happy to use the CSL titlecaser if it works (it doesn't seem like an easy problem, and I lack any expertise in the domain), but I'm abusing their API (it doesn't expect to be fed just parts of references), and it does some double work (both citeproc and BBT are traversing the HTML string). If there is an easy way to fold those two together it will likely have positive performance impact, and BBT has gotten sufficiently complex that I need to tread lightly here. The cache helps enormously, but I know for example users whose library already takes 40-60 minutes to export with a cold cache, and I'd like to at least not worsen that (an output change has traditionally triggered a cache drop -- I'm going to make that optional because of this).
In any case, Frank has been super responsive so I'll just wait this one out.
Is it desirable to caps-protect words in English titles that are already in Initialcaps? The titlecaser would have changed their sentence case form to the titlecase (Initialcaps) form anyhow.
Yes, of course.
Zotero’s Why is Apple launching a new version of the iPod?
must be converted to
biblatex Why Is {Apple} Launching a New Version of the {iPod}?
(Or else biblatex sentence-case styles would render the unprotected Why Is Apple Launching a New Version of the iPod? as Why is apple launching a new version of the ipod?)
But would Why Is Apple Launching a New Version of the {{iPod}}?
not do the same thing?
(I thought we had settled on Why Is {{Apple}} Launching a New Version of the {{iPod}}?
?)
But would
Why Is Apple Launching a New Version of the {iPod}?
not do the same thing?
That’s how biblatex’s (and bibtex’s!) conversion from title case to sentence case works. Just try my biblatex-apa MWE above.
(I thought we had settled on
Why Is {{Apple}} Launching a New Version of the {{iPod}}?
?)
Right, that works, too, and is needed when the string inside starts with a \, and I don’t see problems if we use this across the board.
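The caps protection discussed here can be sketched in a few lines. This is an illustration under stated assumptions, not BBT's implementation: the name `protect_caps` is hypothetical, the input is assumed to be a sentence-case title without markup, only the very first word is treated as a sentence start, and an apostrophe is treated as part of the word. Title casing itself is left out.

```python
import re

# A word is a run of letters, optionally continued across apostrophes.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def protect_caps(title, double=True):
    """Brace every word that contains a capital, except the first word."""
    opening, closing = ("{{", "}}") if double else ("{", "}")
    out, pos, first = [], 0, True
    for match in WORD.finditer(title):
        word = match.group()
        out.append(title[pos:match.start()])  # text between words
        if not first and any(ch.isupper() for ch in word):
            out.append(opening + word + closing)
        else:
            out.append(word)
        pos, first = match.end(), False
    out.append(title[pos:])  # trailing punctuation
    return "".join(out)
```

Applied to the Zotero title above, the default gives Why is {{Apple}} launching a new version of the {{iPod}}?, and double=False gives the single-brace variant; the title caser would then run on the unprotected words.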
@nickbart1980 says: