retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Capitalization: Capitalize all title-fields for language "en" #383

Closed retorquere closed 8 years ago

retorquere commented 9 years ago

@nickbart1980 says:

BBT should convert all titles to title-case if the ‘Language’ field is empty or starts with ‘en’, excluding, however, skip words, and strings enclosed in <span class="nocase">…</span>. ‘All titles’ means title, volume-title, container-title, collection-title, including their ‘short’ forms. Titles in entries with a non-empty ‘Language’ field that does not start with ‘en’ should be left alone (see the notes on \MakeSentenceCase, biblatex manual 4.6.4, and compare the man page of pandoc-citeproc, which has to do the inverse conversion when using a biblatex database – as would, BTW, any import of bib(la)tex into Zotero). For bibtex, which does not have a langid field and thus cannot distinguish languages, I would guess that the complete title fields of non-English titles should be wrapped in braces to prevent bibtex from messing with capitalisation.
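The language rule proposed above can be sketched in a few lines of Python. This is only an illustration, not BBT's code: the function name `should_title_case` is hypothetical, and trimming whitespace and ignoring case when comparing the field are assumptions that the proposal does not spell out.

```python
from typing import Optional


def should_title_case(language: Optional[str]) -> bool:
    """Return True when title fields should be converted to title case.

    Hypothetical helper: an empty 'Language' field, or one starting
    with 'en', is treated as English; anything else is left alone.
    """
    # Assumption: surrounding whitespace and letter case are ignored.
    lang = (language or "").strip().lower()
    return lang == "" or lang.startswith("en")
```

Under these assumptions, `should_title_case("en-US")` and `should_title_case("")` hold, while `should_title_case("de")` does not.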

retorquere commented 8 years ago

But wouldn't

Why Is Apple Launching a New Version of the {{iPod}}?

and

Why Is {{Apple}} Launching a New Version of the {{iPod}}?

always render to the same output? If so, I could simply convert the sentence-cased source title Why Is Apple launching a new version of the iPod? to Why Is Apple Launching a New Version of the {{iPod}}?.

retorquere commented 8 years ago

(the reason I'm asking is that adding <span class="nocase"> seems to confuse the CSL title caser, so if I have to do it less the title caser will probably work better)

njbart commented 8 years ago

But wouldn't … and … always render to the same output?

No. Please try out the MWE.

retorquere commented 8 years ago

Ah, I see (the MWE didn't compile for me earlier, but that was a simple fix). Shame, because the title caser isn't handling everything as I would have expected when I inject these nocase spans.

retorquere commented 8 years ago

But with https://gist.github.com/retorquere/572a0f8c1d63e46670a1, both b (titlecase+preservecaps) and c (only preservecaps) render to

Doe, J. (2015b). Why is Apple launching a new version of the iPod?
Doe, J. (2015c). Why is Apple launching a new version of the iPod?

so what does titlecasing actually do?

njbart commented 8 years ago

so what does titlecasing actually do?

Ensure that bib(la)tex can do titles in title-case.

Please switch to the biblatex default style, a title-case style, by replacing

\usepackage[backend=biber, style=apa]{biblatex}
\DeclareLanguageMapping{american}{american-apa}

with

\usepackage[backend=biber]{biblatex}

and see which version gives you a title in correct title-case …

(Hint: it’s b …)

retorquere commented 8 years ago

I see. I'm assembling the last titlecaser errors; maybe citeproc-js can be made to handle them.

retorquere commented 8 years ago

With something like

Which road to follow? The moral complexity

is that OK to output as

Which Road to Follow? {{The}} Moral Complexity

njbart commented 8 years ago

Yes, AFAICT, the first word after a ! or ? should be protected.

retorquere commented 8 years ago

Always, or when it has caps?

njbart commented 8 years ago

When it has caps, I would say.

retorquere commented 8 years ago

Sweet, that's simply the current behavior.
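The rule agreed on in this exchange (protect the first word after a ? or ! only when the source has it capitalized) might look roughly like the sketch below. The function name `protect_after_terminator` is hypothetical, the pattern only recognizes ASCII capitals, and it is meant to run on the source title before any title-casing; none of this is taken from BBT's actual implementation.

```python
import re

# A '?' or '!' followed by whitespace, then a word starting with an
# ASCII capital letter (assumption: non-ASCII capitals are ignored).
_AFTER_TERMINATOR = re.compile(r"([?!]\s+)([A-Z]\w*)")


def protect_after_terminator(title: str) -> str:
    """Wrap a capitalized word that follows '?' or '!' in double braces."""
    return _AFTER_TERMINATOR.sub(
        lambda m: m.group(1) + "{{" + m.group(2) + "}}", title
    )
```

A lowercase word after the terminator is left untouched, which matches the "when it has caps" answer above.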

retorquere commented 8 years ago

There's still a fair number of cases where I think the title caser doesn't do the right thing: https://bitbucket.org/fbennett/citeproc-js/issues/191/a-is-uppercased-in-the-title-caser

retorquere commented 8 years ago

I've worked around most of those by feeding the titlecaser just plain text. So we're getting close on this one.

What should be done with The City of To-morrow? The CSL title caser wants to make it The City of to-Morrow. I can enter it in my bibliography as The City of <span class="nocase">To-morrow</span> but that will always prevent downcasing by the bibliography processor, even where the style demands sentence case. Same goes for Organising/Disorganising the Breakthrough Motif; the title caser makes it Organising/disorganising the Breakthrough Motif, but protecting it with nocase is too strong. Ideas?

retorquere commented 8 years ago

Is there a list of words that biblatex expects to be lowercase in title case? I know "and" and "or" are supposed to stay lowercase, but what about words like "after"?

njbart commented 8 years ago

… feeding the titlecaser just plain text.

I’m still puzzled why you seem to be having such difficulties with <span class="nocase"> and the citeproc-js titlecaser. I’m using <span class="nocase"> a lot in Zotero, and never encountered anything unexpected.

Still, <span class="nocase"> should work in all circumstances, and if it doesn’t, I would report it as a citeproc-js bug.

The City of To-morrow

Hmm, in Zotero, from a title [The City of To-morrow], and using “Create bibliography from item” with Chicago-author-date, I get “The City of To-Morrow” (which seems correct for a title-case style).

Organising/disorganising

Again, I think that’s a citeproc-js bug.

… list of words that biblatex expects to be lowercase in titlecase

There’s no official bib(la)tex list; bib(la)tex expects the user to enter titles in correct title case (which some styles then convert to sentence case; never the other way around).

Style manuals differ a little here, but the citeproc-js list of small words is a good approximation.
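To make the role of a small-words list concrete, here is a deliberately naive sketch. `SMALL_WORDS` is a hypothetical, abbreviated set chosen for illustration; it is not the citeproc-js list, and the function ignores punctuation attached to words, hyphenated compounds, and quotes, which are exactly the cases discussed in this thread.

```python
# Hypothetical, abbreviated small-words set (NOT the citeproc-js list).
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to"}


def naive_title_case(title: str) -> str:
    """Capitalize all-lowercase words, keeping small words lowercase
    unless they are the first or last word of the title."""
    words = title.split(" ")
    out = []
    for i, word in enumerate(words):
        first_or_last = i == 0 or i == len(words) - 1
        if word.lower() in SMALL_WORDS and not first_or_last:
            out.append(word.lower())
        elif word == word.lower():
            # Only touch words without any capitals, so 'iPod' survives.
            out.append(word[:1].upper() + word[1:])
        else:
            out.append(word)
    return " ".join(out)
```

Words that already contain a capital are left alone, so mixed-case names pass through unchanged; a real title caser needs considerably more care than this.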

njbart commented 8 years ago

BTW, citeproc-js is currently changing some of the titlecaser’s details, and from what it looks like neither quotes nor parentheses, nor HTML-like markup will protect against case conversion from now on. See http://sourceforge.net/p/xbiblio/mailman/xbiblio-devel/thread/CAJgpGgAGORo22rX8wRoV2Gd1fmX3iuzbXzwm829sRQ-9i%3DMcmg%40mail.gmail.com/#msg34605413

retorquere commented 8 years ago

I’m still puzzled why you seem to be having such difficulties with <span class="nocase"> and the citeproc-js titlecaser. I’m using <span class="nocase"> a lot in Zotero, and never encountered anything unexpected.

Still, <span class="nocase"> should work in all circumstances, and if it doesn’t, I would report it as a citeproc-js bug.

It does, but sometimes it just dies when I feed it valid input; other times, it just doesn't title-case right; it seems the nocase is handled properly, but it sometimes appears to throw off its idea on whether it is mid-sentence, end-sentence or start-of-sentence.

Hmm, in Zotero, from a title [The City of To-morrow], and using “Create bibliography from item” with Chicago-author-date, I get “The City of To-Morrow” (which seems correct for a title-case style).

The official recommendation is however to enter titles in sentence-case, right? So that would have to be The city of to-morrow. If I enter The City of To-morrow, caps preservation will kick in to make that The {{City}} of {{To-morrow}}.
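The caps preservation described here could be sketched as follows. This is an assumption-laden illustration rather than BBT's code: the name `preserve_caps` is hypothetical, the first word is skipped because it is capitalized in either casing, and trailing punctuation is kept outside the braces.

```python
import re

# Split a word into its core and any trailing punctuation.
_WORD = re.compile(r"(.*?)([?!.,:;]*)", re.DOTALL)


def preserve_caps(title: str) -> str:
    """Wrap every non-initial word containing a capital in double braces."""
    out = []
    for i, word in enumerate(title.split(" ")):
        core, tail = _WORD.fullmatch(word).groups()
        if i > 0 and any(c.isupper() for c in core):
            out.append("{{" + core + "}}" + tail)
        else:
            out.append(word)
    return " ".join(out)
```

Applied to The City of To-morrow this yields The {{City}} of {{To-morrow}}, which is the over-protection the comment above is worried about.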

Organising/disorganising Again, I think that’s a citeproc-js bug.

OK, so I could just wait this one out.

There’s no official bib(la)tex list; bib(la)tex expects the user to enter titles in correct title case (which some styles then convert to sentence case; never the other way around).

But then what is "correct title case"? In any case, I'm going with the small words from the CSL title caser.

Wow, that thread is active! The progression there seems promising, so I'll just wait for the results of that.

But there's another reason I may want to feed only plain text to the title caser: BBT supports <pre>...</pre> (or <script>) for raw LaTeX, and I don't want the title caser to make any changes in there, but I also don't want to wrap <pre> in nocase, since that would then invoke caps preservation for each pre section. What I did before is remove the pre sections and replace them with markers (\x02...\x03) so the title caser wouldn't see them, and put them back when the title caser is done; I figured if I need to work around that anyhow I might as well just not feed the title caser any markup. Easiest for BBT would be if citeproc-js supported such use of pre/script, but I think it's wholly specific to BBT.
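The marker workaround described above might be sketched like this. It is an illustration under assumptions: `title_case_outside_pre` is a hypothetical name, the title caser is passed in as a plain callable, and putting a numeric index between \x02 and \x03 is a guess at the marker contents, which the comment does not specify.

```python
import re
from typing import Callable, List

_PRE = re.compile(r"<pre>.*?</pre>", re.DOTALL)
_MARKER = re.compile("\x02(\\d+)\x03")


def title_case_outside_pre(title: str,
                           title_caser: Callable[[str], str]) -> str:
    """Hide <pre> sections behind markers, title-case, then restore."""
    stash: List[str] = []

    def hide(match: "re.Match[str]") -> str:
        stash.append(match.group(0))
        # Assumed marker format: \x02 + index into the stash + \x03.
        return "\x02" + str(len(stash) - 1) + "\x03"

    masked = _PRE.sub(hide, title)
    cased = title_caser(masked)
    # A function replacement avoids backslash escapes in raw LaTeX
    # being interpreted by re.sub.
    return _MARKER.sub(lambda m: stash[int(m.group(1))], cased)
```

Using `str.upper` as a stand-in caser, everything outside the <pre> section is changed while the raw LaTeX inside it comes back untouched.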

retorquere commented 8 years ago

BTW the title caser doesn't deal with words in quotes consistently;

'Of' uppercased, 'meaning' not:
input:  The meaning of 'meaning'
output: The Meaning Of 'meaning'

'Example' uppercased even though it is in quotes.
input:  Test of special chars "this for example" and the end
output: Test of Special Chars "this for Example" and the End

I've added both cases to the citeproc-js issue tracker, but it looks like I can't post to the xbiblio thread you linked to.

njbart commented 8 years ago

You have to subscribe at https://lists.sourceforge.net/lists/listinfo/xbiblio-devel.

retorquere commented 8 years ago

Ah, mailing list, not forum. Looks like a lot of these issues were in fact already handled; I've pulled in the latest citeproc and things look near perfect. Tests running again.

retorquere commented 8 years ago

OK, so just 6 or so more title caser problems and this feature should be finished.

retorquere commented 8 years ago

I've released the other recent changes we concocted as part of 1.6.6; I'll release this one when the tests go green, pending changes in the citeproc titlecaser. You seem to be in the loop on this -- can you alert me when you think something has changed? I'm also watching the citeproc-js issues list.

retorquere commented 8 years ago

Activity on the citeproc title caser has been a little low lately, so I've given another one a shot; only these cases do not pass, and if I remove "that" from the shortwords list (the CSL title caser does have it, but it seems to be smart about "that is") I get this. Neither is perfect, but the first seems preferable over the existing title caser.

What do you think?

retorquere commented 8 years ago

Sorry, that should have been this for the version that doesn't have "that" in the smallWords list.

retorquere commented 8 years ago

Adding "their" to the smallwords list leaves a single failing case but one that also fails in the same way with the CSL title caser.

retorquere commented 8 years ago

I see no activity on citeproc-js currently, and the alternative titlecaser passes all my tests, so I've merged to master. Next release will have the feature.