retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License
5.34k stars, 287 forks

italics in title - capitalization #541

Closed kallisons closed 8 years ago

kallisons commented 8 years ago

The title in Zotero standalone application: Effects of open- and closed-system temperature changes on blood O2-binding characteristics of Atlantic bluefin tuna (Thunnus thynnus)

I exported the collection from Zotero using Better Bibtex.

The final result in the bibliography: Brill, R. W. and Bushnell, P. G. (2006). Effects of open- and closed-system temperature changes on blood O2-binding characteristics of Atlantic bluefin tuna (thunnus thynnus). Fish Physiology and Biochemistry, 32(4):283–294.


The "O" in O2 is capitalized. The "A" in Atlantic is capitalized. The "T" in Thunnus is NOT capitalized.


I think that a problem develops when the ... is converted to \emph{}.
I looked at the entry in BibDesk: Effects of open- and closed-system temperature changes on blood {{O}}{\textsubscript{2}}-binding characteristics of {{Atlantic}} bluefin tuna ({\emph{Thunnus thynnus}})

I added a second set of {} brackets in BibDesk: {{\emph{Thunnus thynnus}}} instead of {\emph{Thunnus thynnus}}

and the bibliography printed with the correct capitalization...

Therefore in the Better Bibtex conversion, 2 sets of brackets are needed around the \emph{} to get the correct format in the .bbl file and ultimately the bibliography...

I hope this can be fixed!

Thanks, Allison

njbart commented 8 years ago

There are subtle differences between bibtex and biblatex (see https://github.com/retorquere/zotero-better-bibtex/issues/383, search for “This is just one of the small differences between bibtex the program and the btparse library used by biber”).

Could you provide details on whether you’re using bibtex or biblatex, or, better, an MWE?

And do you see the unexpected behaviour when exporting with Better Biblatex?

kallisons commented 8 years ago

What I am doing: Zotero-->Export Collection with Better BibTeX-->compile with BibTeX in TeXShop using the natbib package (\bibliographystyle{apalike})


I tried exporting with Better BibLaTeX, but that created many other types of errors when I compiled with BibTeX (+natbib) in TeXShop. I am going to submit this paper to a journal and I need to use natbib.


In the export collection with Better BibTeX, everything works great except for the capitalization for words in the title that are in between .... I've been manually fixing the problem but the process is a bit repetitive.


I would include an MWE except I am not really sure what to attach because there are a bunch of GUIs and a plugin. If you can give me a few more details about what you want in an MWE, I can try...

retorquere commented 8 years ago

(thanks @nickbart1980 for jumping in; this aspect you know better than me, and I'm away with my family for a few days)

I think he means it'd be helpful if you could get us a compilable example that exhibits the issue - feel free to edit https://www.sharelatex.com/project/54feca38f58d781e0c982eeb .

It'd also be very helpful to have a copy of the reference. You can right-click the reference and submit an error report, but then @nickbart1980 won't be able to access it. The other way is to export the reference as "BetterBibTeX JSON" and attach the resulting file as a text file here.

njbart commented 8 years ago

I get the same result, i.e., {{Atlantic}} bluefin tuna ({\emph{Thunnus thynnus}}), from both Better Bibtex and Better Biblatex export, and a lowercase “thunnus” in the pdf in both cases. Based on the discussion at https://github.com/retorquere/zotero-better-bibtex/issues/383, I would have expected {{Atlantic}} bluefin tuna ({{\emph{Thunnus thynnus}}}) – at least for Biblatex, but right now it would seem Bibtex (which was never that much in the focus of my attention) needs that, too.

retorquere commented 8 years ago

Can either of you right-click that reference and submit an error report? That will get me the reference as a test case.

retorquere commented 8 years ago

Wow, that was trickier than I thought. I ended up rewriting the markup parser -- BBT is better off for it, but quite an effort.

I currently get what I think is the right output if I generate

{Hello and \emph{{{Thunnus}} thynnus} instead of \emph{{{Thunnus}} thynnus}}

Is that OK? It's easy to generate with the new parser, and the proposed

{Hello and {{\emph{Thunnus thynnus}}} instead of {{\emph{Thunnus thynnus}}}}

would be a) a lot harder, and b) not necessarily correct if I remember our discussions -- shouldn't the thynnus part be available to the biblatex sentence caser?

BTW, for English titles, this would become

{Hello and \emph{{{Thunnus}} Thynnus} instead of \emph{{{Thunnus}} Thynnus}}

as the case preserver wouldn't protect the lowercase word. This is the expected behavior, correct?

njbart commented 8 years ago

… shouldn't the thynnus part be available to the biblatex sentence caser?

Absolutely right.

What’s more, I just realised that the original example, though it can be expected to work with a sentence-case style (APA, …), will fail with a title-case style (Chicago, …), since it does not follow best current practice:

Species names (also units such as “nm”, “kg”; in fact any stuff that should never undergo case changes, no matter what the style wants in general) need to be protected against case changes by enclosing them in <span class="nocase"></span> – e.g., <span class="nocase"><i>Thunnus thynnus</i></span>, or <i><span class="nocase">Thunnus thynnus</span></i>.

<i>Thunnus <span class="nocase">thynnus</span></i>, though possibly even uglier, should of course work, too.

And when protected, the thynnus part should of course not be available to the biblatex sentence caser.

retorquere commented 8 years ago

@nickbart1980 to protect something like thynnus, it would have to be added to the titleCaseLowerCase. A full list of words would not be doable, but at least the user can add words themselves.

Help me remember: titlecasing was only for English, but was case preservation (the {{...}} stuff) for all languages, or only English? All languages, right?

njbart commented 8 years ago

… to protect something like thynnus, it would have to be added to the titleCaseLowerCase. A full list of words would be undoable, but at least the user can add words themselves.

That’s not what I meant. Users themselves need to protect unusual strings such as species names by wrapping them in <span class="nocase"></span>, so the Zotero title field would have to contain, e.g.,

Effects of open- and closed-system temperature changes on blood O<sub>2</sub>-binding characteristics of Atlantic bluefin tuna (<span class="nocase"><i>Thunnus thynnus</i></span>)

Even with a plain Zotero/Word-or-LO workflow this is needed, or else styles such as Chicago would render Thunnus Thynnus. BBT should simply propagate this protection.

… case preservation (the {{...}} stuff) …

Languages other than English don’t have case conversion, and so don’t need case preservation either.

retorquere commented 8 years ago

To recap, and I know I'm being repetitive here, but I need to get this right:

There are two ways in which I fiddle with the text:

  1. Title casing, which changes How to derive Ought from Is into How to Derive Ought from Is, and
  2. Case preservation, which changes How to derive Ought from Is into How to derive {{Ought}} from {{Is}}

or a combination of the two. So the correct behavior is:

  1. If the reference is not English, do neither
  2. If the reference is English, for a specific set of fields always apply both

or is it

  1. If the reference is not English, do neither
  2. If the reference is English, for a specific set of fields (like title, shorttitle) always apply both, and for another specific set of fields (like journaltitle, type), apply only case preservation

I'm asking because the current implementation is closer to the latter than the former.
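As a rough illustration of the second transformation (case preservation), here is a minimal Python sketch. It is not BBT's implementation; the function name `preserve_case` and the whitespace-only word splitting are assumptions made for the example. It skips the first word, does not treat words after `:` or `.` as sentence starts, and does not merge adjacent protected words into a single `{{...}}` group.

```python
def preserve_case(title: str) -> str:
    """Wrap every non-initial word that contains an uppercase letter in {{...}}.

    Hypothetical helper for illustration only; punctuation attached to a
    word is wrapped together with it.
    """
    words = title.split(" ")
    protected = []
    for index, word in enumerate(words):
        if index > 0 and any(ch.isupper() for ch in word):
            protected.append("{{" + word + "}}")
        else:
            protected.append(word)
    return " ".join(protected)


print(preserve_case("How to derive Ought from Is"))
# prints: How to derive {{Ought}} from {{Is}}
```

Title casing is a separate pass and is not sketched here.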

retorquere commented 8 years ago

Correction: the current behavior is more like:

  1. If the reference is English, for a specific set of fields (like title, shorttitle) always apply both, and for another specific set of fields (like journaltitle, type), apply only case preservation
  2. If the reference is another language, only apply case preservation (to title, shorttitle, journaltitle, type...)

retorquere commented 8 years ago

Should words that start after a : or . be treated as if they appeared at the start of a sentence?

retorquere commented 8 years ago

435 indicates at least for type I should always wrap words with uppercase, regardless of whether they appear at a sentence-start. Are there more fields like this? Or should I only not wrap leading words if they're strictly <Upper><lower>*?

retorquere commented 8 years ago

OK, I think I have most issues ironed out now. There's a few things the bibtex generator does differently than previously; they're all correctable, but I want to know whether this is acceptable output. First line is input language + input string, second line is bibtex, casePreserved and titleCased as per earlier rule (but feel free to correct):

<none>  : <<The largest U.S. companies would owe $620 billion in U.S. taxes on the cash they store in tax havens, the equivalent of our defense budget. [Tweet]>>
bibtex  : <<The Largest {{U.S}}. Companies Would Owe \$620 Billion in {{U.S}}. Taxes on the Cash They Store in Tax Havens, the Equivalent of Our Defense Budget. [{{Tweet}}]>>

<none>  : <<<i><span class="nocase">Nodo unitatis et caritatis</span></i>: The Structure and Argument of Augustine's <i><span class="nocase">De doctrina Christiana</span></i>>>
bibtex  : <<\emph{{{Nodo unitatis et caritatis}}}: The {{Structure}} and {{Argument}} of {{Augustine}}'s \emph{{{De doctrina Christiana}}}>>

fr      : <<La démocratie. Sa nature, sa valeur>>
bibtex  : <<La d{\'e}mocratie. {{Sa}} nature, sa valeur>>

<none>  : <<Social Capital Predicts Happiness: World-Wide Evidence From Time Series>>
bibtex  : <<Social {{Capital Predicts Happiness}}: {{World-Wide Evidence From Time Series}}>>

<none>  : <<Lieb-Robinson bounds, Arveson spectrum and Haag-Ruelle scattering theory for gapped quantum spin systems>>
bibtex  : <<{{Lieb-Robinson}} Bounds, {{Arveson}} Spectrum and {{Haag-Ruelle}} Scattering Theory for Gapped Quantum Spin Systems>>

<none>  : <<<i>Salmonella</i> in Pork (SALINPORK): Pre-harvest and Harvest Control Options Based on Epidemiologic, Diagnostic and Economic Research: Final Report>>
bibtex  : <<\emph{Salmonella} in {{Pork}} ({{SALINPORK}}): {{Pre-harvest}} and {{Harvest Control Options Based}} on {{Epidemiologic}}, {{Diagnostic}} and {{Economic Research}}: Final {{Report}}>>

<none>  : <<Automated Defect Prevention : Best Practices in Software Management>>
bibtex  : <<Automated {{Defect Prevention}} : Best {{Practices}} in {{Software Management}}>>

en      : <<(Liquid+liquid) equilibrium of {water+phenol+(1-butanol, or 2-butanol, or tert-butanol)} systems>>
bibtex  : <<({{Liquid}}+liquid) Equilibrium of \{water+phenol+(1-Butanol, or 2-Butanol, or Tert-Butanol)\} Systems>>

<none>  : <<The physical: violent volcanology of the 1600 eruption of Huaynaputina, southern Peru>>
bibtex  : <<The Physical: Violent Volcanology of the 1600 Eruption of {{Huaynaputina}}, Southern {{Peru}}>>

<none>  : <<Technical Report : Towards a Formally Verified Proof Assistant>>
bibtex  : <<Technical {{Report}} : Towards a {{Formally Verified Proof Assistant}}>>

<none>  : <<Full-text databse>>
bibtex  : <<{{Full-text}} Databse>>

<none>  : <<High Performance Computing (HiPC), 2011 18th international conference on>>
bibtex  : <<High {{Performance Computing}} ({{HiPC}}), 2011 18th International Conference on>>

<none>  : <<High-speed jet flows over spillway aerators>>
bibtex  : <<{{High-speed}} Jet Flows over Spillway Aerators>>

<none>  : <<Replicate Zotero key algorithm · Issue #439 · retorquere/zotero-better-bibtex>>
bibtex  : <<Replicate {{Zotero}} Key Algorithm $\cdot$ {{Issue}} \#439 $\cdot$ Retorquere/Zotero-Better-Bibtex>>

<none>  : <<11-Oxygenated Steroids. XIII. Synthesis and proof of structure of <span class="nocase">Δ1,4-Pregnadiene-17α,21-diol-3,11,20-trione and Δ1,4-Pregnadiene-11β,17α,21-triol-3,20-dione</span>>>
bibtex  : <<{{11-Oxygenated Steroids}}. {{XIII}}. Synthesis and Proof of Structure of {{$\Delta$1,4-Pregnadiene-17$\alpha$,21-diol-3,11,20-trione and $\Delta$1,4-Pregnadiene-11$\beta$,17$\alpha$,21-triol-3,20-dione}}>>

<none>  : <<Computational Models of Non-cooperative dialogue>>
bibtex  : <<Computational {{Models}} of {{Non-cooperative}} Dialogue>>

<none>  : <<Dr. Strangelove or: how I learned to stop worrying and love the bomb>>
bibtex  : <<Dr. Strangelove or: How {{I}} Learned to Stop Worrying and Love the Bomb>>

<none>  : <<The Multiobjective Traveling Salesman Problem: A Survey and a New Approach>>
bibtex  : <<The {{Multiobjective Traveling Salesman Problem}}: A {{Survey}} and a {{New Approach}}>>

<none>  : <<Classical signature of quantum annealing>>
bibtex  : <<Classical Signature of Quantum Annealing>>

<none>  : <<Defining and detecting quantum speedup>>
bibtex  : <<Defining and Detecting Quantum Speedup>>

de      : <<Sozialpolitik und Sozialstaat: Soziologische Analysen>>
bibtex  : <<Sozialpolitik und {{Sozialstaat}}: Soziologische {{Analysen}}>>

<none>  : <<Effect of immobilization on catalytic characteristics of saturated Pd-N-heterocyclic carbenes in Mizoroki-Heck reactions>>
bibtex  : <<Effect of Immobilization on Catalytic Characteristics of Saturated {{Pd-N-heterocyclic}} Carbenes in {{Mizoroki-Heck}} Reactions>>

<none>  : <<High-speed Digital-to-RF converter>>
bibtex  : <<{{High-speed Digital-to-RF}} Converter>>

<none>  : <<Some remarks on <span class="nocase">’t Hooft’s</span> S-matrix for black holes>>
bibtex  : <<Some Remarks on {{'t Hooft's}} {{S-matrix}} for Black Holes>>

por     : <<In memoriam, na cidade>>
bibtex  : <<In memoriam, na cidade>>

<none>  : <<Norm and Action. A Logical Enquiry>>
bibtex  : <<Norm and {{Action}}. A {{Logical Enquiry}}>>

<none>  : <<The physical volcanology of the 1600 eruption of Huaynaputina, southern Peru>>
bibtex  : <<The Physical Volcanology of the 1600 Eruption of {{Huaynaputina}}, Southern {{Peru}}>>

fr      : <<Critique d'une métanotion fonctionnelle. La notion (trop) fonctionnelle de « notion fonctionnelle »>>
bibtex  : <<Critique d'une m{\'e}tanotion fonctionnelle. {{La}} notion (trop) fonctionnelle de \enquote{notion fonctionnelle}>>

njbart commented 8 years ago

Sorry for the delay, but I’m willing to get to the bottom of this; might take a while though. In any case, your list of examples will be very helpful.

A few things I spotted straight away:

In general, if you want to try things for yourself, the most relevant tests are

  1. Compare BBT’s output with the output of Zotero/citeproc-js in a title-case style (most easily by right-clicking on a Zotero item, choosing “Create Bibliography from Item …”, then choosing one of the Chicago styles / “Bibliography” / “Copy to Clipboard”, and finally pasting into a word processor to see the result) – case should be identical.
  2. Compare the output of (a.) biblatex-apa using BBT’s biblatex output, and (b.) apacite using BBT’s bibtex output with the original title in Zotero – again, case should be identical.

retorquere commented 8 years ago

Thanks for your comments. In order:

  1. OK, so titlecase + preserve for English (or no language specified, assumed English), or no changes + make sure a langid is present (this is the current behavior in the branch I'm working on)
  2. csquotes is only enabled if you set a hidden pref that specifies which character pairs you want to replace -- if the hidden pref is non-empty, the assumption is you will have loaded csquotes. The pref is empty by default.
  3. Gotcha. Done.
  4. Sure, you can change them however you like, and I'll regenerate. The strings are here
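The csquotes behaviour in point 2 could look roughly like the Python sketch below. The function name `apply_csquotes` and the representation of the preference as a flat string of open/close characters are assumptions for illustration; the thread does not name the pref or its format. With an empty string nothing is replaced, which mirrors the default described above.

```python
import re


def apply_csquotes(text: str, pairs: str) -> str:
    """Replace each open/close quote pair in `pairs` with \\enquote{...}.

    `pairs` is a hypothetical stand-in for the hidden preference: characters
    are read two at a time as (opening, closing). Whitespace just inside the
    quotes is dropped, as in the French example earlier in the thread.
    """
    for i in range(0, len(pairs) - 1, 2):
        opening, closing = pairs[i], pairs[i + 1]
        pattern = re.escape(opening) + r"\s*(.*?)\s*" + re.escape(closing)
        text = re.sub(pattern, lambda m: "\\enquote{" + m.group(1) + "}", text)
    return text


print(apply_csquotes("de « notion fonctionnelle »", "«»"))
# prints: de \enquote{notion fonctionnelle}
```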

As to testing myself

  1. yes please
  2. how do I do this for Zotero/citeproc-js
  3. can you set up the MWE for the biblatex case? I'll add the references later.

njbart commented 8 years ago

testing / 2.: see edit above

retorquere commented 8 years ago

BTW, why the simpler {{<everything>}} format for BibTeX? I mean, if BibTeX applies case folding, wouldn't you want to have that possible, and if it doesn't, why is the protection necessary? Would it have to be {{{<everything>}}} for similar reasons we do it for biblatex?

retorquere commented 8 years ago

Current state (these are all tests that are currently failing). Most of these will be trivially correct, and on your say-so, I'll just mark them in the test set. I'll get started on the Zotero/Citeproc verification.

test/fixtures/export/Better BibLaTeX.007.json
fr-FR   : [Test of markupconversion: Italics, bold, superscript, subscript, and small caps: Mitochondrial DNA<sub>2</sub> sequences suggest unexpected phylogenetic position of Corso-Sardinian grass snakes (<i>Natrix cetti</i>) and <b>do not</b> support their <span style="small-caps">species status</span>, with notes on phylogeography and subspecies delineation of grass snakes.]
biblatex: {Test of markupconversion: Italics, bold, superscript, subscript, and small caps: Mitochondrial DNA\textsubscript{2} sequences suggest unexpected phylogenetic position of Corso-Sardinian grass snakes (\emph{Natrix cetti}) and \textbf{do not} support their \textsc{species status}, with notes on phylogeography and subspecies delineation of grass snakes.}

test/fixtures/export/Dollar sign in title not properly escaped #485.json
<none>  : [The largest U.S. companies would owe $620 billion in U.S. taxes on the cash they store in tax havens, the equivalent of our defense budget. [Tweet]]
biblatex: {The Largest {{U.S}}. Companies Would Owe \$620 Billion in {{U.S}}. Taxes on the Cash They Store in Tax Havens, the Equivalent of Our Defense Budget. [{{Tweet}}]}

test/fixtures/export/map csl-json variables #293.json
dan     : [En ny sociologi for et nyt samfund. Introduktion til Aktør-Netværk-Teori]
biblatex: {En ny sociologi for et nyt samfund. Introduktion til Aktør-Netværk-Teori}

test/fixtures/export/map csl-json variables #293.json
fr-FR   : [La démocratie. Sa nature, sa valeur]
biblatex: {La démocratie. Sa nature, sa valeur}

test/fixtures/export/Export Forthcoming as Forthcoming.json
<none>  : [Social Capital Predicts Happiness: World-Wide Evidence From Time Series]
biblatex: {Social {{Capital Predicts Happiness}}: {{World-Wide Evidence From Time Series}}}

test/fixtures/export/biblatex; Language tag xx is exported, xx-XX is not #380.json
fr-FR   : [Le poêle de Descartes]
biblatex: {Le poêle de Descartes}

test/fixtures/export/Normalize date ranges in citekeys #356.json
fr-FR   : [Œuvres de Descartes]
biblatex: {Œuvres de Descartes}

test/fixtures/export/markup small-caps, superscript, italics #301.json
fr-FR   : [Les interventions <i>éclairées</i> devant la Cour européenne des droits de l'homme ou le rôle stratégique des <i>amici curiae</i>]
biblatex: {Les interventions \emph{éclairées} devant la Cour européenne des droits de l'homme ou le rôle stratégique des \emph{amici curiae}}

test/fixtures/export/markup small-caps, superscript, italics #301.json
fr-FR   : [Opinion et conseil dans la doctrine juridique savante (<sc>xii</sc><sup>e</sup>-<sc>xiv</sc><sup>e</sup> siècles)]
biblatex: {Opinion et conseil dans la doctrine juridique savante (\textsc{xii}\textsuperscript{e}-\textsc{xiv}\textsuperscript{e} siècles)}

test/fixtures/export/don't escape entry key fields for #296.json
fr-FR   : [Les actes de l’Administration [1949-1950]]
biblatex: {Les actes de l’Administration [1949-1950]}

test/fixtures/export/bookSection is always converted to @inbook, never @incollection #282.json
fr-FR   : [Problèmes d’organisation de l’Administration [1966-1967]]
biblatex: {Problèmes d’organisation de l’Administration [1966-1967]}

test/fixtures/export/Better BibTeX does not use biblatex fields eprint and eprinttype #170.json
<none>  : [Lieb-Robinson bounds, Arveson spectrum and Haag-Ruelle scattering theory for gapped quantum spin systems]
biblatex: {{{Lieb-Robinson}} Bounds, {{Arveson}} Spectrum and {{Haag-Ruelle}} Scattering Theory for Gapped Quantum Spin Systems}

test/fixtures/export/Capitalisation in techreport titles #160.json
<none>  : [<i>Salmonella</i> in Pork (SALINPORK): Pre-harvest and Harvest Control Options Based on Epidemiologic, Diagnostic and Economic Research: Final Report]
biblatex: {\emph{Salmonella} in {{Pork}} ({{SALINPORK}}): {{Pre-harvest}} and {{Harvest Control Options Based}} on {{Epidemiologic}}, {{Diagnostic}} and {{Economic Research}}: {{Final Report}}}

test/fixtures/export/German Umlaut separated by brackets #146.json
German  : [Planung öffentlicher Elektrizitätsverteilungs-Systeme]
biblatex: {Planung öffentlicher Elektrizitätsverteilungs-Systeme}

test/fixtures/export/Better BibLaTeX.021.json
en      : [(Liquid+liquid) equilibrium of {water+phenol+(1-butanol, or 2-butanol, or tert-butanol)} systems]
biblatex: {({{Liquid}}+liquid) Equilibrium of \{water+phenol+(1-Butanol, or 2-Butanol, or Tert-Butanol)\} Systems}

test/fixtures/export/Better BibLaTeX.016.json
<none>  : [The physical: violent volcanology of the 1600 eruption of Huaynaputina, southern Peru]
biblatex: {The Physical: Violent Volcanology of the 1600 Eruption of {{Huaynaputina}}, Southern {{Peru}}}

test/fixtures/export/autoexport.json
<none>  : [Comparing archival policies for Blue Waters]
biblatex: {Comparing Archival Policies for {{Blue Waters}}}

test/fixtures/export/autoexport.json
<none>  : [Application centric energy-efficiency study of distributed multi-core and hybrid CPU-GPU systems]
biblatex: {Application Centric Energy-Efficiency Study of Distributed Multi-Core and Hybrid {{CPU-GPU}} Systems}

test/fixtures/export/autoexport.json
<none>  : [An overview of CMIP5 and the Experiment Design]
biblatex: {An Overview of {{CMIP5}} and the {{Experiment Design}}}

test/fixtures/export/thesis zotero entries always create  bibtex entries #307.json
nob     : [CTR Det multiple arkæologiske objekt. Et studie af materialitet og arkæologiske tekstiler]
biblatex: {CTR Det multiple arkæologiske objekt. Et studie af materialitet og arkæologiske tekstiler}

test/fixtures/export/Export of creator-type fields from embedded CSL variables #365.json
<none>  : [A film]
biblatex: {A Film}

test/fixtures/export/Export of creator-type fields from embedded CSL variables #365.json
<none>  : [A report]
biblatex: {A Report}

test/fixtures/export/Export of creator-type fields from embedded CSL variables #365.json
<none>  : [Dr. Strangelove or: how I learned to stop worrying and love the bomb]
biblatex: {Dr. {{Strangelove}} or: How {{I}} Learned to Stop Worrying and Love the Bomb}

test/fixtures/export/Export of creator-type fields from embedded CSL variables #365.json
en      : [The one with the Princess Leia fantasy]
biblatex: {The One with the {{Princess Leia}} Fantasy}

test/fixtures/export/arXiv identifiers in BibLaTeX export #460.json
<none>  : [BV Master Action for Heterotic and Type II String Field Theories]
biblatex: {{{BV Master Action}} for {{Heterotic}} and {{Type II String Field Theories}}}

test/fixtures/export/arXiv identifiers in BibLaTeX export #460.json
<none>  : [Classical signature of quantum annealing]
biblatex: {Classical Signature of Quantum Annealing}

test/fixtures/export/arXiv identifiers in BibLaTeX export #460.json
<none>  : [Defining and detecting quantum speedup]
biblatex: {Defining and Detecting Quantum Speedup}

test/fixtures/export/Ignoring upper cases in German titles #456.json
de      : [Sozialpolitik und Bevölkerungsprozeß]
biblatex: {Sozialpolitik und Bevölkerungsprozeß}

test/fixtures/export/Ignoring upper cases in German titles #456.json
de      : [Schwindet die integrative Funktion des Sozialstaates?]
biblatex: {Schwindet die integrative Funktion des Sozialstaates?}

test/fixtures/export/Capitalize all title-fields for language en #383.json
en      : [A carbocyclic carbene as an efficient catalyst ligand for C–C coupling reactions]
biblatex: {A Carbocyclic Carbene as an Efficient Catalyst Ligand for {{C–C}} Coupling Reactions}

test/fixtures/export/Capitalize all title-fields for language en #383.json
en      : [Alkanethiolate gold cluster molecules with core diameters from 1.5 to 5.2 <span class="nocase">nm</span>]
biblatex: {Alkanethiolate Gold Cluster Molecules with Core Diameters from 1.5 to 5.2~{{nm}}}

test/fixtures/export/Capitalize all title-fields for language en #383.json
en      : [A stochastic model of TCP Reno congestion avoidance and control]
biblatex: {A Stochastic Model of {{TCP Reno}} Congestion Avoidance and Control}

test/fixtures/export/Capitalize all title-fields for language en #383.json
en      : [Effect of immobilization on catalytic characteristics of saturated Pd-N-heterocyclic carbenes in Mizoroki-Heck reactions]
biblatex: {Effect of Immobilization on Catalytic Characteristics of Saturated {{Pd-N-heterocyclic}} Carbenes in {{Mizoroki-Heck}} Reactions}

test/fixtures/export/Capitalize all title-fields for language en #383.json
fr-FR   : [Estimateur d'un défaut de fonctionnement d'un modulateur en quadrature et étage de modulation l'utilisant]
biblatex: {Estimateur d'un défaut de fonctionnement d'un modulateur en quadrature et étage de modulation l'utilisant}

test/fixtures/export/Capitalize all title-fields for language en #383.json
en      : [High-speed Digital-to-RF converter]
biblatex: {{{High-speed Digital-to-RF}} Converter}

test/fixtures/export/Capitalize all title-fields for language en #383.json
en      : [Pleistocene <i><span class="nocase">Homo sapiens</span></i> from Middle Awash, Ethiopia]
biblatex: {Pleistocene \emph{{{Homo sapiens}}} from {{Middle Awash}}, {{Ethiopia}}}

test/fixtures/export/Capitalize all title-fields for language en #383.json
<none>  : [Some remarks on <span class="nocase">’t Hooft’s</span> S-matrix for black holes]
biblatex: {Some Remarks on {{’t Hooft’s}} {{S-matrix}} for Black Holes}

test/fixtures/export/Sorting and optional particle handling #411.json
pt      : [Catalogo dos livros, que se haõ de ler para a continuaçaõ do diccionario da lingua Portugueza: mandado publicar pela Academia Real das Sciencias de Lisboa]
biblatex: {Catalogo dos livros, que se haõ de ler para a continuaçaõ do diccionario da lingua Portugueza: mandado publicar pela Academia Real das Sciencias de Lisboa}

test/fixtures/export/Sorting and optional particle handling #411.json
<none>  : [In memoriam, na cidade]
biblatex: {In Memoriam, Na Cidade}

test/fixtures/export/(non-)dropping particle handling #313.json
<none>  : [<span class="nocase">(abbé d' Aubignac) (François Hédelin)</span>]
biblatex: {{{(abbé d' Aubignac) (François Hédelin)}}}

test/fixtures/export/(non-)dropping particle handling #313.json
<none>  : [<span class="nocase">(Aubignac) (François Hédelin, abbé d')</span>]
biblatex: {{{(Aubignac) (François Hédelin, abbé d')}}}

test/fixtures/export/(non-)dropping particle handling #313.json
en      : [Norm and Action. A Logical Enquiry]
biblatex: {Norm and {{Action}}. {{A Logical Enquiry}}}

test/fixtures/export/(non-)dropping particle handling #313.json
en      : [Reading HLA Hart's: <i>The Concept of Law</i>]
biblatex: {Reading {{HLA Hart}}'s: \emph{{{The Concept}} of {{Law}}}}

test/fixtures/export/(non-)dropping particle handling #313.json
<none>  : [Citations, Out of the Box]
biblatex: {Citations, {{Out}} of the {{Box}}}

test/fixtures/export/(non-)dropping particle handling #313.json
<none>  : ["I have a dream" : the quotations of Martin Luther King JR]
biblatex: {"{{I}} Have a Dream" : The Quotations of {{Martin Luther King JR}}}

test/fixtures/export/key migration.json
<none>  : [The physical volcanology of the 1600 eruption of Huaynaputina, southern Peru]
biblatex: {The Physical Volcanology of the 1600 Eruption of {{Huaynaputina}}, Southern {{Peru}}}

njbart commented 8 years ago

bibtex: Of course we want case folding for titles of English items when using bibtex, but we don’t want that for non-English items. Since bibtex, unlike biblatex, does not have anything like a langid field and no mechanism for making case conversion language-dependent, the easiest solution is to protect all titles etc. of non-English items, wholesale, by wrapping the complete title in curly braces, one pair being sufficient.

With very few specific exceptions, one pair of curly braces is sufficient for protecting against case conversion.

The exceptions are strings that start with a latex command:

  1. \emph{Blah} as is inhibits case conversion,
  2. {\emph{Blah}} enables case conversion, and
  3. {{\emph{Blah}}} again inhibits case conversion.

This seems to be true for both bibtex and biblatex.
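The three cases can be reproduced with a small Python model of how a sentence-casing bibliography style lowercases a title. This is a simplified sketch, not BibTeX's actual code. It assumes the usual rule that text at brace depth 0 is lowercased, text inside braces is left alone, and a top-level group that starts with a backslash is an exception whose contents are lowercased again. The function name `sentence_case` is made up, the first character is always kept, and the "capital after a colon" rule is not modelled.

```python
def sentence_case(title: str) -> str:
    """Lowercase `title` the way a sentence-casing style roughly would."""
    out = []
    depth = 0           # current brace nesting level
    in_special = False  # inside a top-level group that starts with a backslash
    i = 0
    while i < len(title):
        ch = title[i]
        if ch == "{":
            depth += 1
            if depth == 1 and title[i + 1:i + 2] == "\\":
                in_special = True
            out.append(ch)
        elif ch == "}":
            if depth == 1:
                in_special = False
            depth = max(depth - 1, 0)
            out.append(ch)
        elif ch == "\\":
            # Copy the control sequence (e.g. \emph or \$) unchanged.
            j = i + 1
            while j < len(title) and title[j].isalpha():
                j += 1
            if j == i + 1:
                j = min(i + 2, len(title))
            out.append(title[i:j])
            i = j
            continue
        elif i > 0 and (depth == 0 or in_special):
            out.append(ch.lower())
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(sentence_case(r"Tuna (\emph{Thunnus thynnus})"))      # case kept
print(sentence_case(r"Tuna ({\emph{Thunnus thynnus}})"))    # lowercased
print(sentence_case(r"Tuna ({{\emph{Thunnus thynnus}}})"))  # case kept
```

Under this model only the middle form loses the capital "T", which matches the lowercase "thunnus" reported at the top of this issue.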

I tend to think that previously we considered 3., but not 2.; – and since 3. requires two pairs, I think you found it easier to use two pairs everywhere.

However, now I guess a better strategy (for English items only, of course) would be to wrap, in a first step, all latex commands in one pair of braces, and in a second step, on top of that, wrap everything that needs to be protected against case conversion (either because in Zotero it is inside <span class="nocase">...</span> or because it contains at least one uppercase character) in a (possibly another) pair of braces.
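A minimal Python sketch of that two-step idea, covering only `<i>` and `<span class="nocase">` and using regular expressions rather than a real markup parser. The function name `to_latex` is an assumption, and the "contains an uppercase character" part of step 2 is left out to keep the example short.

```python
import re


def to_latex(title: str) -> str:
    """Two-pass conversion of Zotero-style markup (illustrative only)."""
    # Step 1: every LaTeX command gets its own pair of braces.
    title = re.sub(r"<i>(.*?)</i>",
                   lambda m: "{\\emph{" + m.group(1) + "}}", title)
    # Step 2: explicitly protected spans get one more pair on top.
    title = re.sub(r'<span class="nocase">(.*?)</span>',
                   lambda m: "{" + m.group(1) + "}", title)
    return title


print(to_latex("This is really, <i>really</i> good"))
# prints: This is really, {\emph{really}} good
print(to_latex('tuna (<span class="nocase"><i>Thunnus thynnus</i></span>)'))
# prints: tuna ({{\emph{Thunnus thynnus}}})
```

The unprotected italic ends up in one pair of braces (case conversion possible), while the protected species name ends up in two (case conversion inhibited).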

retorquere commented 8 years ago

Oh, right, I didn't get that: protect case-conversion-sensitive fields wholesale for Better BibTeX unless the language is English.
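In Python, that rule might be sketched as below. The names `is_english` and `protect_wholesale`, and the exact set of language values treated as English, are assumptions for the example; the thread only establishes that an empty language field is assumed to be English.

```python
def is_english(language: str) -> bool:
    """Treat an empty language field and en / en-XX / English as English."""
    lang = language.strip().lower()
    return lang in ("", "en", "english") or lang.startswith("en-")


def protect_wholesale(title: str, language: str) -> str:
    """BibTeX-only rule: brace the whole title of a non-English item once."""
    if is_english(language):
        return title  # left to title casing and per-word protection
    return "{" + title + "}"


print(protect_wholesale("La démocratie. Sa nature, sa valeur", "fr-FR"))
# prints: {La démocratie. Sa nature, sa valeur}
```

For biblatex no such wrapping is needed, since a langid field can make case conversion language-dependent.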

The second part, that staged solution is non-trivial to implement, since the generator would have to be context aware for chunks of text. If the current solution is adequate, that has my preference for pragmatic reasons.

njbart commented 8 years ago

Ok, both Better BibTeX and Better BibLaTeX export a Zotero title This is really, <i>really</i> good (both with an empty language field and with language en-US) to title = {This is really, {\emph{really}} good}, – so the number of braces looks good to allow style-dependent case conversion, and hence it seems you wouldn’t have to change the current solution – but upon export this title should also have been converted to title case, i.e., title = {This Is Really, {\emph{Really}} Good},.

retorquere commented 8 years ago

Wait, ugh, now I see case # 2. Jeez. I take it the same goes for \textsc{...}, \textbf{...}, \textsuperscript{...}, \textsubscript{...}? So the proper solution is then to export ... to {\emph{...}}, and protected text to {{....}}?

Currently, This is really, <i>really</i> good translates to {This Is Really, \emph{Really} Good}, but in my current understanding this should instead be {This Is Really, {\emph{Really}} Good}.
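The same wrapping applied to the other markup commands could look like this Python sketch. The tag-to-command table is taken from the conversions shown in the test output earlier in the thread; the function name `convert_markup` and the regex-based approach (no nesting of the same tag, no attributes) are simplifying assumptions, not how BBT's parser works.

```python
import re

# Tag names on the left, LaTeX commands on the right.
TAG_TO_COMMAND = {
    "i": "emph",
    "b": "textbf",
    "sup": "textsuperscript",
    "sub": "textsubscript",
    "sc": "textsc",
}


def convert_markup(text: str) -> str:
    """Turn <tag>...</tag> into {\\command{...}}, one extra brace pair each."""
    for tag, command in TAG_TO_COMMAND.items():
        pattern = "<" + tag + ">(.*?)</" + tag + ">"
        text = re.sub(
            pattern,
            lambda m, c=command: "{\\" + c + "{" + m.group(1) + "}}",
            text,
        )
    return text


print(convert_markup("This is really, <i>really</i> good"))
# prints: This is really, {\emph{really}} good
```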

retorquere commented 8 years ago

You can fiddle around online with my HTML(ish) to Bib(La)TeX converter here if you want. The latest changes that use {\emph{...}} have been included.

retorquere commented 8 years ago

The titlecaser is straight from citeproc-js BTW, so it returns whatever it cooks up. I'm not universally happy with its results though:

  1. How to derive "Ought" from "Is" doesn't titlecase derive
  2. Bodies that matter on the discursive limits of "sex" title-cases of
  3. The city of tomorrow and its {planning} doesn't titlecase planning
  4. Everything from this list

njbart commented 8 years ago

I see the same effects for 1–3 when creating formatted bibliographies in a titlecase style from Zotero (by right-clicking on a Zotero item, choosing “Create Bibliography from Item …”, then choosing one of the Chicago styles / “Bibliography” / “Copy to Clipboard”, and finally pasting into a word processor to see the result) – so all of these seem to be citeproc-js, not BBT bugs, most likely having something to do with the quotes in the vicinity …

retorquere commented 8 years ago

I've added a simple (bordering on moronic) titlecaser as an option on the test page. The bug in citeproc is longstanding -- I assume it's a hard problem to take on, and not likely to be fixed anytime soon.

njbart commented 8 years ago

Just for the record – pandoc, using CSL JSON data, handles 1–3 perfectly well …

retorquere commented 8 years ago

As far as I can tell, this is the pandoc titlecaser. I don't know any haskell, but it doesn't look all that different from what my simple titlecaser does.

(edit: link updated)

njbart commented 8 years ago

There are also some issues with hyphenated compounds, and some differences between BBT (using biblatex and citeproc-js title caser export) and citeproc-js itself (in combination with a title-case style).

The Chicago Manual of Style, 16th edition, advises (8.159 Hyphenated compounds in headline-style titles):

  1. Always capitalize the first element.
  2. Capitalize any subsequent elements unless they are articles, prepositions, coordinating conjunctions (and, but, for, or, nor), or such modifiers as flat or sharp following musical key symbols.
  3. If the first element is merely a prefix or combining form that could not stand by itself as a word (anti, pre, etc.), do not capitalize the second element unless it is a proper noun or proper adjective.
  4. Capitalize the second element in a hyphenated spelled-out number (twenty-one or twenty-first, etc.) or hyphenated simple fraction (two-thirds in two-thirds majority). This departure from previous Chicago recommendations recognizes the functional equality of the numbers before and after the hyphen.
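Rules 1 and 2 above can be sketched roughly as follows. This is only an illustration: the stop list is an assumed, incomplete sample (the rule names word classes, not words), the function name is made up, and rules 3 and 4 are not attempted because they need knowledge of prefixes and spelled-out numbers.

```python
# Assumed sample of articles, prepositions and coordinating conjunctions.
LOWERCASE_WORDS = {
    "a", "an", "the", "and", "but", "for", "or", "nor",
    "to", "of", "in", "on", "at", "by",
}


def capitalize_compound(compound: str) -> str:
    """Capitalize a hyphenated compound following rules 1 and 2 only."""
    result = []
    for index, part in enumerate(compound.split("-")):
        if index == 0 or part.lower() not in LOWERCASE_WORDS:
            # Upper-case only the first character so that "RF" stays "RF".
            result.append(part[:1].upper() + part[1:])
        else:
            # Stop-list words after the first element are left as typed.
            result.append(part)
    return "-".join(result)


print(capitalize_compound("digital-to-RF"))  # Digital-to-RF
print(capitalize_compound("high-speed"))     # High-Speed
```

Because rule 3 is ignored, a prefix compound such as "pre-harvest" comes out as "Pre-Harvest" in this sketch.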

citeproc-js itself handles most of these correctly, though for 2.’s flat or sharp, and 3. it seems users need to intervene and insert <span class="nocase"> where required.

Now BBT (using the citeproc-js title caser) looks a little inconsistent:

Energy-Efficiency, Multi-Core look good, but

Pre-harvest, Pd-N-heterocyclic, S-matrix, High-speed don’t (expected, and citeproc-js itself: Pre-Harvest, Pd-N-Heterocyclic, S-Matrix, High-Speed).

Also, best practice would suggest High-speed digital-to-RF converter (lowercase ‘d’) in Zotero.

As to (Liquid+liquid) equilibrium of {water+phenol+(1-butanol, or 2-butanol, or tert-butanol)} systems, I would have expected

({{Liquid}}+Liquid) Equilibrium of \{Water+Phenol+(1-Butanol, or 2-Butanol, or Tert-Butanol)\} Systems.

(To get the correct form, “tert-Butanol”, one would also have to italicise and protect the “tert” in Zotero:

(Liquid+liquid) equilibrium of {water+phenol+(1-butanol, or 2-butanol, or <span class="nocase"><i>tert</i></span>-butanol)} systems, output: ({{Liquid}}+Liquid) Equilibrium of \{Water+Phenol+(1-Butanol, or 2-Butanol, or {{\emph{tert}}}-Butanol)\} Systems.)

retorquere commented 8 years ago

I've changed the parser so a - is a word-breaker (it was a word-char before), and that gets me the output you describe.
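The effect of that change can be illustrated with two word patterns, one that counts `-` as a word character and one that does not. This is a rough sketch with made-up names, not the actual parser, and it upper-cases every word without any stop list.

```python
import re

WORD_WITH_HYPHEN = re.compile(r"[A-Za-z0-9-]+")  # "-" is part of a word
WORD_NO_HYPHEN = re.compile(r"[A-Za-z0-9]+")     # "-" breaks words


def capitalize_words(text: str, word_pattern: re.Pattern) -> str:
    """Upper-case the first character of every match of word_pattern."""
    return word_pattern.sub(
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:], text
    )


print(capitalize_words("Pre-harvest S-matrix", WORD_WITH_HYPHEN))
# Pre-harvest S-matrix
print(capitalize_words("Pre-harvest S-matrix", WORD_NO_HYPHEN))
# Pre-Harvest S-Matrix
```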

In High-speed digital-to-RF converter, the citeproc titlecaser upper-cases Digital, so that's not of my doing.

Same goes for (Liquid+liquid) equilibrium of {water+phenol+(1-butanol, or 2-butanol, or tert-butanol)}, really.

retorquere commented 8 years ago

Holy crap the Chicago Manual of Style isn't cheap.

njbart commented 8 years ago

> In High-speed digital-to-RF converter, the citeproc titlecaser upper-cases Digital, so that's not of my doing.

What I meant is that the input format = Zotero title field content in this case should be High-speed digital-to-RF converter – expected output: High-Speed Digital-to-{{RF}} converter.

> Same goes for (Liquid+liquid) equilibrium of {water+phenol+(1-butanol, or 2-butanol, or tert-butanol)}, really.

I know, citeproc-js bug, this time apparently in the vicinity of “+”.

retorquere commented 8 years ago

> What I meant is that the input format = Zotero title field content in this case should be High-speed digital-to-RF converter – expected output: High-Speed Digital-to-{{RF}} converter.

Right, check and done.

So is 8.157 from the CMoS the "algorithm" I would best follow for title casing? I suspect there's a million exceptions, but I could give it a go.

retorquere commented 8 years ago

(ugh -- 3 - 6 would require rather fancy natural language processing, and those libraries are yooge. Candidates for future reference: https://github.com/josephmisiti/awesome-machine-learning#javascript-nlp)

njbart commented 8 years ago

> So is 8.157 from the CMoS the "algorithm" I would best follow for title casing? I suspect there's a million exceptions, but I could give it a go.

No, really, I did not want to suggest that at all. (The CMoS reference was more intended to point out the “Capitalize any subsequent elements …” rule rather than its exceptions.)

I think the existing title casers (citeproc-js, and BBT, based on citeproc-js) work reasonably well – though of course the known bugs should be fixed.

With the existing title casers, users who need “The E-flat Concerto” or “Anti-intellectual Pursuits” in their formatted titlecase-style bibliographies have to input The E-<span class="nocase">flat</span> concerto and Anti-<span class="nocase">intellectual</span> pursuits in Zotero – a bit ugly, but I don’t think this comes up frequently enough to justify rewriting the title caser.

retorquere commented 8 years ago

Fair enough. It'd be nice if all this could be hidden behind visual editing rather than hand-editing HTML codes, but that's a matter for another day.

So, is the current implementation good to go then?

retorquere commented 8 years ago

Just to verify -- all of emph, textsc, textbf, textsuperscript and textsubscript get encoded as {\<command>{...}} rather than \<command>{...}, correct? Or is emph somehow special?

retorquere commented 8 years ago

And The Largest {{U}}.{{S}}. States has become The Largest {{U.S}}. States, is that OK? Easily changed (I just make an inner-period not a word char), but good to know.

njbart commented 8 years ago

> Just to verify -- all of emph, textsc, textbf, textsuperscript and textsubscript get encoded as {\<command>{...}} rather than \<command>{...}, correct?

Correct. (see https://www.sharelatex.com/project/57b33143430c01877dce1177)

retorquere commented 8 years ago

So what are the rules again for BibTeX for title-like fields? Is it:

  1. If English, apply title case + case preservation
  2. if not, double-brace the whole field

or

  1. If English, apply case preservation
  2. if not, double-brace the whole field

or

  1. If English, do nothing
  2. if not, double-brace the whole field

retorquere commented 8 years ago

(I don't have access to that sharelatex project. Also, same for \enquote yeah?)

njbart commented 8 years ago

For BibTeX it’s:

AFAICT, for both BibLaTeX and BibTeX, if English, titleCasing + casePreservation always coincide. (Do you have an example where the two might differ?)

> access to that sharelatex project

fixed

retorquere commented 8 years ago

Nope, I have no examples, but I just got lost in the discussion, so I needed a recap. The current BibTeX algo is wrong for English, fixing that now.
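For the recap, the first variant listed earlier (English gets title case plus case preservation, anything else gets the whole field double-braced) might look like the sketch below. All names are hypothetical, the English transform is passed in rather than implemented, and treating an empty language field as English is an assumption based on the empty and en-US cases mentioned earlier in the thread.

```python
def is_english(language: str) -> bool:
    # Assumption: an empty language field is treated like English,
    # as are "en" and any "en-*" tag.
    tag = language.strip().lower()
    return tag == "" or tag == "en" or tag.startswith("en-")


def format_title_field(title: str, language: str, english_transform) -> str:
    """Return the braced value for a title-like field."""
    if is_english(language):
        # english_transform stands in for title casing plus case preservation.
        return "{" + english_transform(title) + "}"
    # Non-English: the extra brace pair shields the whole field from case changes.
    return "{{" + title + "}}"


print(format_title_field("Über die Natur", "de", str.upper))
# {{Über die Natur}}
```

Passing the transform as an argument keeps the language decision separate from whichever title caser ends up being used.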

njbart commented 8 years ago

> … same for \enquote …?

Correct. (Again, see https://www.sharelatex.com/project/57b33143430c01877dce1177)

retorquere commented 8 years ago

Aw yeah, most tests pass! A little tinkering, tomorrow should be all-green.

njbart commented 8 years ago

I’d change the example that started this thread from (Zotero title field)

Effects of open- and closed-system temperature changes on blood O<sub>2</sub>-binding characteristics of Atlantic bluefin tuna (<i>Thunnus thynnus</i>)

to

Effects of open- and closed-system temperature changes on blood O<sub>2</sub>-binding characteristics of Atlantic bluefin tuna (<span class="nocase"><i>Thunnus thynnus</i></span>)

No biologist would ever write (or want to have to read) Thunnus Thynnus rather than the correct and, case-wise, unchangeable Thunnus thynnus.

retorquere commented 8 years ago

What about {{U}}.{{S}}. vs {{U.S}}.?

njbart commented 8 years ago

No preference. (I can’t even see any differences in kerning.)

retorquere commented 8 years ago

In that case, {{U}}.{{S}} is easier to generate safely than {{U.S}}

Do single letters even need protection btw?
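The two bracing variants for a token like "U.S." can be compared with a small sketch. It assumes the token has already been selected for protection; both function names are invented for illustration and neither is BBT's code.

```python
import re


def protect_runs(token: str) -> str:
    """Brace each run of capitals separately; a period ends the run."""
    return re.sub(r"[A-Z]+", lambda m: "{{" + m.group(0) + "}}", token)


def protect_whole(token: str) -> str:
    """Brace the token as one group, keeping trailing punctuation outside."""
    core = token.rstrip(".,;:")
    return "{{" + core + "}}" + token[len(core):]


print(protect_runs("U.S."))   # {{U}}.{{S}}.
print(protect_whole("U.S."))  # {{U.S}}.
```

The per-run version needs no decision about which punctuation belongs to the word, which is one way to read "easier to generate safely".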