jgm / pandoc-citeproc

Library and executable for using citeproc with pandoc
BSD 3-Clause "New" or "Revised" License

Preserve case of all titles #269

Closed sjackman closed 7 years ago

sjackman commented 7 years ago

I'm using this CSL file: http://www.zotero.org/styles/genome-research. It results in all paper titles being converted to sentence case, that is, all capital letters except the first are changed to lowercase letters. I'd like to preserve the case as it is in my .bib file. I read the README.md about using {foo} to protect case. Is there an option to preserve the case of all titles? As a workaround I changed each title={…} to title={{…}}, which worked. The workaround is easy enough, so no worries if there is no such option.

sjackman commented 7 years ago

My motivation for this feature request is that I generate my .bib database automatically from a list of DOIs using doi.org. That data source does not include protection of proper nouns using {…}. For example, the proper noun Bruijn is not protected below.

❯❯❯ curl -LH "Accept: text/bibliography; style=bibtex" "http://dx.doi.org/10.1007/978-3-319-05269-4_4"
 @article{Chikhi_2014, title={On the Representation of de Bruijn Graphs}, ISBN={http://id.crossref.org/isbn/978-3-319-05269-4}, ISSN={1611-3349}, url={http://dx.doi.org/10.1007/978-3-319-05269-4_4}, DOI={10.1007/978-3-319-05269-4_4}, journal={Research in Computational Molecular Biology}, publisher={Springer Science + Business Media}, author={Chikhi, Rayan and Limasset, Antoine and Jackman, Shaun and Simpson, Jared T. and Medvedev, Paul}, year={2014}, pages={35–55}}

jgm commented 7 years ago

Actually the "untitlecase" feature in the BibTeX reader is active only when lang is "en". So if you switch your lang to something else (either with LANG or in the csl file locale information) it won't do the case transform. But that's probably not useful advice if you're writing in English.

I note that Text.CSL.Input.Bibtex exports

readBibtexString :: Bool -> Bool -> String -> IO [Reference]
readBibtexString isBibtex caseTransform contents = do

What you'd need is a way to set 'caseTransform' to False. But the only place this function is called, in Text.CSL.Input.Bibutils, the parameter is set to True.

We could think about making this user-configurable, since the hooks for this are there. But so far we've been trying to imitate BibTeX's and BibLaTeX's behavior.

njbart commented 7 years ago

Actually the "untitlecase" feature in the BibTeX reader is active only when lang is "en". So if you switch your lang to something else (either with LANG or in the csl file locale information) it won't do the case transform.

IIRC, pandoc-citeproc's "untitlecase" feature does not pay attention to LANG or csl file locale information - nor should it, since the conversion only makes sense on a per-entry basis.

What pandoc-citeproc does, correctly, is to look for per-entry information in the langid field (no "untitlecase" when langid contains something other than english [or variants; for details, see the biblatex manual]).

Hence adding langid={xx} to problematic entries should work just as well as changing title={…} to title={{…}}.

... so far we've been trying to imitate BibTeX's and BibLaTeX's behavior.

And this is precisely why I would not encourage making this user-configurable.

jgm commented 7 years ago

+++ Nick Bart [Nov 24 16 01:30 ]:

IIRC, pandoc-citeproc's "untitlecase" feature does not pay attention to LANG or csl file locale information - nor should it, since the conversion only makes sense on a per-entry basis.

What pandoc-citeproc does, correctly, is to look for per-entry information in the langid field (no "untitlecase" when langid contains something other than english [or variants; for details, see the biblatex manual]).

Hence adding langid={xx} to problematic entries should work just as well as changing title={…} to title={{…}}.

... so far we've been trying to imitate BibTeX's and BibLaTeX's behavior.

Thanks for the clarification, I had forgotten that!

And this is precisely why I would not encourage making this user-configurable.

Still, I'm not sure there's no point to accommodating things like doi.org's broken bibtex entries. You might think that the answer is to get doi.org to fix their bibtex entries. But I suspect the problem is that they can't. I assume they've got these entries stored in their database as regular titlecase strings, probably just taken from the journal itself. In "On the Representation of de Bruijn Graphs," there is nothing that tells you that the R in "Representation" is just capitalized because of titlecase while the B in "Bruijn" needs to stay capitalized. So they really couldn't return something like

{On the Representation of de {B}ruijn Graphs}

even if they wanted to. Of course, they could return

{{On the Representation of de Bruijn Graphs}}

but that wouldn't be ideal either, since we need to support bib formats that use sentence case. (On the other hand, avoiding untitlecase would have the same drawback.)
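
To make the point concrete, here is a minimal Python sketch of a sentence-case conversion with brace protection. It is a simplification written for illustration, not pandoc-citeproc's or BibTeX's actual implementation: the name `untitlecase` is borrowed from the discussion above, the function works on the field content without the delimiting braces, and it ignores refinements such as keeping a capital after a colon.

```python
def untitlecase(title: str) -> str:
    """Lowercase every letter except the first one and anything inside braces.

    Hypothetical, simplified model of the behaviour discussed in this thread.
    """
    out = []
    depth = 0            # current brace nesting level
    seen_letter = False  # becomes True once the first letter has been passed
    for ch in title:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif ch.isalpha():
            # Only unprotected letters after the first one are lowercased.
            if depth == 0 and seen_letter:
                ch = ch.lower()
            seen_letter = True
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for raw in (
        "On the Representation of de Bruijn Graphs",
        "On the Representation of de {B}ruijn Graphs",
        "{On the Representation of de Bruijn Graphs}",
    ):
        print(untitlecase(raw))
```

With the unprotected title, "Bruijn" is lowercased along with "Representation" and "Graphs"; only the braces in the second and third forms carry the information needed to keep the proper noun capitalized.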

njbart commented 7 years ago

I'd say that, for the time being, curating your own database is the only answer if you want high-quality metadata. So far, publishers, crossref, etc. are just too unreliable. I've seen all kinds of things in titles from crossref: title case, sentence case, missing titles, titles in all caps, etc.

njbart commented 7 years ago

Just to illustrate this a bit further: The following are a small selection from a random sample of crossref entries (obtained via 2x curl http://api.crossref.org/works?sample=100). From this selection, none can be used as-is, for reasons that amount to more than just title case vs. sentence case issues. My estimate is that about one in ten crossref entries is either affected by such problems, or does not contain a title at all, and there's nothing pandoc could possibly do about these. So my view is, why then bother with a band-aid solution just for case conversion?

" Breeding biology of the Flesh-footed Shearwater ( Puffinus carneipes ) on Woody Island, Western Australia "
"PROGRESSIVE REFRACTION CHANGES FOLLOWING TREPHINE OPERATION"
"THE CLASSIFICATION AND NAMING OF TREES"
"CONTROL OF PEAR PSYLLA WITH DIFLUBENZURON AND PYRIPROXYFEN, 2000"
" Steady-state convection and fluctuation-driven particle transport in the H -mode transition "
"Il Regio Museo Archeologico nel Palazzo Reale di Venezia. Di C. Anti. Roma: La Libreria dello Stato, 1930. Pp. 179, with 61 illustrations. L. 12."
"Bright solitons for the (<mml:math altimg=\"si21.gif\" overflow=\"scroll\" xmlns:xocs=\"http://www.elsevier.com/xml/xocs/dtd\" xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://www.elsevier.com/xml/ja/dtd\" xmlns:ja=\"http://www.elsevier.com/xml/ja/dtd\" xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:tb=\"http://www.elsevier.com/xml/common/table/dtd\" xmlns:sb=\"http://www.elsevier.com/xml/common/struct-bib/dtd\" xmlns:ce=\"http://www.elsevier.com/xml/common/dtd\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xmlns:cals=\"http://www.elsevier.com/xml/common/cals/dtd\" xmlns:sa=\"http://www.elsevier.com/xml/common/struct-aff/dtd\"><mml:mrow><mml:mn>2</mml:mn><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>)-dimensional coupled nonlinear Schrödinger equations in a graded-index waveguide"
"<title>Femtosecond subsurface photodisruption in scattering human tissues using long infrared wavelengths</title>"
" MnO 2 –Au Composite Electrodes for Supercapacitors "
"Molecular orbital study of H[sub 2] and CH[sub 4] activation on small metal clusters. I. Pt, Pd, Pt[sub 2], and Pd[sub 2]"
"Interaction of [Rh(CO)2Cl]2 with O2 oxidized Al(100): Effect of Al2O3 preparation on [Rh(CO)2Cl]2 decomposition"
"Spin-(3/2 gravitational trace anomaly"
"Civil War Intervention and the Problem of Iraq1"
"Information_for_Authors"
"The Effect of Angiotensin ii on Human Mononuclear Cell Reactivity: Suppression of Pha-P-Induced Thymidihe Incorporation"
"THE ROLE OF EXTERNAL SODIUM IN SEA URCHIN FERTILIZATION11This work was supported by a grant from the National Science Foundation to Dr. D. Epel."
"THE LINGUISTIC ATLAS OF THE UPPER MIDWEST AS A SOURCE OF SOCIOLINGUISTIC INFORMATION11This is a revised version of a paper orally presented at the regional meeting of the American Dialect Society in St. Louis, Missouri, November 4, 1976."

sjackman commented 7 years ago

The workaround is easy enough for me. Feel free to close this issue as wontfix if you prefer.

sjackman commented 5 years ago

This issue is still relevant to me. In case anyone stumbles on this issue, here's a simple Makefile rule that uses sed for the workaround of changing title={…} to title={{…}}.

# Concatenate the citations with and without DOI.
# Preserve title case.
%.bib: %.doi.bib %.other.bib
	sort $^ \
	| sed -E \
		-e 's/title={([^}]*)},/title={{\1}},/' \
		-e 's~http://dx.doi.org~https://doi.org~' \
		>$@
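
For anyone not using make and sed, the same title substitution can be sketched in Python. This is only an illustration under a few assumptions: the function name `protect_titles` is made up here, the pattern mirrors the first sed expression above, and it replaces every match in the string rather than the first match per line. Like the sed version, it skips titles that already contain braces and it also matches fields whose name ends in "title", such as booktitle.

```python
import re

# Same pattern as the sed expression: a title field with no inner braces,
# followed by a comma.
TITLE_FIELD = re.compile(r"title=\{([^}]*)\},")


def protect_titles(bibtex: str) -> str:
    """Wrap each simple title={...}, value in a second pair of braces."""
    return TITLE_FIELD.sub(r"title={{\1}},", bibtex)


if __name__ == "__main__":
    entry = (
        "@article{Chikhi_2014, "
        "title={On the Representation of de Bruijn Graphs}, "
        "ISSN={1611-3349}}"
    )
    print(protect_titles(entry))
```

Running the function twice is harmless, because a title that is already wrapped in double braces no longer matches the pattern. To apply it to a file, read the file's contents into a string, pass the string to `protect_titles`, and write the result back out.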

ghost commented 5 years ago

Hi sjackman, how do I use your code? I am new to this. I just started using pandoc and need to preserve the title case too. Thanks.