jgm / pandoc-citeproc

Library and executable for using citeproc with pandoc
BSD 3-Clause "New" or "Revised" License

Converting BIB {{title}} generates "[title]{.nocase}" in JSON, but it gets passed to output #302

iandol closed 7 years ago

iandol commented 7 years ago

My reference manager generates BIB files that "protect" the titlecase like so:

title = {{Piercing of Consciousness as a Threshold-Crossing Operation}}, 

If I use pandoc-citeproc -j to convert the BIB to JSON, I get this:

"title": "[Piercing of Consciousness as a Threshold-Crossing Operation]{.nocase}",

When I generate a bibliography from the JSON, the [...]{.nocase} markup is passed through into the final bibliography text. I've always used BIB files with pandoc-citeproc, but that is more than 3x slower, so I have just switched to JSON. I'm not sure where the problem is (should citeproc handle this?), or whether I should just preprocess the BIB file to remove the {{ … }} case protection before generating the JSON.
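
For anyone weighing the preprocessing route asked about above, here is a minimal Python sketch that unwraps whole-field brace protection on title lines before the BIB file is converted. The names `unprotect_title` and `braces_balanced` are made up for illustration and are not part of pandoc-citeproc. The sketch also assumes each title field sits on a single line; a real BibTeX parser would be more robust.

```python
import re

# Matches a single-line title field whose whole value is wrapped in a second
# pair of braces, e.g.   title = {{Some Title}},
# (Assumption: one field per line, as in the example from this report.)
WHOLE_FIELD = re.compile(r"^(\s*title\s*=\s*)\{\{(.*)\}\}(\s*,?\s*)$", re.IGNORECASE)


def braces_balanced(text: str) -> bool:
    """Return True if every '}' closes an earlier '{' and nothing stays open."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def unprotect_title(line: str) -> str:
    """Turn 'title = {{X}},' into 'title = {X},' and leave other lines alone."""
    match = WHOLE_FIELD.match(line)
    if match is None:
        return line
    prefix, inner, suffix = match.groups()
    # '{{Homo} sapiens and {DNA}}' also starts with '{{' and ends with '}}',
    # but there the braces protect single terms, so it must stay untouched.
    if not braces_balanced(inner):
        return line
    return prefix + "{" + inner + "}" + suffix
```

Applied line by line to the exported BIB file, this drops only the outer protection and keeps any braces around individual terms.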

jgm commented 7 years ago

Perhaps we could have pandoc-citeproc strip the "nocase" Span from the final output, where it's not necessary.

njbart commented 7 years ago

As a matter of principle, I think we should follow both the official CSL specs and additional de facto standards, most importantly those introduced by citeproc-js.

This means we should refrain from using

"title": "[Piercing of Consciousness as a Threshold-Crossing Operation]{.nocase}",

when generating CSL JSON and CSL YAML, in favour of

"title": "<span class="nocase">Piercing of Consciousness as a Threshold-Crossing Operation</span>",

– or else, amongst others, Zotero won’t be able to import and citeproc-js won’t be able to parse such data.
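
JSON files that were already generated with the bracketed form could be converted to the span form after the fact. The following is a rough Python sketch, not anything pandoc-citeproc itself provides; the function name `to_html_span` is hypothetical, and the pattern deliberately ignores nested square brackets.

```python
import json
import re

# Pandoc-style bracketed span with the single class "nocase": [text]{.nocase}
# (Assumption: no nested square brackets inside the protected text.)
BRACKETED_NOCASE = re.compile(r"\[([^\[\]]*)\]\{\.nocase\}")


def to_html_span(value):
    """Recursively rewrite [text]{.nocase} as <span class="nocase">text</span>."""
    if isinstance(value, str):
        return BRACKETED_NOCASE.sub(r'<span class="nocase">\1</span>', value)
    if isinstance(value, list):
        return [to_html_span(item) for item in value]
    if isinstance(value, dict):
        return {key: to_html_span(item) for key, item in value.items()}
    return value  # numbers, booleans and null pass through unchanged


# Example with the title from this report.
items = json.loads(
    '[{"id": "example", "title": '
    '"[Piercing of Consciousness as a Threshold-Crossing Operation]{.nocase}"}]'
)
converted = to_html_span(items)
serialized = json.dumps(converted, indent=2)
```

Note that `json.dumps` escapes the inner double quotes as `\"nocase\"` when writing the file, which is what a JSON parser expects for a quote inside a string value.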

That being said, I doubt whether it’s such a good idea to routinely protect whole fields against case transformation – but that’s a problem of the OP’s unnamed “My reference manager”.

jgm commented 7 years ago

+++ Nick Bart [Sep 13 17 09:04 ]:

As a matter of principle, I think we should follow both the official CSL specs and additional de facto standards, most importantly those introduced by citeproc-js.

This means we should refrain from using

"title": "[Piercing of Consciousness as a Threshold-Crossing Operation]{.nocase}",

when generating CSL JSON and CSL YAML, in favour of

"title": "<span class="nocase">Piercing of Consciousness as a Threshold-Crossing Operation</span>",

Agreed. This would only require disabling a pandoc markdown option when generating these things.

jgm commented 7 years ago

I've changed things so that bracketed spans are never used.

Also, we shouldn't be getting <span class="nocase"> tags in the final bibliography output. I couldn't reproduce that part of the report; maybe this has been fixed in later versions? I'll close this, but feel free to submit a new bug report with detailed instructions for reproducing the problem, if there is an issue.

iandol commented 7 years ago

Actually my reference manager (Bookends for macOS) doesn't protect the title by default on export to BibTeX. But I do use a post-processing script as I've not quite managed to ever get CSL styles to do what I want and don't have much time to invest in trying to understand the CSL standard. I have a set of domain specific scientific terms whose case should not be changed, and I do not want to introduce CSL specific hacks like HTML spans into the reference database itself, only the JSON file used for citeproc. Any advice welcome.

Anyway, thanks John for fixing this issue!

njbart commented 7 years ago

Any advice welcome.

I don’t know whether Bookends provides any mechanism for that, but if you can get it to wrap just the “domain specific scientific terms whose case should not be changed” in a pair of curly braces for bib(la)tex output (e.g., A Short History of {Homo sapiens}), or <span class="nocase"></span> for CSL JSON output (e.g., A short history of <span class="nocase">Homo sapiens</span>), that’d be much better in terms of flexibility, i.e., allowing you to correctly format the output for both sentence-case and title-case styles without having to change the biblio data. (Note, however, that in the biblio source, bib(la)tex titles need to appear in title case and CSL JSON titles in sentence case for this to work.)
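
As a rough illustration of the per-term protection suggested above, here is a small Python sketch. The function `protect_terms` and its `style` parameter are invented for this example and are not features of Bookends or pandoc-citeproc. Matching is case-sensitive and whole-word only, and the sketch would wrap a term a second time if it were run twice on the same title.

```python
import re


def protect_terms(title: str, terms: list[str], style: str = "bibtex") -> str:
    """Wrap each listed term so its case survives later case conversion.

    style="bibtex" wraps in curly braces (for bib(la)tex titles);
    style="csl" wraps in <span class="nocase"> (for CSL JSON titles).
    """
    if style == "bibtex":
        opening, closing = "{", "}"
    elif style == "csl":
        opening, closing = '<span class="nocase">', "</span>"
    else:
        raise ValueError(f"unknown style: {style!r}")
    if not terms:
        return title
    # Longest terms first, so "Homo sapiens" is tried before a shorter "Homo".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(lambda m: opening + m.group(0) + closing, title)
```

Calling `protect_terms("A Short History of Homo sapiens", ["Homo sapiens"])` gives the braced form for bib(la)tex, and passing `style="csl"` with a sentence-case title gives the span form for CSL JSON.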

iandol commented 7 years ago

@njbart many thanks for this. I've written a ruby script to protect just keywords in the BibTeX intermediate file.

So in my workflow of Bookends > BibTeX intermediate > JSON > pandoc-citeproc, I should try to ensure my source Bookends titles are sentence case if my final output will always go through pandoc-citeproc? Or does BibTeX > JSON convert from title to sentence case, in which case my source should be title case?

njbart commented 7 years ago

Or does BibTeX > JSON convert from title to sentence case, in which case my source should be title case?

If Bookends does not provide options for case conversion (Zotero, e.g., does not either, but its BBT addon does), and if the workflow you describe is the only one you are planning to use, I’d say yes.

pandoc-citeproc -j does indeed convert bib(la)tex titles from title case to sentence case (see the man page).

If you are planning to use other workflows as well that would require titles in Bookends to be entered in sentence case, I’d be more hesitant. In this case, some biblatex/biber trickery might do the trick, see, e.g., here.

iandol commented 7 years ago

Bookends does allow transformation of case for titles on export. But because some of my terms, like alpha, should be guarded in some titles but not others, I have another layer of complexity. I tend to manually title case in Bookends and set a specific case for some terms (ALPHA vs. Alpha); then, when generating the BibTeX, I tell Bookends to preserve the case as saved in the database.

You've been super helpful, thank you!