jgm / citeproc

CSL citation processing library in Haskell
BSD 2-Clause "Simplified" License

Allow inline formatting in locators? #68

Open badumont opened 3 years ago

badumont commented 3 years ago

With citeproc 0.3.0.9, when compiling the following MWE, the smallcaps in the "section" locator are stripped from the output:

---
suppress-bibliography: true
references:
- type: book
  id: CaesarGallic
  author:
  - literal: Julius Caesar
  title: Bellum Gallicum
---

[@CaesarGallic, {section XI, [iv]{.smallcaps}, 3}, p. 59]

Output (with acta-philosophica.csl):

pandoc -t plain --citeproc --csl=acta-philosophica.csl test.md
[1]

[1] JULIUS CAESAR, Bellum Gallicum, secs. XI, iv, 3, p. 59

However, if I set the output format to native, I can see that iv is wrapped in a SmallCaps object in the value of the citationSuffix property. It is only set to a plain string in the content of the Cite object.

Now, if I modify the body of my markdown file like this:

^[@CaesarGallic [section XI, [iv]{.smallcaps}, 3], p. 59.]

The formatting is retained:

pandoc -t plain --citeproc --csl=acta-philosophica.csl test.md
[1]

[1] JULIUS CAESAR, Bellum Gallicum, sec. XI, IV, 3, p. 59.

It can also be seen that the locator label is plural in the first case and singular in the second.

jgm commented 3 years ago

You can see why this happens from the types:

-- | The part of a citation corresponding to a single work,
-- possibly including a label, locator, prefix and suffix.
data CitationItem a =
  CitationItem
  { citationItemId             :: ItemId
  , citationItemLabel          :: Maybe Text
  , citationItemLocator        :: Maybe Text
  , citationItemType           :: CitationItemType
  , citationItemPrefix         :: Maybe a
  , citationItemSuffix         :: Maybe a
  } deriving (Show, Eq, Ord)

Locator is a plain string, whereas prefix and suffix can be formatted. This representation makes it much easier for us to manipulate locators. I don't know, actually, whether the CSL spec says that formatting should be allowed on locators -- I had thought not, but I may be wrong. @denismaier @bdarcus do you know?
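The difference between the two field types can be illustrated with a minimal sketch. It is not the library's code: the `Inline` type, the `stringify` helper, and the cut-down three-field record are all hypothetical stand-ins, and the id is simplified to `Text`. It only shows that content stored in a `Text` locator has nowhere to keep its markup, while the same content stored in a suffix of type `a` keeps it.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Toy inline type (hypothetical; stands in for a formatted output type).
data Inline
  = Str Text
  | SmallCaps [Inline]
  deriving (Show, Eq)

-- Cut-down record with only the fields relevant here (assumption:
-- the id is simplified to Text; the real field is an ItemId).
data CitationItem a = CitationItem
  { citationItemId      :: Text
  , citationItemLocator :: Maybe Text
  , citationItemSuffix  :: Maybe a
  } deriving (Show, Eq)

-- Flatten formatted content to plain text, discarding markup.
stringify :: [Inline] -> Text
stringify = T.concat . map go
  where
    go (Str t)        = t
    go (SmallCaps xs) = stringify xs

formatted :: [Inline]
formatted = [Str "XI, ", SmallCaps [Str "iv"], Str ", 3"]

-- The same content stored once as a locator and once as a suffix.
item :: CitationItem [Inline]
item = CitationItem
  { citationItemId      = "CaesarGallic"
  , citationItemLocator = Just (stringify formatted)
  , citationItemSuffix  = Just formatted
  }

main :: IO ()
main = do
  print (citationItemLocator item)  -- plain text only, small caps lost
  print (citationItemSuffix item)   -- SmallCaps node still present
```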

badumont commented 3 years ago

Thank you for your answer. I can't find anything about this in the specification. In any case, I don't want to argue for one solution or another, only to point out the inconsistency across citation modes.

If you choose not to permit formatting inside locators, could it be noted in the manual?

jgm commented 3 years ago

It's not an inconsistency across citation modes. The difference is that in your first case you've explicitly marked something as a locator using {},

[@CaesarGallic, {section XI, [iv]{.smallcaps}, 3}, p. 59]

(oddly excluding the page?) while in the second

[section XI, [iv]{.smallcaps}, 3], p. 59.]

you haven't done this. Since pandoc's heuristics for locators don't detect this as one, and you don't use the {}, it is treated as a suffix (thus permitting formatting). I suspect that if you use the {} syntax around the whole locator in this case, you'll see the same thing as in the first case.

bdarcus commented 3 years ago

I haven't checked, but am pretty sure we're silent on that question ATM.

jgm commented 3 years ago

I could change it to allow formatted content, but conceptually this seems like something that should have a solution at the style level (some styles will want to format roman-numeral locators with small caps, others with large caps, etc.).

bdarcus commented 3 years ago

... this seems like something that should have a solution at the style level?

Yeah, I can see that. I just don't recall it coming up.

Thoughts on this @bwiernik?

badumont commented 3 years ago

Sorry, I thought that in @baz [chap. 1], the brackets were intended to enclose the locator, like the curly braces in normal citation mode. I understand now.

I excluded the page because CSL only supports one locator, so I had to format it myself.

In this case one can set the whole locator in small caps, so it is not such a big problem. It would be one if some part had to be put in italics (like prooem.). Since CSL handles the locator in a monolithic way, this can't be supported by the style.

badumont commented 3 years ago

It would also be useful to print folios like f. 35v.

jgm commented 3 years ago

In your case the best workaround is probably to manually format the locators. You just need to block pandoc from treating them as locators; I think you could do that using something like

[@CaesarGallic, {}section XI, [iv]{.smallcaps}, 3, p. 59]

(untested)

denismaier commented 3 years ago

I'll have to check in the specs, but Zotero allows formatting for locators.

denismaier commented 3 years ago

Ok, in the spec locator is currently listed under standard variables, and this is what the spec says:

locator a cite-specific pinpointer within the item (e.g. a page number within a book, or a volume in a multi-volume work). Must be accompanied in the input data by a label indicating the locator type (see the Locators term list), which determines which term is rendered by cs:label when the “locator” variable is selected.

In my understanding that means that the locator should not be treated differently than any other variable, the label mechanism aside.

jgm commented 3 years ago

OK, resolved then to change this to allow formatted locators.

jgm commented 3 years ago

Tricky aspects of this: currently we substitute the "and" term for & in locators (Eval.hs, l. 1440). We'd need a way to do this that works with any kind of formatted type. [EDIT: This should be easy using mapText.]

formatPageRange (l. 1394) also does some string manipulation on the locator. [This will be trickier.]

What makes this harder in citeproc is that citeproc is polymorphic on the output format -- it could be any structured type that instantiates a certain class (CiteprocOutput), so ALL we can use are the methods defined for that class. We may need to add new methods to allow these operations.
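A minimal sketch of that constraint, assuming a hypothetical one-method class `Output` in place of the real `CiteprocOutput` (whose actual definition is not reproduced here) and a toy `Inline` type. The helper `substituteAnd` is also invented for illustration. It is written only against the class method, so it works for plain text and for formatted content alike.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleInstances #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for the output class; only mapText is modelled.
class Output a where
  mapText :: (Text -> Text) -> a -> a

-- Plain text: apply the function directly.
instance Output Text where
  mapText f = f

-- Toy formatted type (hypothetical).
data Inline
  = Str Text
  | SmallCaps [Inline]
  deriving (Show, Eq)

-- Formatted content: apply the function to every text leaf,
-- leaving the formatting structure untouched.
instance Output [Inline] where
  mapText f = map go
    where
      go (Str t)        = Str (f t)
      go (SmallCaps xs) = SmallCaps (mapText f xs)

-- Replace "&" with a given "and" term, using only the class method.
substituteAnd :: Output a => Text -> a -> a
substituteAnd andTerm = mapText (T.replace "&" andTerm)

main :: IO ()
main = do
  print (substituteAnd "and" ("12 & 15" :: Text))
  print (substituteAnd "et" [Str "XI & ", SmallCaps [Str "iv"]])
```

A per-leaf rewrite like this only sees one text fragment at a time, so string manipulation that needs the locator as a whole (as page-range formatting presumably does) would not fit this pattern without further class methods.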

[EDIT: Changes to pandoc would also be needed:
parseLocator in Text.Pandoc.Citeproc.Locator would be modified to return [Inline] instead of Text for the locator.]

bwiernik commented 3 years ago

I don't see why we would need any specific support at the style level--I think we should just be able to apply the standard inline text formatting to locator contents, as appears to have been implemented.

bwiernik commented 3 years ago

What I meant is that a style might want to specify, for example, that all locator labels are small caps. That can't be done currently.

Oh I see. This should be addressable with the same new syntax that will be needed to style multiple locators. https://github.com/citation-style-language/schema/issues/342

jgm commented 3 years ago

Actually I think you're right that this can be handled in the regular way.

bwiernik commented 3 years ago

The locator's contents could be handled in the regular way. Formatting of locator labels would need to be done in a style, which would require something like the <locator> structure I linked to.

jgm commented 3 years ago

Ah, okay.

badumont commented 2 years ago

I don't think it is worth opening a new issue, so I'll add it here: the same problem arises with name variables, especially when citing works written by kings, emperors, popes or bishops in languages where the ordinal suffix should be in superscript (such as "Justinien Ier").

Again, the specifications are silent about it, but Zotero does parse HTML-like markup in name fields.

bwiernik commented 2 years ago

From my perspective, markup should be allowed in any variable.

jgm commented 2 years ago

Fortunately you can still do "Iᵉʳ". The locale file uses unicode superscripted characters:

    <term name="ordinal-01" gender-form="feminine" match="whole-number">ʳᵉ</term>
    <term name="ordinal-01" gender-form="masculine" match="whole-number">ᵉʳ</term>

jgm commented 2 years ago

Btw, the issue for names is #63.