jgm / citeproc

CSL citation processing library in Haskell
BSD 2-Clause "Simplified" License

How could citeproc support CSL-M layout? #120

Open EroyalBoy opened 1 year ago

EroyalBoy commented 1 year ago

When I convert Markdown to Word with Pandoc, I run into a problem: I use a CSL style from (https://github.com/redleafnew/Chinese-STD-GB-T-7714-related-csl), and the references section in the Word output is abnormal. Every entry is cited twice, as in the image.

But when I use this CSL style in Word through the Zotero macro, the output is normal.

jgm commented 1 year ago

There's not enough information here for me to diagnose the issue. If you attach files that suffice to reproduce the issue (together with the command used), I could take a look. However, if the question is "does citeproc support CSL-M specific features?" then the answer is simply no. We only support standard CSL.

EroyalBoy commented 1 year ago

https://github.com/redleafnew/Chinese-STD-GB-T-7714-related-csl

The question comes from https://github.com/redleafnew/Chinese-STD-GB-T-7714-related-csl/issues/85#issuecomment-1286702538. You can reproduce the issue by using a CSL style from (Chinese-STD-GB-T-7714-related-csl) when converting Markdown to Word with Pandoc.

jgm commented 1 year ago

I'm sorry, I can't read Chinese. Can you give the complete pandoc command you're using to generate the output above, using the files in the linked repository?

EroyalBoy commented 1 year ago

I'm sorry, I can't read Chinese. Can you give the complete pandoc command you're using to generate the output above, using the files in the linked repository?

Yes, I use the linked CSL file with a command like this:

--citeproc --bibliography=/Users/chrisfang/文档/zotero/我的文库.bib --csl=/Users/chrisfang/Desktop/模板/CSL/main/009gb-t-7714-2015-numeric-bilingual-no-uppercase-page-out.csl

EroyalBoy commented 1 year ago

009gb-t-7714-2015-numeric-bilingual-no-uppercase-page-out.csl

You can also get this CSL file from the linked repository: https://github.com/redleafnew/Chinese-STD-GB-T-7714-related-csl/blob/main/009gb-t-7714-2015-numeric-bilingual-no-uppercase-page-out.csl

jgm commented 1 year ago

And where can I find 我的文库.bib?

zepinglee commented 1 year ago

Almost all citation styles in China are multilingual ones (mainly Chinese and English) and the cs:layout extension feature from CSL-M is important to implement that. See also https://github.com/Juris-M/citeproc-js/blob/master/fixtures/local/language_BaseLocale.txt. That's why redleafnew/Chinese-STD-GB-T-7714-related-csl is built for styles that cannot go into the official styles repository.

The issue can also be reproduced with the following contents.

main.md:

# Test

A Chinese entry [@ITEM-1].
An English entry [@ITEM-2].

# Bibliography

main.json:

[
    {
        "id": "ITEM-1",
        "type": "book",
        "event-place": "北京",
        "language": "zh",
        "publisher": "中华书局",
        "publisher-place": "北京",
        "title": "国史旧闻",
        "volume": "第 1 卷",
        "author": [
            {
                "family": "陈",
                "given": "登原"
            }
        ],
        "issued": {
            "date-parts": [
                [
                    "2000"
                ]
            ]
        }
    },
    {
        "id": "ITEM-2",
        "type": "book",
        "edition": "4",
        "event-place": "New York",
        "language": "en",
        "publisher": "McGraw-Hill",
        "publisher-place": "New York",
        "title": "Probability, random variables, and random signal principles",
        "author": [
            {
                "family": "Peebles",
                "given": "Peyton Z."
            }
        ],
        "issued": {
            "date-parts": [
                [
                    "2001"
                ]
            ]
        }
    }
]

009gb-t-7714-2015-numeric-bilingual-no-uppercase-page-out.csl is available from https://github.com/redleafnew/Chinese-STD-GB-T-7714-related-csl/blob/main/009gb-t-7714-2015-numeric-bilingual-no-uppercase-page-out.csl.

Command:

pandoc --citeproc --bibliography=main.json --csl 009gb-t-7714-2015-numeric-bilingual-no-uppercase-page-out.csl -o out.html main.md

out.html:

Test

A Chinese entry[1]. An English entry[2].

Bibliography

[1]
陈登原. 国史旧闻: 第 1 卷[M]. 北京: 中华书局, 2000[1]陈登原. 国史旧闻: 第 1 卷[M]. 北京: 中华书局, 2000.
[2]
Peebles P Z. Probability, random variables, and random signal principles[M]. 4 版. New York: McGraw-Hill, 2001[2]Peebles P Z. Probability, random variables, and random signal principles[M]. 4 版. New York: McGraw-Hill, 2001.

Each entry is repeated twice in the output.

denismaier commented 1 year ago

Oh, that's interesting. Pandoc's citeproc renders both layout nodes? Am I reading this correctly?

jgm commented 1 year ago

Thanks for the detailed report. Yes, from Citeproc.Style:

    let layouts = getChildren "layout" node'
    let formatting = mconcat $ map (getFormatting . getAttributes) layouts
    let sorts   = getChildren "sort" node'
    elements <- mapM pElement (concatMap allChildren layouts)

This just presupposes that there is only one layout element (I didn't know there could be more than one---but I guess that's just in CSL-M?). If there are multiple layouts, their children are just concatenated.
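To make the effect concrete, here is a toy Python model of that concatenation. It is not citeproc's actual Haskell code: the macro names are made up for illustration, and the CSL namespace is omitted to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography section with two layouts, CSL-M style.
BIBLIOGRAPHY = """<bibliography>
  <layout locale="en">
    <text macro="entry-layout-en"/>
  </layout>
  <layout>
    <text macro="entry-layout-zh"/>
  </layout>
</bibliography>"""


def rendered_macros(xml_source):
    """Collect the children of every <layout>, as the concatenation does."""
    node = ET.fromstring(xml_source)
    layouts = node.findall("layout")
    # Same idea as `concatMap allChildren layouts`: no layout is selected,
    # so the children of all layouts end up in one flat list.
    return [child.get("macro") for layout in layouts for child in layout]


macros = rendered_macros(BIBLIOGRAPHY)
print(macros)  # ['entry-layout-en', 'entry-layout-zh']
```

Because both macros are rendered for every item, each bibliography entry is emitted once per layout, which matches the doubled output in the report.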

It would take some changes in the basic types to support multiple layouts, but I don't think it would be too complicated.

jgm commented 1 year ago

OK, it's not as easy as I thought. I've pushed some preliminary changes to the types and style parsing to the multilayouts branch.

But the problem I'm now facing is that the code for disambiguating, grouping and collapsing is sensitive to some of the options on layout, such as year suffix delimiter, after collapse delimiter, etc. I'm not entirely sure how to handle this if these attributes can vary from one bibliographic item to another, since grouping, collapsing, and disambiguating must be done holistically, not on an item by item basis. Any advice on that?

I'm also not sure I should even be proceeding in this direction. Is there any point to supporting multiple layout elements without supporting the rest of CSL-M?

Perhaps, instead, I should just modify the code so that only the last layout element is used. This would prevent double citations like the ones shown above.

jgm commented 1 year ago

I've made the more moderate change noted above, which at least avoids the doubled citations.

badumont commented 1 year ago

Why not raise an error when there are multiple layouts? After all, multiple layouts do not conform to the specification targeted by your implementation. The error message would explain to users why this does not work, which sounds better to me than silently falling back on a behaviour they don't expect.

jgm commented 1 year ago

Yes, maybe you're right that an error would be better here.

denismaier commented 1 year ago

What about also checking the version attribute on cs:style? In regular CSL 1.0 mode this should raise an error. Yet perhaps someone might add more features in an additional mode.

jgm commented 1 year ago

Unfortunately the CSL-M examples above also have version="1.0".

denismaier commented 1 year ago

Well, then it's clearly against the specification...

njbart commented 1 year ago

FWIW, “CSL-M styles”, aka “Juris-M styles: extended CSL styles with jurisdiction support”, aka “jm-styles“ from the official repository whose names start with jm- all contain <style ... version="1.1mlz1" ...>, and it seems this version string is required to start citeproc-js in CSL-M mode. See also https://discourse.citationstyles.org/t/csl-1-2-planning/1476/6.

EroyalBoy commented 1 year ago

I've made the more moderate change noted above, which at least avoids the doubled citations.

wow, thanks! this is our need!

jgm commented 1 year ago

I've made the more moderate change noted above, which at least avoids the doubled citations.

wow, thanks! this is our need!

Sorry, in the end I took @badumont 's advice and had it issue an error message instead. Since we don't support CSL-M, it would be confusing if we appeared to support it.
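For style authors who want to know ahead of time whether a style would run into this error, a pre-flight check along the following lines might help. It is a minimal sketch using only Python's standard library; the function names are hypothetical, and this is not how citeproc itself performs the check.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def multi_layout_sections(csl_source):
    """Return (section, count) for each citation/bibliography with >1 layout."""
    root = ET.fromstring(csl_source)
    offenders = []
    for section in root:
        name = local_name(section.tag)
        if name not in ("citation", "bibliography"):
            continue
        count = sum(1 for child in section if local_name(child.tag) == "layout")
        if count > 1:
            offenders.append((name, count))
    return offenders


# Hypothetical style fragment; note that it declares version="1.0"
# even though it uses the CSL-M multiple-layout extension.
STYLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">
  <citation>
    <layout locale="en"><text macro="a"/></layout>
    <layout><text macro="b"/></layout>
  </citation>
  <bibliography>
    <layout><text macro="c"/></layout>
  </bibliography>
</style>"""

offenders = multi_layout_sections(STYLE)
print(offenders)  # [('citation', 2)]
```

Counting layout elements is more reliable here than inspecting the version attribute, since the styles discussed above declare version="1.0" while still containing several layouts.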

CLRafaelR commented 1 year ago

@jgm I would like to ask you some clarification questions:

  1. Are you going to modify citeproc so that it raises an error if it detects multiple <layout>...</layout> elements in a CSL file such as the one shown below?
<citation>
  <layout locale="en es de">
      <text macro="layout-citation-roman"/>
  </layout>
  <layout locale="ru">
      <text macro="layout-citation-cyrillic"/>
  </layout>
  <layout>
      <text macro="layout-citation-ja"/>
  </layout>
</citation>

(The example above is from here)

  2. Is it impossible for citeproc to support multiple layouts by parsing if-statements that are already part of CSL 1.0 and independent of CSL-M? For example, would citeproc conditionally display citations according to a CSL file that contains the following if-statements?
<citation>
  <choose>
    <if locale="en es de">
      <layout locale="en es de">
        <text macro="layout-citation-roman" />
      </layout>
    </if>
    <else-if locale="ru">
      <layout locale="ru">
        <text macro="layout-citation-cyrillic" />
      </layout>
    </else-if>
    <else>
      <layout>
        <text macro="layout-citation-ja" />
      </layout>
    </else>
  </choose>
</citation>

CLRafaelR commented 1 year ago

I believe that support for multiple layouts in citeproc would be a lifesaver for the large number of writers and researchers, like us, who cite documents written in European alphabets alongside documents written in languages that use other scripts, such as Chinese, Japanese, Korean, and many others.

jgm commented 1 year ago

Are you going to add some modifications to citeproc so that citeproc raises an error if it detects multiple <layout>...</layout> elements in such a csl file as shown below?

Yes, this was done in the commit linked above.

But yes, if you can get what you want using CSL 1.0 conditionals, then that might be a good solution.

See my comments above about why I didn't merge support for multiple layouts.

denismaier commented 1 year ago

2. Is it impossible for citeproc to support multiple layouts by parsing if-statements that are already part of CSL 1.0 and independent of CSL-M? For example, would citeproc conditionally display citations according to a CSL file that contains the following if-statements?

This example won't work. cs:layout isn't an allowed child of cs:if. Also, cs:choose can't occur as a direct child of cs:citation. That would be an extension to/modification of the spec, just as CSL-M is. Not that it wouldn't be useful though.

badumont commented 1 year ago

... but you can't test variables like that in CSL. You can only test for the presence of a variable, not its value.

However, since John's citeproc allows custom variables, you can include language variables in the "Extra" field of each item in Zotero such as "lang-ru: yes" or "lang-en: yes" and rewrite your code like this:

(The "yes" values are never used: they are only here to make the language variables exist. I think that this solution should work, but I have not tested it.)

badumont commented 1 year ago

This example won't work. cs:layout isn't an allowed child of cs:if. Also, cs:choose can't occur as a direct child of cs:citation.

So:

CLRafaelR commented 1 year ago

@badumont Thank you for your clarification.

  2. Is it impossible for citeproc to support multiple layouts by parsing if-statements that are already part of CSL 1.0 and independent of CSL-M? For example, would citeproc conditionally display citations according to a CSL file that contains the following if-statements?

This example won't work. cs:layout isn't an allowed child of cs:if. Also, cs:choose can't occur as a direct child of cs:citation. That would be an extension to/modification of the spec, just as CSL-M is. Not that it wouldn't be useful though.

I had not noticed that cs:layout isn't an allowed child of cs:if and that cs:choose can't be a direct child of cs:citation. I might have missed this when reading the documentation, or perhaps these things are undocumented. (In which section are they noted?)

denismaier commented 1 year ago

I might have missed this when reading the documentation, or perhaps these things are undocumented. (In which section are they noted?)

https://github.com/citation-style-language/schema/blob/5b8bbc824e026959417757d4ce4012a26b10e637/schemas/styles/csl.rnc#L344

denismaier commented 1 year ago

So:

<citation>
  <layout>
    <choose>
      <if variable="lang-en lang-es lang-de" match="any">
        <text macro="layout-citation-roman" />
      </if>
      <else-if variable="lang-ru">
        <text macro="layout-citation-cyrillic" />
      </else-if>
      <else>
        <text macro="layout-citation-ja" />
      </else>
    </choose>
  </layout>
</citation>

The limitation with this approach is this: CSL-M will not only adapt the layout according to the locale, but also use locale dependent terms. That won't be covered here, unfortunately.

CLRafaelR commented 1 year ago

@denismaier Thank you for sharing the source document and for explaining the limitation of the approach.

you can include language variables in the "Extra" field of each item in Zotero such as "lang-ru: yes" or "lang-en: yes"

Actually, I do not use Zotero; I use BibTeX (i.e. .bib files). Is it sufficient to add a key like lang-en = {yes} to the bib items in order to use your solution with .bib files, as shown below?

@article{chen2012,
  title    = {基于电无级变速器的内燃机最优控制策略及整车能量管理},
  author   = {陈骁 and 黄声华 and 万山明 and 庞珽},
  journal  = {电工技术学报},
  volume   = {27},
  number   = {2},
  pages    = {133--138},
  year     = {2012},
  lang-zh  = {yes}
}

badumont commented 1 year ago

The limitation with this approach is this: CSL-M will not only adapt the layout according to the locale, but also use locale dependent terms. That won't be covered here, unfortunately.

I see... So the terms would have to be replaced with macros containing the same sort of conditionals. Not ideal. It may be more straightforward to use one of the plugins for Zotero listed here: https://www.zotero.org/support/plugins#word_processor_and_writing_integration

(For instance https://github.com/egh/zotxt)

CLRafaelR commented 1 year ago

I'm trying to modify a csl file so that, by applying @denismaier 's approach, I can display English references and non-English ones (Chinese or Taiwan Chinese zh-TW, here) in different layouts according to their language. However, the lang-** variable (lang-zh in the following example) does not seem to be recognised...

What am I missing in the following code?

<citation>
  ...
  <layout>
    <choose>
      <if variable="lang-zh">
        <!--
          Multibyte comma as the delimiter for non-English references
        -->
        <group delimiter=",">
          <text macro="author-short-zh" />
          <text macro="issued-year-zh" />
          <text macro="citation-locator-zh" />
        </group>
      </if>
      <else>
        <!--
          Normal comma as the delimiter for references in the default language (English)
        -->
        <group delimiter=", ">
          <text macro="author-short" />
          <text macro="issued-year" />
          <text macro="citation-locator" />
        </group>
      </else>
    </choose>
  </layout>
</citation>

Full `.csl` file (say `mod_apa_zh_pulipuli.csl`): the original CSL file, called `apa_zh_pulipuli.csl`, is from https://raw.githubusercontent.com/pulipulichen/blogger/master/project/zotero/apa_zh_pulipuli.csl

test.bib

@book{xie2015,
  title     = {Dynamic Documents with {R} and knitr},
  author    = {Yihui Xie},
  publisher = {Chapman and Hall/CRC},
  address   = {Boca Raton, Florida},
  year      = {2015},
  edition   = {2nd},
  note      = {ISBN 978-1498716963},
  url       = {http://yihui.name/knitr/},
  language  = {English}
}
@article{chen2012,
  title    = {基于电无级变速器的内燃机最优控制策略及整车能量管理},
  author   = {陈骁 and 黄声华 and 万山明 and 庞珽},
  journal  = {电工技术学报},
  volume   = {27},
  number   = {2},
  pages    = {133--138},
  year     = {2012},
  lang-zh  = {yes}
}

test.md

---
bibliography: [test.bib]
csl: mod_apa_zh_pulipuli.csl
---

@xie2015

@chen2012

badumont commented 1 year ago

I'm trying to modify a csl file so that by applying @denismaier 's approach, I can display English references and non-English ones (Chinese or Taiwan Chinese zh-TW, here) in different layouts according to their language. However, the lang-** (lang-zh in the following example) does not seem to be recognised...

I can confirm that it works with a CSL JSON file, but I don't know how to set custom variables in a .bib file.
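Since the custom variables are reported to work with CSL JSON, one possible route is to add them to a CSL JSON bibliography with a small script. The sketch below is untested against pandoc itself; the `lang-xx` naming simply follows the convention proposed earlier in this thread, and the function name is hypothetical.

```python
import json


def add_language_flags(items):
    """Return copies of CSL JSON items with a 'lang-xx' flag variable added.

    The flag is derived from the primary subtag of each item's 'language'
    field, so "zh-TW" yields "lang-zh". Items without a language are
    returned unchanged.
    """
    flagged = []
    for item in items:
        new_item = dict(item)
        language = item.get("language", "")
        if language:
            primary = language.split("-")[0].lower()
            # The value is never read by the style; the variable only
            # has to exist for the <if variable="..."> test to succeed.
            new_item[f"lang-{primary}"] = "yes"
        flagged.append(new_item)
    return flagged


items = json.loads(
    '[{"id": "ITEM-1", "language": "zh"},'
    ' {"id": "ITEM-2", "language": "en"},'
    ' {"id": "ITEM-3"}]'
)
result = add_language_flags(items)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

For .bib users, pandoc can convert a BibTeX file to CSL JSON first (for example `pandoc test.bib -t csljson -o test.json`); whether the language field survives that conversion in a usable form would need to be checked per file.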

CLRafaelR commented 1 year ago

@badumont Would you mind posting a screenshot of the output? Did you get an html file like this?:

image

badumont commented 1 year ago

@badumont Would you mind posting a screenshot of the output? Did you get an html file like this?:

No, sorry, I only created a dummy CSL file that outputs one thing if the variable lang-zh is set and something else if it is not. But as I said, the problem for you is how to set custom variables that are recognized by Pandoc in .bib files, and I can't help you on this point.

odomanov commented 1 year ago

I might be wrong, but it seems that there is a confusion here. The localization of terms (and the whole citations) is one thing, but the multiplicity of layouts is another. Even for a single layout, is it possible to have terms localized according to the language of citations? Could this be implemented with minimal changes?

The CSL-M styles that I checked looked like this:

  <citation>
    <layout suffix="." delimiter="; " locale="ru">
      <text macro="citation"/>
    </layout>
    <layout suffix="." delimiter="; " locale="uk">
      <text macro="citation"/>
    </layout>
    <layout suffix="." delimiter="; " locale="nl">
      <text macro="citation"/>
    </layout>
    <layout suffix="." delimiter="; " locale="fr">
      <text macro="citation"/>
    </layout>
    <layout suffix="." delimiter="; " locale="de">
      <text macro="citation"/>
    </layout>
    <layout suffix="." delimiter="; " locale="es">
      <text macro="citation"/>
    </layout>
    <layout suffix="." delimiter="; ">
      <text macro="citation"/>
    </layout>
  </citation>

So, layouts are basically the same, the only difference is their localization.
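The selection such a style asks for could be sketched roughly as follows. This is a simplified reading of the example above, not the matching rules of the CSL-M specification: it compares only the primary language subtag and treats a layout without a locale attribute as the fallback. The function name is hypothetical.

```python
def pick_layout(layout_locales, item_language):
    """Return the index of the layout to use for an item.

    layout_locales holds the locale attribute of each <layout> in document
    order, with None for a layout that has no locale attribute. Returns
    None if nothing matches and there is no fallback layout.
    """
    primary = (item_language or "").split("-")[0].lower()
    fallback = None
    for index, locales in enumerate(layout_locales):
        if locales is None:
            fallback = index
        elif primary in locales.split():
            # A locale attribute may list several languages, e.g. "en es de".
            return index
    return fallback


# Locale attributes from the style fragment above, in order.
LOCALES = ["ru", "uk", "nl", "fr", "de", "es", None]

print(pick_layout(LOCALES, "de"))     # 4
print(pick_layout(LOCALES, "fr-CA"))  # 3
print(pick_layout(LOCALES, "zh"))     # 6
```

Choosing a layout per item is the easy part; the open question raised earlier is how delimiters and collapsing options that live on the layout should behave when a single citation mixes items that select different layouts.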

jgm commented 1 year ago

I'm not sure who you think is confused or about what -- could you make that clearer? The reason I didn't try to support this is detailed in https://github.com/jgm/citeproc/issues/120#issuecomment-1290864687

odomanov commented 1 year ago

Sorry, I'll try to explain. In general, my question was: could you implement localization of terms without implementing multiple layouts?

I mean that by means of multiple layouts CSL-M realizes two things: (1) different localization of terms for different languages, (2) different layouts (order, positioning etc.) for different languages. But these are different (and independent) tasks. In most cases the latter is not necessary (as the example I provided demonstrates; Biblatex, as far as I understand, behaves similarly). Hence my question.

What I imagine is something like this (in pseudo-code):

[1] (locale="zh") 陈登原. 国史旧闻: 第 1 卷[M]. 北京: 中华书局, 2000. (/locale) [2] (locale="en") Peebles P. Z. Probability, random variables, and random signal principles. 4 ed. New York: McGraw-Hill, 2001. (/locale)

where within the (locale="..")...(/locale) form terms are localized accordingly.

Sorry, I hope this is a bit clearer now.

As for the disambiguation, grouping etc, I'm not sure I understand the problem. Unfortunately, I don't know the details. Does the problem arise with a single layout? I guess it doesn't.

jgm commented 1 year ago

Where would this locale information come from? CSL bibliographies have a language field, indicating the language of the item. But I don't think you'd want to localize your bibliography entry for each item depending on the language of the item. When I cite a German book in an English article, I'll still use English number and quote styles, and phrases like "editor." There may be cases, especially with East Asian languages, when you want to do something else, but it can't just be automatic based on the language field. CSL-M solves this, as I understand it, by letting the style decide how items in specific languages are to be formatted.

odomanov commented 1 year ago

Actually, this is what I'm saying. In some (many?) traditions when I cite a German source I use "Seite" instead of "page", "hrsg. von" instead of "ed. by" etc., like, for example, [Heidegger 1956, S.122]. Even quote styles are localized.

And the language comes from the item's language field, of course.

There might be an option (I can't say where; in a style?) to localize items according to either (1) their language field or (2) the language of the main text (the default language?). In this way users could choose the way of localization. Is this possible at all?
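As a rough illustration of the option I have in mind, here is a Python sketch. Everything in it is hypothetical: the policy names, the tiny term table, and the function are made up for this comment and do not correspond to any existing citeproc or pandoc feature.

```python
# Tiny hypothetical term table, using the examples mentioned above.
TERMS = {
    "en": {"page": "p.", "edited-by": "ed. by"},
    "de": {"page": "S.", "edited-by": "hrsg. von"},
}


def localized_term(term, item_language, document_language, policy="document"):
    """Look up a term according to the chosen localization policy.

    policy="item":     use the item's own language when terms exist for it,
                       otherwise fall back to the document language.
    policy="document": always use the language of the main text.
    """
    if policy == "item" and item_language in TERMS:
        locale = item_language
    else:
        locale = document_language
    return TERMS[locale][term]


# A German source cited in an English text:
print(localized_term("page", "de", "en", policy="item"))      # S.
print(localized_term("page", "de", "en", policy="document"))  # p.
```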

jgm commented 1 year ago

Actually, this is what I'm saying. In some (many?) traditions when I cite a German source I use "Seite" instead of "page", "hrsg. von" instead of "ed. by" etc., like, for example, [Heidegger 1956, S.122]. Even quote styles are localized.

The problem is that this is style-dependent. Some styles would want this, others not. And some might want it only for specific languages or with specific limitations. That's why this requires style-level support (multiple layouts).

But yes: it would be possible to add the sort of feature you describe. I'm just not sure it's a good idea, but it would be good to hear from experts on bibliographic styles.