jgm / pandoc-citeproc

Library and executable for using citeproc with pandoc
BSD 3-Clause "New" or "Revised" License

citing a bibliographic entry that has two years (e.g., Laplace 1814/1995) #103

Closed: scaramouche1 closed this issue 6 years ago

scaramouche1 commented 9 years ago

When citing an old book it is customary to display two years: the year the book was originally written and the year of the actual re-edition being cited.

For instance, the following bibtex entry

@BOOK{Lapl1814,
  title =        {Philosophical Essay on Probabilities},
  publisher =    {Springer-Verlag},
  year =         {1814/1995},
  author =       {Laplace, P. S.},
  address =      {New York}
}

would be cited as "Laplace (1814/1995)." LaTeX (both Natbib and Biblatex) deals with this convention. Pandoc currently cites this as "Laplace (1995)." Is there a simple way to solve this?

jgm commented 9 years ago

@nickbart, do you know if there's a way to do this with CSL?

njbart commented 9 years ago

Yes, there is a way to render an “original” date along with a “date” with CSL, or rather two. The first I cannot recommend, though it would work once a small pandoc-citeproc bug is fixed (well, sort of, and only for the particular format the OP asked for). The second I can recommend: use the biblatex origdate field. This, however, requires fixing most existing CSL style files, i.e., all those that cannot handle the CSL variable original-date yet.

This bibfile, as in the OP, named issue103.bib:

@BOOK{Lapl1814,
  title =        {Philosophical Essay on Probabilities},
  publisher =    {Springer-Verlag},
  year =         {1814/1995},
  author =       {Laplace, P. S.},
  address =      {New York}
}

… converted to CSL YAML:

$ pandoc-citeproc -y issue103.bib 
---
references:
- publisher-place: New York
  author:
  - family: Laplace
    given: P. S.
  id: Lapl1814
  issued:
    literal: 1814/1995
  title: Philosophical essay on probabilities
  type: book
  publisher: Springer-Verlag
...

… contains literal: 1814/1995, which is ok since issue103.bib contains a year rather than a date field, and everything in a bibtex/biblatex year field that is not a number (i.e., not a string consisting of digits only) should be mapped to CSL issued: literal:. So far, so good.
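The mapping rule described here can be sketched in a few lines of Python. This is only an illustration of the rule, not pandoc-citeproc's actual code (which is written in Haskell); the function name `year_field_to_issued` is hypothetical.

```python
def year_field_to_issued(year_field: str) -> dict:
    """Map a bibtex/biblatex `year` field to a CSL `issued` value.

    Hypothetical helper: a digits-only string becomes a structured date,
    anything else is kept verbatim as a literal.
    """
    text = year_field.strip()
    if text.isascii() and text.isdigit():
        return {"date-parts": [[int(text)]]}
    return {"literal": text}


print(year_field_to_issued("1995"))       # structured year
print(year_field_to_issued("1814/1995"))  # literal, kept as-is
```

The point of the sketch is the second branch: once a value is classified as a literal, nothing downstream should try to reinterpret it.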

Now, pandoc-citeproc seems to remove one part of the literal string (1814/) along the way:

$ echo @Lapl1814 | pandoc -F pandoc-citeproc --bibliography issue103.bib -t markdown-citations
Laplace (1995)

<div class="references">

Laplace, P. S. 1995. *Philosophical Essay on Probabilities*. New York:
Springer-Verlag.

</div>

… though I don’t see why anything in a literal: element shouldn’t be rendered strictly as-is, e.g.:

Laplace, P. S. 1814/1995. *Philosophical Essay on Probabilities*. New York:
Springer-Verlag.

… so this seems to be a pandoc-citeproc bug.

Interestingly, other strings, e.g., literal: 13th century (from the example in http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#dates), are rendered correctly by pandoc-citeproc.

Note that this does not (and should not!) work this way with the biblatex date field. From the biblatex manual:

The date fields date, origdate, eventdate, and urldate require a date specification in yyyy-mm-dd format. Date ranges are given as yyyy-mm-dd/yyyy-mm-dd. Partial dates are valid provided that date components are omitted at the end only. You may specify an open ended date range by giving the range separator and omitting the end date (e.g., yyyy/).

Hence, pandoc-citeproc -y converts biblatex date = {1814/1995}, to CSL YAML

  issued:
    date-parts:
    - - 1814
    - - 1995

… which is absolutely correct.
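As a rough illustration of the conversion from a biblatex date specification to CSL date-parts, here is a minimal Python sketch. The function name `parse_biblatex_date` is hypothetical, negative years are not handled, and representing a missing end date as an empty list is an assumption rather than documented behaviour.

```python
def parse_biblatex_date(spec: str) -> list:
    """Parse 'yyyy[-mm[-dd]]' or a 'start/end' range into CSL date-parts.

    Hypothetical helper. An open-ended range such as '1814/' yields an
    empty list for the missing end date (an assumption of this sketch).
    """
    def one(part: str) -> list:
        part = part.strip()
        if not part:
            return []
        # Components may only be omitted at the end, so a plain split works.
        return [int(piece) for piece in part.split("-")]

    return [one(part) for part in spec.split("/", 1)]


print(parse_biblatex_date("1814/1995"))
print(parse_biblatex_date("1995-03"))
```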

This means one can use the biblatex year field for a few things that are not possible when using the date field which is limited to the constructs listed in the quote above.

The biblatex-chicago manual notices this as well:

[Sometimes] the month and year fields may be more convenient. The latter may be particularly useful in some entries because it can hold more than just numerical data, in contrast to date itself.

Still, while it should be possible to use any string in a bibtex/biblatex year field when this pandoc-citeproc bug is fixed, I cannot recommend this since different style guides call for different formatting of “original date” and “date” combinations:

The Chicago Manual, e.g., wants "(Darwin [1859] 1964)" in-text, and “Darwin, Charles. (1859) 1964. On the Origin of Species. […]” in the reference list.

Hence, the recommended way of doing this is to use the biblatex origdate field:

@BOOK{Lapl1814,
  title =        {Philosophical Essay on Probabilities},
  publisher =    {Springer-Verlag},
  date =         {1995},
  origdate =     {1814},
  author =       {Laplace, P. S.},
  address =      {New York}
}

where pandoc-citeproc correctly maps biblatex origdate to CSL original-date.
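Why keeping the two dates separate matters can be shown with a small Python sketch: once the original date and the edition date are distinct values, each style can combine them in its own way. The function `format_years` and the style names "slash" and "chicago" are hypothetical labels for this example; real CSL styles express this in XML.

```python
def format_years(original: int, issued: int, style: str,
                 context: str = "citation") -> str:
    """Combine an original year and an edition year per style.

    Hypothetical helper; `style` and `context` values are made up here.
    """
    if style == "slash":
        # The format the OP asked for, e.g. 1814/1995.
        return f"{original}/{issued}"
    if style == "chicago":
        # Chicago: brackets in-text, parentheses in the reference list.
        if context == "citation":
            return f"[{original}] {issued}"
        return f"({original}) {issued}"
    raise ValueError(f"unknown style: {style}")


print("Laplace (" + format_years(1814, 1995, "slash") + ")")
print("(Darwin " + format_years(1859, 1964, "chicago") + ")")
```

A single string such as 1814/1995 in a year field cannot be reformatted this way, which is the argument for origdate.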

Unfortunately, most CSL style files still cannot deal with the CSL variable original-date (though it's been an official CSL variable for a long time, and can actually be used by Zotero using the hack described in http://forums.zotero.org/discussion/3673/2/original-date-of-publication/, p. 2).

@scaramouche1: I would recommend that you use biblatex origdate, check whether the CSL style file(s) you would like to use already support the CSL original-date variable, and if they do not, report this at http://forums.zotero.org/11/ as bugs in these style files and ask politely but firmly for this to be fixed.

@jgm: As per above, it would be good if pandoc-citeproc rendered anything in a CSL literal: element strictly as-is. In addition, I’d like to repeat this request: To allow pandoc users to access CSL "original-date" (and other CSL variables for which no Zotero fields exist yet) when using Zotero and zotxt, it would be nice if pandoc-citeproc could extract CSL variables embedded in the CSL note variable, as requested in http://github.com/jgm/pandoc-citeproc/issues/94.

njbart commented 9 years ago

Ok, I think I see what seems to be behind this:

cat > test.json << EOT
[
    {
        "id": "item1",
        "type": "article-magazine",
        "title": "Test",
        "author": [
            {
                "family": "Author",
                "given": "Al"
            }
        ],
        "issued": {
            "literal": "03/10/2001-12/19/2002"
        }
    }
]
EOT

pandoc-citeproc -y test.json

echo @item1 | pandoc -F pandoc-citeproc --biblio test.json -t plain

Output:

Author (2001–2002)

Author, Al. 2001–2002. “Test,” March 10–December 19.

So the parser seems to look for MM/DD/YYYY-MM/DD/YYYY (and MM/DD/YYYY) strings in literal elements.

Now, I would say this should be removed:

Strictly speaking, the literal element should never be modified at all.

Despite that, currently, but only in .json sources, we parse the ad hoc constructs 1999_2001 and 1999_ as date ranges, to work around limitations of Zotero. That's ok.

We might possibly also want to parse, again only from .json, certain ISO dates and date ranges or additional ad hoc constructs (like YYYY_MM_DD_YYYY_MM_DD), again to work around limitations of Zotero (see https://github.com/jgm/pandoc-citeproc/issues/86).

But parsing MM/DD/YYYY-MM/DD/YYYY (and MM/DD/YYYY) is mostly unnecessary (since there are better options for entering these dates), doesn't help with the Zotero issues, and badly mangles strings like 1814/1995 from the OP, as well as 2015-09-18 (which is rendered as 2015–18AD), and possibly many others, so I’m in favour of removing it.
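The narrower behaviour argued for here (interpret only the ad hoc underscore constructs, leave every other literal untouched) might look like the following Python sketch. It is not pandoc-citeproc's code; `interpret_literal` is a hypothetical name, and using an empty list for an open-ended range is an assumption.

```python
import re

# Only the Zotero workarounds 1999_2001 and 1999_ are recognised.
_UNDERSCORE_RANGE = re.compile(r"(\d{4})_(\d{4})?")


def interpret_literal(literal: str) -> dict:
    """Return date-parts for underscore ranges, else the literal unchanged."""
    match = _UNDERSCORE_RANGE.fullmatch(literal.strip())
    if match is None:
        return {"literal": literal}
    start, end = match.groups()
    return {"date-parts": [[int(start)], [int(end)] if end else []]}


print(interpret_literal("1999_2001"))
print(interpret_literal("1814/1995"))   # left alone
print(interpret_literal("2015-09-18"))  # left alone
```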

njbart commented 9 years ago

So, here are my concrete proposals:

jgm commented 6 years ago

@njbart Are these proposals (written two years ago) still valid? or have things changed with zotero/CSL in ways that would lead you to change them?

njbart commented 6 years ago

Ok, let’s see:

What I’m using most these days is a Zotero → zotxt → pandoc(-citeproc) workflow.

Now, Zotero has some long-standing deficiencies, in particular WRT handling dates and providing GUI fields for certain variables (e.g., DOI for books & book sections). This is why citeproc-js introduced the so-called cheater syntax, to be entered in Zotero’s Extra field (which is exported to the CSL note variable).

However, the Zotero addon Better BibTeX (BBT) is now capable of parsing this cheater syntax, and exporting “non-cheater” CSL JSON and CSL YAML.

What’s more, if BBT is installed, zotxt now uses BBT’s CSL JSON exporter, and thus feeds “non-cheater” CSL JSON to pandoc-citeproc. (This behaviour seems undocumented, but is evident from comparing the output of, e.g. curl http://127.0.0.1:23119/zotxt/items?easykey=author:2017title with and without BBT being installed.)

So, from the pragmatic point of view of a zotxt user, I’d say, let BBT deal with Zotero’s deficiencies, and pandoc-citeproc could very well just ignore the “cheater syntax” altogether: Anyone wishing to use Zotero with the “cheater syntax” would have to install BBT.

This picture would change, however, if it should turn out that there are other biblio databases providing CSL JSON with “cheater syntax”; or if you decide you want to emulate citeproc-js as closely as possible, including “cheater syntax”. If you want to do this, see my suggestions at https://github.com/jgm/pandoc-citeproc/issues/192.

Action items, apart from that:

Others, related to ISO8601/EDTF:

jgm commented 6 years ago

I’d definitely get rid of the undocumented non-standard date parser (code at the end of Date.hs) – this was causing the OP’s problems.

I tried this, but several tests from the citeproc test suite that had passed before failed after the change. It seems that these tests assume that 'raw' fields will be parsed in some way. Not sure if pandoc-citeproc's way of parsing this is the right one -- is this documented anywhere?

I've changed things, though, so that 'literal' fields are left alone.

jgm commented 6 years ago

Also, the parsing of 'raw' is currently only triggered when date-parts is absent or empty. However, there are test cases that assume that raw will override date-parts:

% cat citeproc-test/processor-tests/machines/date_RawSeasonRange1.json
{
    "abbreviations": false, 
    "bibentries": false, 
    "bibsection": false, 
    "citation_items": false, 
    "citations": false, 
    "csl": "<style \n      xmlns=\"http://purl.org/net/xbiblio/csl\"\n      class=\"note\"\n      version=\"1.0\">\n  <info>\n    <id />\n    <title />\n    <updated>2009-08-10T04:49:00+09:00</updated>\n  </info>\n  <citation>\n    <layout>\n\t  <date variable=\"issued\" date-parts=\"year-month-day\" form=\"text\"/>\n    </layout>\n  </citation>\n</style>", 
    "input": [
        {
            "id": "ITEM-1", 
            "issued": {
                "date-parts": [
                    [
                        "1965", 
                        "6", 
                        "1"
                    ]
                ], 
                "raw": "Spring 1999 - Summer 2001"
            }, 
            "type": "book"
        }
    ], 
    "mode": "citation", 
    "result": "Spring 1999–Summer 2001"
}

I'm pretty confused about what we're supposed to be doing here, maybe @fbennett can shed some light?

jgm commented 6 years ago

I've just changed this so that raw takes precedence over date-parts when both are present, as suggested by the test suite.
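The precedence rule can be sketched as follows in Python. This only illustrates the ordering described in this comment; `pick_date_source` is a hypothetical name, and the real implementation lives in pandoc-citeproc's Haskell code.

```python
def pick_date_source(date: dict):
    """Choose which member of a CSL JSON date object to interpret.

    Hypothetical helper: a non-empty `raw` wins over `date-parts`;
    `date-parts` is used only when `raw` is absent or empty.
    """
    raw = date.get("raw")
    if raw:
        return ("raw", raw)
    parts = date.get("date-parts")
    if parts and any(parts):
        return ("date-parts", parts)
    return None


issued = {
    "date-parts": [["1965", "6", "1"]],
    "raw": "Spring 1999 - Summer 2001",
}
print(pick_date_source(issued))
```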

fbennett commented 6 years ago

John: Sorry that dates are such a mess. I can't promise it quickly, but I should work up a descriptive manual for a start, to at least get what the processor does on the table.

Meanwhile, I promise not to complicate things further!

njbart commented 6 years ago

@fbennett – How does citeproc-js expect start and end seasons to be represented in CSL JSON? An array inside the season element, or something else? (The year/season range issued: Spring 1999 - Summer 2001 from the test case above, entered in Zotero’s Extra field, and processed by citeproc-js/LO is rendered flawlessly [clearest evidence when language set to French: “printemps 1999–été 2001”] – but the CSL and citeproc-js specs are silent on year/season ranges, and pandoc-citeproc doesn’t seem to handle these [yet].)

Not sure about the circa element either: can this be represented separately for a start and/or an end date?

njbart commented 6 years ago

@fbennett – I found a reference to season ranges here – so it seems citeproc-js uses the pseudo months 13 to 16 to represent seasons.

My question: What would be your recommendation concerning the format for season ranges to be used by CSL JSON exporters and the various citeprocs? (Specifically, I’d like to be able to suggest a suitable format for both BBT’s “Better CSL JSON” export format, and pandoc-citeproc.)

[
  {
    "id": "item1",
    "issued": "2017-12-22",
    "accessed": "2017-24",
    "original-date": "1888-09-18~/1889-10-19~",
    "type": "webpage"
  }
]

@jgm – The OP’s problem seems to be fixed now.

As to season ranges, pandoc-citeproc can parse and format these correctly when a biblatex or CSL YAML source is used, e.g.,

date = {1814-24/1817-23},

or

  issued:
  - year: '1814'
    season: '4'
  - year: '1817'
    season: '3'

but not when CSL JSON

"issued":{"raw":"2013-13/2013-14"}

or

    "issued": {
      "date-parts": [
        [
          2013,
          13
        ],
        [
          2013,
          14
        ]
      ]
    },

are used.

When formatted with pandoc, this results in dates such as “2013–13 2013, 14AD” and “13–14 2013”, respectively. (Similar results with pseudo months 21 and 22.)

It seems pandoc-citeproc does not try to parse a string in raw as an ISO date first (which I would argue it should). Instead, it seems to check for a MM/DD/YYYY-MM/DD/YYYY format (citeproc-js recognises this format too, but seems to give precedence to ISO).

Also, it seems that pandoc-citeproc does not carry out any checks whether a month is in the range 1 to 12. I’d suggest that it should do so – in addition it should accept 13 to 16, and also 21 to 24 as pseudo months representing seasons, and reject everything else. (In the case of seasons, any days that might have been entered by mistake should probably be rejected, too.)

The same goes for days, too BTW. pandoc-citeproc should at least reject any days > 31.
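A minimal Python sketch of the suggested checks follows. The names `classify_month` and `check_date_part` are hypothetical, and raising `ValueError` is just one way to "reject" a value; a processor might prefer to fall back to a literal instead.

```python
def classify_month(value: int) -> tuple:
    """Classify a month slot as a real month or a season pseudo-month."""
    if 1 <= value <= 12:
        return ("month", value)
    if 13 <= value <= 16:   # citeproc-js style pseudo-months
        return ("season", value - 12)
    if 21 <= value <= 24:   # ISO-style season codes
        return ("season", value - 20)
    raise ValueError(f"invalid month or season code: {value}")


def check_date_part(part: list) -> dict:
    """Validate one [year, month, day] entry of a date-parts array."""
    result = {"year": int(part[0])}
    if len(part) > 1:
        kind, number = classify_month(int(part[1]))
        result[kind] = number
    if len(part) > 2:
        if "season" in result:
            raise ValueError("a season cannot carry a day")
        day = int(part[2])
        if not 1 <= day <= 31:
            raise ValueError(f"invalid day: {day}")
        result["day"] = day
    return result


print(check_date_part([2013, 13]))
print(check_date_part([2017, 12, 22]))
```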

njbart commented 6 years ago

In addition, pandoc-citeproc does not seem to accept literal seasons in date-parts, as in:

[
  {
    "id": "item1",
    "issued": {
      "date-parts": [
        [
          2017
        ]
      ],
      "season": "Trinity"
    },
    "type": "webpage"
  }
]

jgm commented 6 years ago

In addition, pandoc-citeproc does not seem to accept literal seasons in date-parts, as in:

Not sure what you mean, since here the season is outside of date-parts, and pandoc does understand this.

% pandoc-citeproc -f json -y
[
  {
    "id": "item1",
    "issued": {
      "date-parts": [
        [
          2017
        ]
      ],
      "season": "Trinity"
    },
    "type": "webpage"
  }
]
^D
---
references:
- id: item1
  type: webpage
  issued:
  - year: '2017'
    season: Trinity
...

jgm commented 6 years ago

With commit e65ab91f476a0c978c7d533da0b81891006a157a we now parse

[
  {
    "id": "item1",
    "issued": "2017-12-22",
    "accessed": "2017-24",
    "original-date": "1888-09-18~/1889-10-19~",
    "type": "webpage"
  }
]

as

---
references:
- id: item1
  type: webpage
  issued:
  - year: '2017'
    month: '12'
    day: '22'
  accessed:
  - year: '2017'
  - year: '24'
  original-date:
  - year: '1888'
    month: '9'
    day: '18'
    circa: '1'
  - year: '1889'
    month: '10'
    day: '19'
    circa: '1'
...

njbart commented 6 years ago

Re: Literal strings in the season element: You’re right, I might have seen season as a date part in some wider sense, but of course it’s not inside the date-parts array itself. And, again, this might have been a case where I still had a previous version of the pandoc-citeproc executable in my path; the latest version does indeed work as expected.

njbart commented 6 years ago

With commit e65ab91 we now parse […]

Actually, I did not expect the "issued": "2017-12-22" format to be implemented straight away.

So far, this was merely something I was putting up for discussion. (As far as I can tell, it is neither endorsed by the citeproc-js specs nor does it appear in its test cases.)

My immediate motivation was to figure out how season ranges could be represented in CSL JSON – and one obvious solution is of course to use ISO8601/EDTF dates.

We could of course put ISO8601/EDTF dates in the raw element, but it would have to be guaranteed that processors parsing the raw element always try ISO8601/EDTF first.
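What "try ISO8601/EDTF first" could mean for a raw parser is sketched below in Python. This covers only a small subset (year, month, day, season codes 21 to 24, the approximate marker `~`, and closed ranges) and is not taken from any existing processor; the names `parse_edtf_date` and `parse_raw_iso_first` are hypothetical. Returning `None` signals that the caller should move on to its other heuristics.

```python
import re

_EDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(~)?")


def parse_edtf_date(text: str):
    """Parse a single date from a small ISO8601/EDTF subset, else None."""
    match = _EDTF_DATE.fullmatch(text.strip())
    if match is None:
        return None
    year, month, day, approx = match.groups()
    date = {"year": int(year)}
    if month is not None:
        code = int(month)
        if 1 <= code <= 12:
            date["month"] = code
        elif 21 <= code <= 24 and day is None:
            date["season"] = code - 20
        else:
            return None
    if day is not None:
        date["day"] = int(day)
    if approx:
        date["circa"] = True
    return date


def parse_raw_iso_first(raw: str):
    """Return a list of one or two dates, or None if `raw` is not ISO/EDTF."""
    pieces = raw.split("/")
    if len(pieces) > 2:
        return None
    dates = [parse_edtf_date(piece) for piece in pieces]
    if any(date is None for date in dates):
        return None
    return dates


print(parse_raw_iso_first("1888-09-18~/1889-10-19~"))
print(parse_raw_iso_first("03/10/2001-12/19/2002"))  # not ISO: None
```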

This, plus a number of other reasons make me think that a dedicated ISO element would provide a much better solution – whether that’s "issued": "2017-12-22" (with the stipulation that only ISO is allowed here), or "issued": { "iso": "2017-12-22" }, or something similar. Of course, this will need to be discussed with @fbennett and others.

(pandoc’s CSL YAML, being less constrained by interoperability issues than CSL JSON, could of course introduce one-line ISO dates as its default format for dates any time, if you happen to be interested …)

What I found out in the meantime, however, is that citeproc-js accepts season ranges when expressed as date-parts / month 13 to 16.

Until the ISO issues are sorted out, I think using pseudo months for season ranges will be the most robust solution – better than an ISO date in a raw element that may or may not be actually parsed as ISO.

So it’d be great if pandoc-citeproc could start parsing the pseudo months 13 to 16 as seasons, and possibly add 21 to 24 (ISO-style) as well.

njbart commented 6 years ago

More on a literal string (“Trinity”) in a CSL JSON season element: This gets converted (see above) but not rendered (while a numerical season, e.g., 2 for summer, is).

cat > test.json << EOT
[
  {
    "id": "item1",
    "issued": {
      "date-parts": [
        [
          2017
        ]
      ],
      "season": "Trinity"
    },
    "type": "webpage"
  },
  {
    "id": "item2",
    "issued": {
      "date-parts": [
        [
          2017
        ]
      ],
      "season": "2"
    },
    "type": "webpage"
  }
]
EOT

pandoc-citeproc -y test.json

echo "@item1; @item2" | pandoc -F pandoc-citeproc --biblio test.json -t plain

Actual output:

2017a. 2017.

2017b. Summer 2017.

Expected:

2017a. Trinity 2017.

2017b. Summer 2017.

jgm commented 6 years ago

I think we already do parse the pseudo-months as seasons! At least, I recall adding that in the last revision.

njbart commented 6 years ago

Well, this is what the most recent dev version gives me:

cat > test.json << EOT
[
  {
    "id": "item1",
    "issued": { "date-parts": [[2013, 13], [2013, 14]] },
    "type": "webpage"
  },
  {
    "id": "item2",
    "issued": { "date-parts": [[2013, 21], [2013, 22]] },
    "type": "webpage"
  }
]
EOT

pandoc-citeproc -y test.json

echo "@item1; @item2" | pandoc -F pandoc-citeproc --biblio test.json -t plain

Actual output:

---
references:
- id: item1
  type: webpage
  issued:
  - year: '2013'
    month: '13'
  - year: '2013'
    month: '14'

- id: item2
  type: webpage
  issued:
  - year: '2013'
    month: '21'
  - year: '2013'
    month: '22'
...
(2013a); (2013b)

2013a. 13–14 2013.

2013b. 21–22 2013.

Expected:

---
references:
- id: item1
  type: webpage
  issued:
  - year: '2013'
    season: '1'
  - year: '2013'
    season: '2'

- id: item2
  type: webpage
  issued:
  - year: '2013'
    season: '1'
  - year: '2013'
    season: '2'
...
(2013a); (2013b)

2013a. Spring–Summer 2013.

2013b. Spring–Summer 2013.

jgm commented 6 years ago

OK, I think I've got the pseudo-months working properly now.

njbart commented 6 years ago

Great. CSL JSON to formatted output seems to work nicely. Just a few more wrinkles:

pandoc CSL YAML season elements are not mapped to CSL JSON pseudo-months when converting between the formats:

cat > test.yaml << EOT
---
references:
- id: :2011
  type: webpage
  issued:
  - year: '2011'
    season: '1'
  - year: '2012'
    season: '2'
...
EOT

pandoc-citeproc -j test.yaml

Actual output:

[
  {
    "id": ":2011",
    "issued": {
      "date-parts": [
        [
          2011
        ],
        [
          2012
        ]
      ],
      "season": "2"
    },
    "type": "webpage"
  }
]

Expected:

[
  {
    "id": ":2011",
    "issued": {
      "date-parts": [
        [
          2011,
          13
        ],
        [
          2012,
          14
        ]
      ]
    },
    "type": "webpage"
  }
]
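The expected conversion can be sketched in Python as below. This is an illustration of the mapping (season 1 to 4 becomes pseudo-month 13 to 16), not pandoc-citeproc's code; both function names are hypothetical, and only numeric seasons are handled (a literal season such as "Trinity" would raise `ValueError`).

```python
def season_to_pseudo_month(season: int, iso_style: bool = False) -> int:
    """Map season 1..4 to pseudo-month 13..16 (or 21..24 if iso_style)."""
    if not 1 <= season <= 4:
        raise ValueError(f"invalid season: {season}")
    return season + (20 if iso_style else 12)


def yaml_dates_to_date_parts(dates: list) -> list:
    """Convert pandoc CSL YAML date entries to a CSL JSON date-parts array."""
    parts = []
    for date in dates:
        part = [int(date["year"])]
        if "season" in date:
            part.append(season_to_pseudo_month(int(date["season"])))
        elif "month" in date:
            part.append(int(date["month"]))
            if "day" in date:
                part.append(int(date["day"]))
        parts.append(part)
    return parts


issued = [{"year": "2011", "season": "1"}, {"year": "2012", "season": "2"}]
print(yaml_dates_to_date_parts(issued))
```

Keeping the season inside each date-parts entry, rather than in one shared season element, is what allows the start and end of a range to carry different seasons.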

Also, pseudo-months aren’t preserved when converting CSL JSON to CSL JSON:

cat > test.json << EOT
[
  {
    "id": ":2011",
    "issued": {
      "date-parts": [
        [
          2011,
          13
        ],
        [
          2012,
          14
        ]
      ]
    },
    "type": "webpage"
  }
]
EOT

pandoc-citeproc -j test.json

Actual output:

[
  {
    "id": ":2011",
    "issued": {
      "date-parts": [
        [
          2011
        ],
        [
          2012
        ]
      ],
      "season": "2"
    },
    "type": "webpage"
  }
]

Expected:

[
  {
    "id": ":2011",
    "issued": {
      "date-parts": [
        [
          2011,
          13
        ],
        [
          2012,
          14
        ]
      ]
    },
    "type": "webpage"
  }
]

palinurus commented 6 years ago

Hi, I'm having a problem that may be related to this discussion. Apologies if not, and thanks for any help you can offer. A few items in my bibliography (which I maintain in bibtex form, if this matters) have publication year ranges (1993--94, e.g.). When I first started using pandoc a few years ago, I found that these were displayed correctly upon PDF conversion if I included full four-digit years: year = {1993--1994}.

In any case, I recently upgraded to the latest versions of pandoc (and xelatex). The one resulting problem I haven't been able to fix is that these year-range items no longer process correctly. I've tried double brackets (the equivalent of the "literal: " specification mentioned above, I think?), single instead of double dashes, the ad hoc underscores mentioned above, using a date field instead of a year field, etc., but nothing seems to allow me to include the year range I need. Is there a workaround for this, or what am I overlooking? Thanks!

jgm commented 6 years ago

@palinurus: What you want is

date = {1993/1994}

palinurus commented 6 years ago

Great. Thanks so much!
