jgm / pandoc-citeproc

Library and executable for using citeproc with pandoc
BSD 3-Clause "New" or "Revised" License

Parse more CSL vars embedded in CSL JSON note field #192

njbart closed this issue 6 years ago

njbart commented 8 years ago

It turns out that citeproc-js parses (some) CSL date variables embedded in CSL JSON note fields into their components.

citeproc-js also parses CSL creator variables embedded in CSL JSON note fields.

It would be great if pandoc-citeproc could support this, too.

njbart commented 8 years ago

Just pinging: since Zotero has made no progress at all on date ranges for many years, it would be great if pandoc-citeproc could parse ISO date ranges of the following forms out of the “note” variable:

{:issued:2015-07-27/2016-03-30}, or {:issued:2015-07/2016-03}, or {:issued:2015/2016}

The other date variables (accessed, container, event-date, original-date, submitted) would of course best be supported, too.

For consistency, pandoc-citeproc should probably parse any date here, not just date ranges. For a simple calendar date, the Zotero Date field is of course sufficient, but for circa dates, seasons, etc. in EDTF format it would again be helpful if pandoc-citeproc could parse these.

pandoc-citeproc should continue to overwrite existing variables. For date ranges in particular, this would allow keeping one date (range) in Zotero’s date field for display and sorting purposes, and the same date range in the note variable, where pandoc-citeproc can parse it correctly. (A range such as 2015-07-27/2016-03-30 can be entered in a Zotero Date field, but only the start date is ever parsed and exported.)

Any content in one of the date variables that cannot be identified as an ISO or (better) EDTF date or date range should of course be treated as a “literal” date.
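To make the request above more concrete, here is a rough Python sketch of the behaviour I have in mind. It is not pandoc-citeproc code: `parse_date_part` and `parse_csl_date` are hypothetical names, the result uses the CSL JSON `date-parts` / `literal` shape, and only plain ISO dates and ranges are recognised. EDTF extras (circa dates, seasons) simply fall through to a literal here.

```python
import re

# Hypothetical helpers for illustration only; not part of pandoc-citeproc.
# Matches YYYY, YYYY-MM or YYYY-MM-DD.
_ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_date_part(text):
    """Return [year], [year, month] or [year, month, day], or None."""
    m = _ISO_DATE.fullmatch(text.strip())
    if m is None:
        return None
    parts = [int(g) for g in m.groups() if g is not None]
    # Cheap range checks; a real implementation would validate the calendar.
    if len(parts) >= 2 and not 1 <= parts[1] <= 12:
        return None
    if len(parts) == 3 and not 1 <= parts[2] <= 31:
        return None
    return parts


def parse_csl_date(value):
    """Parse an ISO date or start/end range; fall back to a literal date."""
    pieces = value.strip().split("/")
    if len(pieces) <= 2:
        parsed = [parse_date_part(p) for p in pieces]
        if all(p is not None for p in parsed):
            return {"date-parts": parsed}
    # Anything that is not recognisably ISO is kept verbatim.
    return {"literal": value.strip()}


print(parse_csl_date("2015-07-27/2016-03-30"))
print(parse_csl_date("2015/2016"))
print(parse_csl_date("circa 2015"))
```

The three range forms listed above all go through the same code path; the only difference is how many components each side has.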

njbart commented 8 years ago

To clarify further, here’s an example showing the actual and the expected output from pandoc-citeproc -y csl-json-cheater-syntax-1.json.

The CSL JSON format for dates – including the so-called “cheater syntax” that embeds variables inside the note variable – has recently been updated and is documented in https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html#cheater-syntax-for-odd-fields.

csl-json-cheater-syntax-1.json contains:

[
    {
        "id": "cheater-syntax-braced-entry-1",
        "note": "Foo {:issued: 2015-06-30} bar"
    },
    {
        "id": "cheater-syntax-braced-entry-2",
        "note": "Foo {:issued:2015-06-30/2016-07-31} bar"
    },
    {
        "id": "cheater-syntax-line-entry-1",
        "note": "Foo\nissued: 2015-06-30\nbar"
    },
    {
        "id": "cheater-syntax-line-entry-2",
        "note": "Foo\nissued:2015-06-30/2016-07-31\nbar"
    }
]

actual output:

---
references:
- id: cheater-syntax-braced-entry-1
  type: no-type
  note: 'Foo {:issued: 2015-06-30} bar'

- id: cheater-syntax-braced-entry-2
  type: no-type
  note: Foo {:issued:2015-06-30/2016-07-31} bar

- id: cheater-syntax-line-entry-1
  type: no-type
  note: 'Foo issued: 2015-06-30 bar'

- id: cheater-syntax-line-entry-2
  type: no-type
  note: Foo issued:2015-06-30/2016-07-31 bar
...

expected output:

---
references:
- id: cheater-syntax-braced-entry-1
  type: no-type
  note: 'Foo bar'
  issued:
  - year: '2015'
    month: '6'
    day: '30'

- id: cheater-syntax-braced-entry-2
  type: no-type
  note: Foo bar
  issued:
  - year: '2015'
    month: '6'
    day: '30'
  - year: '2016'
    month: '7'
    day: '31'

- id: cheater-syntax-line-entry-1
  type: no-type
  note: 'Foo bar'
  issued:
  - year: '2015'
    month: '6'
    day: '30'

- id: cheater-syntax-line-entry-2
  type: no-type
  note: Foo bar
  issued:
  - year: '2015'
    month: '6'
    day: '30'
  - year: '2016'
    month: '7'
    day: '31'
...
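For illustration, the extraction step that would turn the input into the expected output could look roughly like the following Python sketch. The function name `extract_cheater_vars` and both regular expressions are my own assumptions, approximating the braced and line-based forms used in the test file; this is not how citeproc-js or pandoc-citeproc actually implement it, and only the CSL date variables are handled.

```python
import re

# CSL date variables; other variable names are left untouched in the note.
DATE_VARS = {"accessed", "container", "event-date", "issued",
             "original-date", "submitted"}

# Assumed patterns: "{:name: value}" anywhere, or "name: value" on its own line.
BRACED = re.compile(r"\{:([a-z-]+):\s*([^}]*?)\s*\}")
LINE = re.compile(r"^([a-z-]+):[ \t]*(.+?)[ \t]*$")


def extract_cheater_vars(note):
    """Return (cleaned_note, {variable: raw_value}) for a CSL JSON note."""
    found = {}

    def take_braced(match):
        name = match.group(1)
        if name not in DATE_VARS:
            return match.group(0)  # leave unknown entries in place
        found[name] = match.group(2)
        return ""

    note = BRACED.sub(take_braced, note)

    kept = []
    for line in note.split("\n"):
        m = LINE.match(line)
        if m and m.group(1) in DATE_VARS:
            found[m.group(1)] = m.group(2)
        else:
            kept.append(line)

    # Collapse leftover whitespace, as in the expected "Foo bar" output.
    cleaned = " ".join(" ".join(kept).split())
    return cleaned, found


print(extract_cheater_vars("Foo {:issued: 2015-06-30} bar"))
# ('Foo bar', {'issued': '2015-06-30'})
print(extract_cheater_vars("Foo\nissued:2015-06-30/2016-07-31\nbar"))
# ('Foo bar', {'issued': '2015-06-30/2016-07-31'})
```

The raw values returned here would still need to be parsed as dates or date ranges before being written out as a regular issued variable.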

Again, it’s support for cheater-syntax dates that’s particularly high on my wish list, since there’s currently no other way to get Zotero to export date ranges to CSL JSON via zotxt, for onward processing by pandoc.

Hence:

njbart commented 6 years ago

It seems most of the issues described in this thread have been fixed in the meantime – much appreciated. EDIT: Wrong – it seems pandoc-citeproc wasn’t updated after all; the effect seen is zotxt now using BBT’s cheater syntax parser if BBT is installed.

The sole exception is pandoc-citeproc -y / -j, whose output is still problematic. Take examples such as note: 'Foo issued: 2015-06-30 bar': once the \n has been removed, the note cannot even be parsed correctly if fed to pandoc-citeproc again. So, at the very least, the newlines should be kept; even better, the “cheater” variables should be removed from the note variable and written out as regular variables.
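A small Python sketch of why the lost newline matters. The pattern below is only my approximation of the line-based form (one "name: value" entry per line), and `line_entries` is a hypothetical helper, not anything in pandoc-citeproc or zotxt:

```python
import re

# Assumed line-entry pattern: "name: value" occupying a whole line.
LINE_ENTRY = re.compile(r"^([a-z-]+):[ \t]*(.+)$", re.MULTILINE)


def line_entries(note):
    """Collect line-based cheater entries from a note string."""
    return {m.group(1): m.group(2).strip() for m in LINE_ENTRY.finditer(note)}


original = "Foo\nissued: 2015-06-30\nbar"
flattened = original.replace("\n", " ")  # what the -y / -j output contains

print(line_entries(original))   # {'issued': '2015-06-30'}
print(line_entries(flattened))  # {}
```

With the newlines gone, the entry no longer sits on a line of its own, so a line-based parser has nothing left to anchor on.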