retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Season ranges should be exported as pseudo-months (13-16, or 21-24) #860

Closed njbart closed 6 years ago

njbart commented 6 years ago

The CSL specs allow just one season element per date, so season ranges cannot be expressed in (specs-compliant) CSL JSON. The workaround used by citeproc-js, and recently adopted by pandoc-citeproc, is to use pseudo-months instead. citeproc-js has been using 13 to 16; pandoc-citeproc accepts these and, in addition, the more ISO8601/EDTF-like 21 to 24.

It would be great if BBT could export season ranges entered in Zotero date fields, such as Summer 2015 - Winter 2016, or 2015-22/2016-24 to CSL JSON as either, e.g.,

"issued": { "date-parts": [[2015, 14], [2016, 16]] }

or

"issued": { "date-parts": [[2015, 22], [2016, 24]] }

… whichever version you prefer.

Of course this should be used for season ranges only – for point dates containing a season, BBT should continue using the season element.
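
A minimal sketch of the pseudo-month mapping described above, for illustration only. The names `seasonToPseudoMonth` and `seasonRangeToDateParts` are hypothetical and not part of BBT; seasons are assumed to be numbered 1 (spring) to 4 (winter), as in the CSL season element.

```javascript
// Hypothetical helper, not BBT code: map a season number (1 = spring ... 4 = winter)
// to a pseudo-month. 'citeproc' yields 13-16, 'edtf' yields 21-24.
function seasonToPseudoMonth(season, scheme = 'citeproc') {
  if (!Number.isInteger(season) || season < 1 || season > 4) {
    throw new RangeError(`season must be 1-4, got ${season}`)
  }
  if (scheme !== 'citeproc' && scheme !== 'edtf') {
    throw new RangeError(`unknown scheme: ${scheme}`)
  }
  return (scheme === 'edtf' ? 20 : 12) + season
}

// Build the value of "issued" for a season range.
// start and end are plain objects of the form { year, season }.
function seasonRangeToDateParts(start, end, scheme = 'citeproc') {
  return {
    'date-parts': [
      [start.year, seasonToPseudoMonth(start.season, scheme)],
      [end.year, seasonToPseudoMonth(end.season, scheme)],
    ],
  }
}

// Summer 2015 - Winter 2016
const summer2015 = { year: 2015, season: 2 }
const winter2016 = { year: 2016, season: 4 }
console.log(JSON.stringify(seasonRangeToDateParts(summer2015, winter2016)))
// prints {"date-parts":[[2015,14],[2016,16]]}
```

Passing `'edtf'` as the third argument would give the 22/24 variant instead.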

The correct way to export season ranges to biblatex is to use its native ISO8601/EDTF format, e.g.:

date = {2015-22/2016-24}

For pandoc’s CSL YAML format, the expected output is:

  issued:
  - year: 2015
    season: 2
  - year: 2016
    season: 4

retorquere commented 6 years ago

wait -- CSL YAML and CSL JSON differ structurally? I thought they were the same except the delivery format.

retorquere commented 6 years ago

Can you get me a source reference by way of a BBT error report?

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5224 ("season ranges").

retorquere commented 6 years ago

So what would the CSL representation of "Summer 2017 - May 2018" be? Is there documentation of the allowed formats?

CSL YAML supports both the year-season and the date-parts structure, no? I'd rather keep it all the same if possible.

njbart commented 6 years ago

Sample: Report ID HD5D2RMI

> wait -- CSL YAML and CSL JSON differ structurally?

I’m afraid so. For clarity, the CSL YAML we’re dealing with should probably better be referred to as “pandoc CSL YAML” – the in-field markup is different, and so are the date formats.

pandoc-citeproc seems to be able to parse most dates in the citeproc-js CSL JSON format when delivered as YAML, but has also introduced, I suppose for better readability, its own date format with year, month, date elements.

For me personally, pandoc CSL YAML season ranges are of secondary importance (after all, pandoc-citeproc can be used to convert CSL JSON to pandoc CSL YAML), so you might as well choose not to implement this straight away – and hope for a speedy introduction of EDTF dates for CSL JSON and YAML, which would of course make this whole exercise unnecessary.

retorquere commented 6 years ago

> wait -- CSL YAML and CSL JSON differ structurally?
>
> I’m afraid so. For clarity, the CSL YAML we’re dealing with should probably better be referred to as “pandoc CSL YAML” – the in-field markup is different, and so are the date formats.

I knew about the in-field markup, but it was my understanding that CSL-YAML supported both date formats -- both the more readable format, and the admittedly less-than-great CSL-JSON format. If that is true this simplifies the exporters because I can (and currently do) use common infrastructure for both.

Just so I have things clear for the CSL-JSON case:

Would the general rule just be "season point-dates get a season field; ranges (even if they include a season) always get only date-parts with a pseudo-month"?

retorquere commented 6 years ago

BBT uses the numeric format for seasons as per https://github.com/citation-style-language/schema/blob/master/csl-data.json#L214

retorquere commented 6 years ago

Should 2017-23/2017-24 have been recognised as a date range? edtf.js doesn't grok the /.

retorquere commented 6 years ago

So I'm going for 13 to 16 rather than 21 to 24 for wider compatibility.

retorquere commented 6 years ago

(NM on 2017-23/2017-24, I've re-enabled an exception path I had disabled to recognise ranges formatted this way)

njbart commented 6 years ago

> Just so I have things clear for the CSL-JSON case:
>
> Summer 1969 parses to { "date-parts": [ [ 1969 ] ], "season": 2 }
>
> Summer 1969 - Autumn 1969 parses to { "date-parts": [ [ 1969, 14 ], [ 1969, 15 ] ] }
>
> Summer 1969 - December 1970 parses to { "date-parts": [ [ 1969, 14 ], [ 1970, 12 ] ] }?

All yes. (It never occurred to me that someone might want to mix months and seasons in a date range, but there seems to be nothing in the 2016 ISO 8601 working draft that would forbid this, and it appears to work as expected with pandoc-citeproc, at least when using chicago-author-date.csl.)
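
The rule confirmed here can be sketched roughly as follows. This is an illustration under stated assumptions, not BBT's exporter: `endToParts` and `toCslJsonDate` are hypothetical names, and each date end is assumed to be a plain object `{ year, month?, season? }` with seasons numbered 1 to 4.

```javascript
// Hypothetical helper: one end of a range as a date-parts entry.
// A season is folded into a citeproc-js pseudo-month (13-16).
function endToParts(d) {
  if (d.season) return [d.year, 12 + d.season]
  return d.month ? [d.year, d.month] : [d.year]
}

// Hypothetical sketch of the rule:
// - point date with a season: keep the separate "season" field
// - any range: date-parts only, seasons become pseudo-months
function toCslJsonDate(start, end) {
  if (!end) {
    if (start.season) return { 'date-parts': [[start.year]], season: start.season }
    return { 'date-parts': [endToParts(start)] }
  }
  return { 'date-parts': [endToParts(start), endToParts(end)] }
}

// The three cases from the question above:
console.log(JSON.stringify(toCslJsonDate({ year: 1969, season: 2 })))
console.log(JSON.stringify(toCslJsonDate({ year: 1969, season: 2 }, { year: 1969, season: 3 })))
console.log(JSON.stringify(toCslJsonDate({ year: 1969, season: 2 }, { year: 1970, month: 12 })))
```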

To be clear about generic vs. pandoc CSL YAML: the only kind of date pandoc-citeproc cannot parse from generic CSL YAML is a season range with seasons represented as pseudo-months. If you’d rather stick to generic CSL YAML (generic wrt dates) for BBT, we could also try to get pandoc-citeproc fixed to accept pseudo-months here.

retorquere commented 6 years ago

No, if pandoc-csl-yaml expects season ranges as season fields, that can be done. AFAICT, pandoc is the only consumer of csl-yaml, might as well cater to its needs.

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5228 ("test case for #860").

retorquere commented 6 years ago

Just to be clear: the csl-yaml should look like this, correct?

---
id: SeasonrangesEDTF2017
issued:
- season: 3
  year: 2017
- season: 4
  year: 2017
original-date:
- season: 1
  year: 2015
- season: 2
  year: 2016

njbart commented 6 years ago

Correct.
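
For illustration, the season-range shape confirmed above could be produced along these lines. Both function names are hypothetical, and `renderDate` is a deliberately tiny renderer for this flat shape only, not a general YAML writer; key order follows insertion order and carries no meaning in YAML.

```javascript
// Hypothetical sketch: a season range as the list of mappings shown above.
// start and end are { year, season } with season 1-4.
function toPandocSeasonRange(start, end) {
  return [
    { year: start.year, season: start.season },
    { year: end.year, season: end.season },
  ]
}

// Minimal rendering of one date field whose value is a list of flat mappings.
function renderDate(field, parts) {
  const lines = [`${field}:`]
  for (const part of parts) {
    Object.entries(part).forEach(([key, value], i) => {
      // The first key of each mapping opens the list item.
      lines.push(`${i === 0 ? '- ' : '  '}${key}: ${value}`)
    })
  }
  return lines.join('\n')
}

const issued = toPandocSeasonRange({ year: 2017, season: 3 }, { year: 2017, season: 4 })
console.log(renderDate('issued', issued))
```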

retorquere commented 6 years ago

ugh, this touches csl-yaml date handling more generally. What are the accepted date denotations for csl-yaml?

njbart commented 6 years ago

The output from pandoc-citeproc -y suggests the following:

---
references:
- id: :2011
  type: webpage
  issued:
  - year: '2011'
    month: '11'
    season: '1'
    day: '30'
    circa: '1'
  - year: '2012'
    month: '12'
    season: '1'
    day: '31'
    circa: '1'
...

… so it’s more ISO8601/EDTF-like, with separate season and circa elements for start and end dates.

retorquere commented 6 years ago

Strings? Not numbers? And season + month?

What do point dates look like? Just dicts directly under the key?

retorquere commented 6 years ago

And will pandoc accept booleans for true rather than the string "1"?

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5236 ("adjust test cases for #860").

retorquere commented 6 years ago

Alright, give 5236 a spin for CSL YAML.

njbart commented 6 years ago

5236 looks very good.

Just to confirm:

Minor wrinkle: 2017-14/2017-15 and even 2017-14/2017-99 get exported as month: 14 and month: 15 (resp., 99). I would not encourage the use of strings in Zotero fields that superficially look like ISO8601/EDTF but aren’t, and would rather flag these as invalid and/or dump them into a literal element. Besides, while months 13 to 16 (and 21 to 24) are valid (if workaround) date elements, any date with a pseudo-month not from this set cannot be considered a valid pandoc CSL YAML date.

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5237 ("look out for invalid months").

retorquere commented 6 years ago

> pandoc-citeproc -y itself outputs strings, but numbers are ok as well

I don't understand why they'd make that choice though. These elements are always numbers.

> season and month(+day) can co-occur (in the sense: don’t seem to cause trouble); I can’t imagine any input that BBT would export like this, though

Indeed it can't. I was just curious.

> booleans for circa are fine, too (though pandoc-citeproc -y converts them to numbers)

And again, color me puzzled; given that both are supported, why use the most ambiguous format of the two?

> Minor wrinkle: 2017-14/2017-15 and even 2017-14/2017-99 get exported as month: 14 and month: 15 (resp., 99). I would not encourage the use of strings in Zotero fields that superficially look like ISO8601/EDTF but aren’t, and would rather flag these as invalid and/or dump them into a literal element. Besides, while months 13 to 16 (and 21 to 24) are valid (if workaround) date elements, any date with a pseudo-month not from this set cannot be considered a valid pandoc CSL YAML date.

Those are now exported as literals in 5237.
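
A rough sketch of such a check on EDTF-looking input, as an assumption about the behaviour rather than BBT's actual code. It treats months 01 to 12 and seasons 21 to 24 as acceptable in the typed string and falls back to a literal for anything else, including 14, 15 and 99. `isEdtfMonthOrSeason` and `rangeOrLiteral` are hypothetical names, and the years are left as written (no ISO-to-CSL conversion).

```javascript
// Hypothetical check: what may appear in the month slot of an EDTF-like string.
function isEdtfMonthOrSeason(value) {
  return Number.isInteger(value) &&
    ((value >= 1 && value <= 12) || (value >= 21 && value <= 24))
}

// Parse "YYYY-MM/YYYY-MM" (or "YYYY-SS/YYYY-SS"); anything invalid becomes a literal.
function rangeOrLiteral(text) {
  const m = /^(-?\d{4})-(\d{2})\/(-?\d{4})-(\d{2})$/.exec(text)
  if (!m) return { literal: text }
  const [startYear, startMonth, endYear, endMonth] = m.slice(1).map(Number)
  if (!isEdtfMonthOrSeason(startMonth) || !isEdtfMonthOrSeason(endMonth)) {
    return { literal: text }
  }
  return { start: [startYear, startMonth], end: [endYear, endMonth] }
}

console.log(rangeOrLiteral('2017-14/2017-99')) // literal
console.log(rangeOrLiteral('2015-22/2016-24')) // structured
```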

njbart commented 6 years ago

Just a few more minor hitches:

0011-21/0012-22, -0011-21/-0012-22, and -2011-21/-2012-22 are all exported as literal, even though they are valid ISO8601/EDTF dates.

0000-12-12 and 0000-12-12/0000-12-13 work as expected, though.

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5240 ("really, really nuts").

retorquere commented 6 years ago

Yeah so date guessing is... special. Thank valhalla for my test set. Try 5240.

retorquere commented 6 years ago

As to the differences you pointed out: 0000-12-12 and 0000-12-12/0000-12-13 are recognized and parsed by EDTF.js; 0011-21/0012-22 isn't EDTF so it goes through a heuristic stage before further parsing.

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5241 ("circa").

njbart commented 6 years ago

But 0011-21/0012-22 is valid EDTF.

From http://www.loc.gov/standards/datetime/pre-submission.html: “5.2.5 Season The values 21, 22, 23, 24 may be used used to signify ' Spring', 'Summer', 'Autumn', 'Winter', respectively, in place of a month value (01 through 12) for a year-and-month format string.”

See also (ibid.): “5.1.1 Date […]Year MUST be four digits. (Years longer than four digits are covered in levels 1 and 2.) A year may be positive, negative, or year zero.”

retorquere commented 6 years ago

Then perhaps it isn't according to the WD 2016-02-16 that EDTF.js implements ¯\_(ツ)_/¯. If I put that in EDTF.js it complains about the slash.

njbart commented 6 years ago

> Then perhaps it isn't according to the WD 2016-02-16 that EDTF.js implements

Well, no. From https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf (ISO/DIS 8601-2:2016(e), 2016-10-26; this passage unchanged from the 2016-02-16 version except for section numbering):

“4.7 Divisions of a year For a year-and-month expression (e.g. 1984-04) the month component may take on values of 21 or above (in place of a month value, 01 through 12). These values signify a division of a year (e.g. “the season Spring”). 4.7.1 Level 1 The values 21, 22, 23, 24 may be used to signify ' Spring', 'Summer', 'Autumn', 'Winter', respectively. Format: YYYY-SS Example: · 2001-21 (Spring, 2001)”

So seasons should be treated just like months. Looks like an EDTF.js bug to me.

retorquere commented 6 years ago

const edtf = require('edtf')
console.log(edtf('0011-21/0012-22'))

gives

Error: invalid syntax at line 1 col 8:

  0011-21/0012-22
         ^
Unexpected "/"
 for "0011-21/0012-22"

retorquere commented 6 years ago

EDTF.js has no issue with the individual dates though. It has a problem with the range.

njbart commented 6 years ago

I’d maintain that since year-month ranges are valid EDTF, so are year-season ranges.

retorquere commented 6 years ago

I'm not contesting that -- I'm just reporting what I see happening when EDTF.js is passed these dates. I can't find the part of the spec that talks about date ranges at all, so I don't know what the spec says about this.

njbart commented 6 years ago

I think it’s mainly in https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf: 4.4.4.1 Representations of time intervals identified by start and end, and 4.4.5 Representations other than complete. I’d agree it’s not easy to find, but I think it’s uncontroversial that YYYY-MM/YYYY-MM is valid, and if it is, then YYYY-SS/YYYY-SS must be valid, too.

Out of curiosity: does EDTF.js accept season ranges when using EDTF level 2?

retorquere commented 6 years ago

I don't think so because without further config, EDTF.js defaults to the highest supported spec (IIRC).

njbart commented 6 years ago

Test build 5241 seems to export ISO 8601/EDTF season ranges to CSL alright – so it would seem you’re bypassing EDTF.js here, right?

Would you feel it’s still worthwhile to file a bug report with EDTF.js? If so, would you be willing to do that – you might be able to explain more clearly how exactly BBT is using EDTF.js?

One more glitch, too: year 0 (ISO) is year -1 (CSL), year -1 (ISO) is year -2 (CSL), and so on.

(CSL uses, quite awkwardly, - as a shorthand for BCE, and has no year zero.)

5241 exports 0000-01/0000-12 as "issued": { "date-parts": [[-1, 1], [-1, 12]] } (correct), but maps 0000-21/0000-24 to "issued": { "date-parts": [[0, 13], [0, 16]] } – this should be "issued": { "date-parts": [[-1, 13], [-1, 16]] } (and analogously for all negative years).
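
The year shift described above, as a small sketch. `isoYearToCsl` and `isoRangeToDateParts` are hypothetical names; the second function assumes a well-formed `YYYY-MM/YYYY-MM` or `YYYY-SS/YYYY-SS` string and maps seasons 21 to 24 onto the citeproc-js pseudo-months 13 to 16.

```javascript
// ISO 8601 has a year 0; CSL has none and uses negative years for BCE,
// so ISO 0 becomes CSL -1, ISO -1 becomes CSL -2, and so on.
function isoYearToCsl(isoYear) {
  return isoYear <= 0 ? isoYear - 1 : isoYear
}

// Hypothetical converter for ISO/EDTF month or season ranges.
function isoRangeToDateParts(text) {
  const m = /^(-?\d{4})-(\d{2})\/(-?\d{4})-(\d{2})$/.exec(text)
  if (!m) return null
  const part = (year, month) => {
    const mm = Number(month)
    // Apply the year shift to months and seasons alike.
    return [isoYearToCsl(Number(year)), mm >= 21 && mm <= 24 ? mm - 8 : mm]
  }
  return { 'date-parts': [part(m[1], m[2]), part(m[3], m[4])] }
}

console.log(JSON.stringify(isoRangeToDateParts('0000-01/0000-12')))
// prints {"date-parts":[[-1,1],[-1,12]]}
console.log(JSON.stringify(isoRangeToDateParts('0000-21/0000-24')))
// prints {"date-parts":[[-1,13],[-1,16]]}
```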

retorquere commented 6 years ago

> Test build 5241 seems to export ISO 8601/EDTF season ranges to CSL alright – so it would seem you’re bypassing EDTF.js here, right?

Yep, BBT does a few heuristic stabs before and after it tries EDTF parsing; before are those patterns that confuse EDTF.js, after are those for when both the early heuristics and EDTF.js fail; the season ranges get caught by the after phase.
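
The staged approach can be pictured roughly like this. It is a sketch of the idea only, not BBT's date parser: `parseDate` and `seasonRange` are hypothetical, and `strictParse` stands in for a strict EDTF parser (such as the `edtf` function from the snippet earlier in this thread) that is assumed to throw on input it rejects.

```javascript
// Hypothetical three-stage parse: early heuristics, strict parser, late heuristics.
function parseDate(text, strictParse, before = [], after = []) {
  for (const heuristic of before) {
    const hit = heuristic(text)
    if (hit) return { stage: 'before', value: hit }
  }
  try {
    return { stage: 'edtf', value: strictParse(text) }
  } catch (err) {
    // The strict parser rejected the input; fall through to the late heuristics.
  }
  for (const heuristic of after) {
    const hit = heuristic(text)
    if (hit) return { stage: 'after', value: hit }
  }
  return { stage: 'none', value: { literal: text } }
}

// Late heuristic for YYYY-SS/YYYY-SS season ranges (seasons 21-24 only).
function seasonRange(text) {
  const m = /^(-?\d{4})-(2[1-4])\/(-?\d{4})-(2[1-4])$/.exec(text)
  if (!m) return null
  return { start: [Number(m[1]), Number(m[2])], end: [Number(m[3]), Number(m[4])] }
}

// Stand-in for a strict parser that rejects the slash, as reported above.
const rejectSlash = text => {
  if (text.includes('/')) throw new Error('Unexpected "/"')
  return { parsed: text }
}
console.log(parseDate('0011-21/0012-22', rejectSlash, [], [seasonRange]).stage)
// prints after
```

Keeping the strict parser as a parameter means the fallback logic can be exercised without depending on any particular EDTF library.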

> Would you feel it’s still worthwhile to file a bug report with EDTF.js? If so, would you be willing to do that – you might be able to explain more clearly how exactly BBT is using EDTF.js?

Check: https://github.com/inukshuk/edtf.js/issues/12

> 5241 exports 0000-01/0000-12 as "issued": { "date-parts": [[-1, 1], [-1, 12]] } (correct), but maps 0000-21/0000-24 to "issued": { "date-parts": [[0, 13], [0, 16]] } – this should be "issued": { "date-parts": [[-1, 13], [-1, 16]] } (and analogously for all negative years).

Right, I think I've found the cause of that.

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5252 ("year zero").

retorquere commented 6 years ago

5252 fixes the seasons-BCE problem but also a bug with year zero in biblatex.

blip-bloop commented 6 years ago

:robot: this is your friendly neighborhood build bot announcing test build 5253 ("dateparser").

njbart commented 6 years ago

Great. I haven’t been able to spot any further problems so far.

lock[bot] commented 6 years ago

This thread has been automatically locked because it has not had recent activity. Please open a new issue for related bugs and link to relevant comments in this thread.