Closed njbart closed 6 years ago
wait -- CSL YAML and CSL JSON differ structurally? I thought they were the same except the delivery format.
Can you get me a source reference by way of a BBT error report?
:robot: this is your friendly neighborhood build bot announcing test build 5224 ("season ranges").
So what would the CSL representation of "Summer 2017 - May 2018" be? Is there documentation of the allowed formats?
CSL YAML supports both the year-season and the date-parts structure, no? I'd rather keep it all the same if possible.
Sample: Report ID HD5D2RMI
> wait -- CSL YAML and CSL JSON differ structurally?

I’m afraid so. For clarity, the CSL YAML we’re dealing with should probably be referred to as “pandoc CSL YAML” – the in-field markup is different, and so are the date formats.
pandoc-citeproc seems to be able to parse most dates in the citeproc-js CSL JSON format when delivered as YAML, but has also introduced, I suppose for better readability, its own date format with `year`, `month`, `day` elements.
For me personally, pandoc CSL YAML season ranges are of secondary importance (after all, pandoc-citeproc can be used to convert CSL JSON to pandoc CSL YAML), so you might as well choose not to implement this straight away – and hope for a speedy introduction of EDTF dates for CSL JSON and YAML, which would of course make this whole exercise unnecessary.
> > wait -- CSL YAML and CSL JSON differ structurally?
>
> I’m afraid so. For clarity, the CSL YAML we’re dealing with should probably be referred to as “pandoc CSL YAML” – the in-field markup is different, and so are the date formats.

I knew about the in-field markup, but it was my understanding that CSL-YAML supported both date formats -- both the more readable format, and the admittedly less-than-great CSL-JSON format. If that is true this simplifies the exporters because I can (and currently do) use common infrastructure for both.
Just so I have things clear for the CSL-JSON case:
- `Summer 1969` parses to `{ "date-parts": [ [ 1969 ] ], "season": 2 }`
- `Summer 1969 - Autumn 1969` parses to `{ "date-parts": [ [ 1969, 14 ], [ 1969, 15 ] ] }`
- `Summer 1969 - December 1970` parses to `{ "date-parts": [ [ 1969, 14 ], [ 1970, 12 ] ] }`?

Would the general rule just be "season point-dates get a season field; ranges (even if they have a season) always get only date-parts with a pseudo-month"?
BBT uses the numeric format for seasons as per https://github.com/citation-style-language/schema/blob/master/csl-data.json#L214
Should `2017-23/2017-24` have been recognised as a date range? edtf.js doesn't grok the `/`.
So I'm going for 13 to 16 rather than 21 to 24 for wider compatibility.
(NM on `2017-23/2017-24`, I've re-enabled an exception path I had disabled to recognise ranges formatted this way)
> Just so I have things clear for the CSL-JSON case:
> - `Summer 1969` parses to `{ "date-parts": [ [ 1969 ] ], "season": 2 }`
> - `Summer 1969 - Autumn 1969` parses to `{ "date-parts": [ [ 1969, 14 ], [ 1969, 15 ] ] }`
> - `Summer 1969 - December 1970` parses to `{ "date-parts": [ [ 1969, 14 ], [ 1970, 12 ] ] }`?

All yes. (It never occurred to me that someone might want to mix months and seasons in a date range, but there seems to be nothing in the 2016 ISO 8601 working draft that would forbid this, and it appears to work as expected with pandoc-citeproc, at least when using chicago-author-date.csl.)
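The rule confirmed here can be sketched in a few lines. This is only an illustration, not BBT's actual code: the function names and the `{ year, month, season }` input shape are assumptions, with seasons numbered 1 to 4 (spring to winter) and mapped to the citeproc-js pseudo-months 13 to 16. Days are left out for brevity.

```javascript
// Hypothetical helpers for illustration; not taken from BBT.
// Each endpoint is { year, month?, season? } with season in 1..4 (1 = spring).
const PSEUDO_MONTH_OFFSET = 12 // citeproc-js convention: seasons 1..4 -> months 13..16

function endpointToParts(d) {
  if (d.season) return [d.year, PSEUDO_MONTH_OFFSET + d.season]
  return d.month ? [d.year, d.month] : [d.year]
}

function toCslJsonDate(start, end) {
  if (!end) {
    // Point date: the season keeps its own field.
    if (start.season) return { 'date-parts': [[start.year]], season: start.season }
    return { 'date-parts': [endpointToParts(start)] }
  }
  // Range: a season on either end becomes a pseudo-month.
  return { 'date-parts': [endpointToParts(start), endpointToParts(end)] }
}

// "Summer 1969 - December 1970"
console.log(JSON.stringify(toCslJsonDate({ year: 1969, season: 2 }, { year: 1970, month: 12 })))
```

Mixed ranges need no special case in this sketch, because each endpoint is converted on its own.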
To be clear about generic vs. pandoc CSL YAML: the only kind of date pandoc-citeproc cannot parse from generic CSL YAML is a season range with seasons represented as pseudo-months. If you feel you’d rather want to stick to generic CSL YAML (generic wrt dates) for BBT, we could also try to get pandoc-citeproc fixed to accept pseudo-months here.
No, if pandoc-csl-yaml expects season ranges as season fields, that can be done. AFAICT, pandoc is the only consumer of csl-yaml, might as well cater to its needs.
:robot: this is your friendly neighborhood build bot announcing test build 5228 ("test case for #860").
Just to be clear: the csl-yaml should look like this, correct?

    ---
    id: SeasonrangesEDTF2017
    issued:
    - season: 3
      year: 2017
    - season: 4
      year: 2017
    original-date:
    - season: 1
      year: 2015
    - season: 2
      year: 2016

Correct.
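For illustration, a season range in the shape confirmed above could be emitted roughly as follows. The function name is made up for this sketch, and the hand-rolled string building is only there to keep the example dependency-free; a real exporter would presumably go through a YAML library.

```javascript
// Hypothetical serializer for illustration only.
// start and end are { season, year }, with season in 1..4.
function pandocSeasonRange(field, start, end) {
  const lines = [`${field}:`]
  for (const d of [start, end]) {
    lines.push(`- season: ${d.season}`)
    lines.push(`  year: ${d.year}`)
  }
  return lines.join('\n')
}

console.log(pandocSeasonRange('issued', { season: 3, year: 2017 }, { season: 4, year: 2017 }))
```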
ugh, this touches csl-yaml date handling more generally. What are the accepted date denotations for csl-yaml?
The output from `pandoc-citeproc -y` suggests the following:

    ---
    references:
    - id: :2011
      type: webpage
      issued:
      - year: '2011'
        month: '11'
        season: '1'
        day: '30'
        circa: '1'
      - year: '2012'
        month: '12'
        season: '1'
        day: '31'
        circa: '1'
    ...

… so it’s more ISO8601/EDTF-like, with separate season and circa elements for start and end dates.
Strings? Not numbers? And season + month?
What do point dates look like? Just dicts directly under the key?
And will pandoc accept booleans for true rather than the string "1"?
:robot: this is your friendly neighborhood build bot announcing test build 5236 ("adjust test cases for #860").
Alright, give 5236 a spin for CSL YAML.
5236 looks very good.
Just to confirm:
- `pandoc-citeproc -y` itself outputs strings, but numbers are ok as well
- `season` and `month` (+`day`) can co-occur (in the sense: don’t seem to cause trouble)
- booleans for `circa` are fine, too (though `pandoc-citeproc -y` converts them to numbers)

Minor wrinkle: `2017-14/2017-15` and even `2017-14/2017-99` get exported as `month: 14` and `month: 15` (resp., `99`). I would not encourage the use of strings in Zotero fields that superficially look like ISO8601/EDTF but aren’t, and would rather flag these as invalid and/or dump them into a `literal` element. Besides, while months 13 to 16 (and 21 to 24) are valid (if workaround) date elements, any date with a pseudo-month not from this set cannot be considered a valid pandoc CSL YAML date.
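A minimal sketch of the suggested guard, assuming a plain `YYYY-MM` string as input. The function names are hypothetical, and whether 13 to 16 should be accepted on input at all is a separate question that this sketch does not settle; it only shows the fallback to `literal` for values outside the set named above.

```javascript
// Hypothetical check for illustration; not BBT's date parser.
function isValidMonthOrSeason(month) {
  return Number.isInteger(month) && (
    (month >= 1 && month <= 12) ||  // real months
    (month >= 13 && month <= 16) || // citeproc-js pseudo-months
    (month >= 21 && month <= 24)    // ISO 8601/EDTF seasons
  )
}

function yearMonthOrLiteral(text) {
  const m = /^(-?\d{4})-(\d{2})$/.exec(text)
  if (m && isValidMonthOrSeason(Number(m[2]))) {
    return { year: Number(m[1]), month: Number(m[2]) }
  }
  // Anything else is passed through untouched rather than guessed at.
  return { literal: text }
}

console.log(yearMonthOrLiteral('2017-99'))
```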
:robot: this is your friendly neighborhood build bot announcing test build 5237 ("look out for invalid months").
> pandoc-citeproc -y itself outputs strings, but numbers are ok as well

I don't understand why they'd make that choice though. These elements are always numbers.

> season and month(+day) can co-occur (in the sense: don’t seem to cause trouble); I can’t imagine any input that BBT would export like this, though

Indeed it can't. I was just curious.

> booleans for circa are fine, too (though pandoc-citeproc -y converts them to numbers)

And again, color me puzzled; given that both are supported, why use the more ambiguous format of the two?

> Minor wrinkle: 2017-14/2017-15 and even 2017-14/2017-99 get exported as month: 14 and month: 15 (resp., 99). I would not encourage the use of strings in Zotero fields that superficially look like ISO8601/EDTF but aren’t, and would rather flag these as invalid and/or dump them into a literal element. Besides, while months 13 to 16 (and 21 to 24) are valid (if workaround) date elements, any date with a pseudo-month not from this set cannot be considered a valid pandoc CSL YAML date.

Those are now exported as literals in 5273.
Just a few more minor hitches:
`0011-21/0012-22`, `-0011-21/-0012-22`, and `-2011-21/-2012-22` are all exported as literal, even though they are valid ISO8601/EDTF dates.
`0000-12-12` and `0000-12-12/0000-12-13` work as expected, though.
:robot: this is your friendly neighborhood build bot announcing test build 5240 ("really, really nuts").
Yeah so date guessing is... special. Thank valhalla for my test set. Try 5240.
As to the differences you pointed out: `0000-12-12` and `0000-12-12/0000-12-13` are recognized and parsed by EDTF.js; `0011-21/0012-22` isn't EDTF so it goes through a heuristic stage before further parsing.
:robot: this is your friendly neighborhood build bot announcing test build 5241 ("circa").
But `0011-21/0012-22` is valid EDTF.
From http://www.loc.gov/standards/datetime/pre-submission.html: “5.2.5 Season The values 21, 22, 23, 24 may be used used to signify ' Spring', 'Summer', 'Autumn', 'Winter', respectively, in place of a month value (01 through 12) for a year-and-month format string.”
See also (ibid.): “5.1.1 Date […]Year MUST be four digits. (Years longer than four digits are covered in levels 1 and 2.) A year may be positive, negative, or year zero.”
Then perhaps it isn't according to the WD 2016-02-16 that EDTF.js implements ¯\_(ツ)_/¯. If I put that in EDTF.js it complains about the slash.
> Then perhaps it isn't according to the WD 2016-02-16 that EDTF.js implements

Well, no. From https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf (ISO/DIS 8601-2:2016(e), 2016-10-26; this passage unchanged from the 2016-02-16 version except for section numbering):
“4.7 Divisions of a year For a year-and-month expression (e.g. 1984-04) the month component may take on values of 21 or above (in place of a month value, 01 through 12). These values signify a division of a year (e.g. “the season Spring”). 4.7.1 Level 1 The values 21, 22, 23, 24 may be used to signify ' Spring', 'Summer', 'Autumn', 'Winter', respectively. Format: YYYY-SS Example: · 2001-21 (Spring, 2001)”
So seasons should be treated just like months. Looks like an EDTF.js bug to me.
    const edtf = require('edtf')
    console.log(edtf('0011-21/0012-22'))

gives

    Error: invalid syntax at line 1 col 8:
    0011-21/0012-22
           ^
    Unexpected "/"
    for "0011-21/0012-22"

EDTF.js has no issue with the individual dates though. It has a problem with the range.
I’d maintain that since year-month ranges are valid EDTF, so are year-season ranges.
I'm not contesting that -- I'm just reporting what I see happening when EDTF.js is passed these dates. I can't find the part of the spec that talks about date ranges at all, so I don't know what the spec says about this.
I think it’s mainly in https://www.loc.gov/standards/datetime/ISO_DIS%208601-2.pdf: 4.4.4.1 Representations of time intervals identified by start and end, and 4.4.5 Representations other than complete. I’d agree it’s not easy to find, but I think it’s uncontroversial that YYYY-MM/YYYY-MM is valid, and if it is, then YYYY-SS/YYYY-SS must be valid, too.
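If `YYYY-MM/YYYY-MM` and `YYYY-SS/YYYY-SS` are both taken to be valid, a recognizer that treats the two alike could look roughly like this. The pattern and the function name are assumptions made for this sketch; they are not taken from EDTF.js or from the ISO draft, and only four-digit years with an optional leading minus are covered.

```javascript
// Hypothetical pattern for illustration: months 01..12 and seasons 21..24
// are accepted in the same position on both sides of the slash.
const YEAR_MONTH = '(-?\\d{4})-(0[1-9]|1[0-2]|2[1-4])'
const YEAR_MONTH_RANGE = new RegExp(`^${YEAR_MONTH}/${YEAR_MONTH}$`)

function parseYearMonthRange(text) {
  const m = YEAR_MONTH_RANGE.exec(text)
  if (!m) return null
  return [
    { year: Number(m[1]), month: Number(m[2]) },
    { year: Number(m[3]), month: Number(m[4]) },
  ]
}

console.log(parseYearMonthRange('0011-21/0012-22'))
```

Mixed ranges such as `2017-21/2018-05` also match, since each side is checked independently.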
Out of curiosity: does EDTF.js accept season ranges when using EDTF level 2?
I don't think so because without further config, EDTF.js defaults to the highest supported spec (IIRC).
Test build 5241 seems to export ISO 8601/EDTF season ranges to CSL alright – so it would seem you’re bypassing EDTF.js here, right?
Would you feel it’s still worthwhile to file a bug report with EDTF.js? If so, would you be willing to do that – you might be able to explain more clearly how exactly BBT is using EDTF.js?
One more glitch, too: year 0 (ISO) is year -1 (CSL), year -1 (ISO) is year -2 (CSL), and so on.
(CSL uses, quite awkwardly, `-` as a shorthand for `BCE`, and has no year zero.)
5241 exports `0000-01/0000-12` as `"issued": { "date-parts": [[-1, 1], [-1, 12]] }` (correct), but maps `0000-21/0000-24` to `"issued": { "date-parts": [[0, 13], [0, 16]] }` – this should be `"issued": { "date-parts": [[-1, 13], [-1, 16]] }` (and analogously for all negative years).
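The year shift described above is small enough to write down. The helper names below are hypothetical, and the season mapping assumes the 13 to 16 pseudo-months chosen earlier in this thread; the point is only that the shift has to be applied on the season path as well as on the month path.

```javascript
// Hypothetical helpers for illustration.
// ISO 8601 has a year 0; CSL does not, so ISO years <= 0 shift down by one.
function isoYearToCsl(year) {
  return year <= 0 ? year - 1 : year
}

function toCslParts(isoYear, month) {
  // EDTF seasons 21..24 -> citeproc-js pseudo-months 13..16
  const cslMonth = month >= 21 && month <= 24 ? month - 8 : month
  return [isoYearToCsl(isoYear), cslMonth]
}

// 0000-21/0000-24
console.log(JSON.stringify({ 'date-parts': [toCslParts(0, 21), toCslParts(0, 24)] }))
```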
> Test build 5241 seems to export ISO 8601/EDTF season ranges to CSL alright – so it would seem you’re bypassing EDTF.js here, right?

Yep, BBT does a few heuristic stabs before and after it tries EDTF parsing; before are those patterns that confuse EDTF.js, after are those for when both the early heuristics and EDTF.js fail; the season ranges get caught by the after phase.
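The staging described here might be pictured like this. It is a guess at the general shape, not BBT's implementation: the function name, the `{ before, edtf, after }` argument, and the `literal` fallback are all assumptions, and the parsers are passed in so the sketch runs without any dependency.

```javascript
// Hypothetical staging for illustration; the real parsers are injected.
// before/after are arrays of functions returning a parsed date or null;
// edtf is a function that throws on input it cannot parse.
function parseWithFallbacks(text, { before, edtf, after }) {
  for (const heuristic of before) {
    const hit = heuristic(text)
    if (hit) return hit
  }
  try {
    return edtf(text)
  } catch (err) {
    // EDTF parsing failed; fall through to the late heuristics.
  }
  for (const heuristic of after) {
    const hit = heuristic(text)
    if (hit) return hit
  }
  return { literal: text }
}
```

With a strict parser that rejects the slash, a season range would skip the first two stages and be picked up by an `after` heuristic, which matches the behaviour reported for build 5241.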
> Would you feel it’s still worthwhile to file a bug report with EDTF.js? If so, would you be willing to do that – you might be able to explain more clearly how exactly BBT is using EDTF.js?

Check: https://github.com/inukshuk/edtf.js/issues/12
> 5241 exports `0000-01/0000-12` as `"issued": { "date-parts": [[-1, 1], [-1, 12]] }` (correct), but maps `0000-21/0000-24` to `"issued": { "date-parts": [[0, 13], [0, 16]] }` – this should be `"issued": { "date-parts": [[-1, 13], [-1, 16]] }` (and analogously for all negative years).

Right, I think I've found the cause of that.
:robot: this is your friendly neighborhood build bot announcing test build 5252 ("year zero").
5252 fixes the seasons-BCE problem but also a bug with year zero in biblatex.
:robot: this is your friendly neighborhood build bot announcing test build 5253 ("dateparser").
Great. I haven’t been able to spot any further problems so far.
The CSL specs allow just one season element per date, so season ranges cannot be expressed in (specs-compliant) CSL JSON. The workaround used by citeproc-js, and recently adopted by pandoc-citeproc, is to use pseudo-months instead. citeproc-js has been using `13` to `16`; pandoc-citeproc accepts these, and in addition the more ISO8601/EDTF-like `21` to `24`.
It would be great if BBT could export season ranges entered in Zotero date fields, such as `Summer 2015 - Winter 2016`, or `2015-22/2016-24` to CSL JSON as either, e.g.,
or
… whichever version you prefer.
Of course this should be used for season ranges only – for point dates containing a season, BBT should continue using the `season` element.
The correct way to export season ranges to biblatex is to use its native ISO8601/EDTF format, e.g.:
For pandoc’s CSL YAML format, the expected output is: