plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Circa dates, circa date ranges, and question marked dates. Plus Eras. #427

Closed JohnLukeBentley closed 7 years ago

JohnLukeBentley commented 8 years ago

I request support for:

It would be desirable for these kinds of values to be permissible in all date fields. I use origdate as the more likely example. There might be reasons for choosing different symbols for these kinds of dates, to make parsing easier. There might be reasons for disallowing spaces.

edit: This issue thread also combines issue Before the common era (BCE/BC) and common era (CE/AD) date support. #422 /edit

Kinds

When writing a scholarly piece there are several kinds of date ambiguities and uncertainties; these are listed below.

(Part of the power of Biblatex is in providing support for many and any type of style guide. But in the cases below I sometimes borrow from the ...

University of Chicago. 2010. The Chicago Manual of Style. 16th ed. Chicago: University of Chicago. http://www.chicagomanualofstyle.org/16/contents.html.

... because it sheds some light on these issues. I could have chosen another style guide.)

  1. Circa dates. Where the scholarship is only able to fix a date approximately. Generally this is beyond the precision of a year, as in "c. 125 CE" (rather than at the precision of a day: we rarely see, in publishing, something like "c. 0125-02-20").

    ca. or c.

    circa, about, approximately (ca. preferred for greater clarity)

    (University of Chicago 2010, under "10.43 Scholarly abbreviations", http://www.chicagomanualofstyle.org/16/ch10/ch10_sec043.html)

    // These are my examples (derived but not quoted from (University of Chicago 2010))
    (Epictetus ca. 125 CE)
    (Epictetus c. 125 CE)
  2. Circa date ranges.

    Citation: (da Vinci c. 1487–1490)
    
    Reference Entry: 
    Da Vinci, Leonardo. c. 1487–1490. Codex Trivulzianus.
  3. No dates.

    "When the publication date of a printed work cannot be ascertained, the abbreviation n.d."

    Boston, n.d.

    (University of Chicago 2010, under "14.152 'No date'", http://www.chicagomanualofstyle.org/16/ch14/ch14_sec152.html)

  4. Question marked dates.

    A guessed-at date may either be substituted (in brackets) or added. Edinburgh, [1750?] or Edinburgh, n.d., ca. 1750

    (University of Chicago 2010, under "14.152 'No date'", http://www.chicagomanualofstyle.org/16/ch14/ch14_sec152.html)

  5. Ambiguities over which known date is relevant. For example we might have the following reference entry:

    Hume, David. 1751. “An Enquiry Concerning the Principles of Morals.” In Enquiries Concerning Human Understanding and Concerning the Principles of Morals, 3rd ed., edited by L. A. Selby-Bigge and P. H. Nidditch. New York: Oxford University Press, 1975-06-12. isbn: 978-0-19-824536-0.

    ... but there is ambiguity around whether 1751 is the relevant original date, given ...

    Selby-Bigge and Nidditch’s 1975 edition is based off a collection of Hume’s essays posthumously published in 1777.

    We could have chosen 1777 as the original date. But, in this case, we have reasons for choosing one date (1751) over another (1777). So in the reference entry we can add an annotation that explains all this. The annotation thereby handles the ambiguity ...

    Selby-Bigge and Nidditch’s 1975 edition is based off a collection of Hume’s essays posthumously published in 1777. However, Hume’s “An Enquiry Concerning the Principles of Morals” was first published in 1751, and that's the date we use here.

Feature requests for Biblatex

Biblatex already handles:

Biblatex also provides for date ranges.

So I request the additional functionality for:

Often enough there might be no semantic difference between a circa date and a question marked date. Both can be used to express an uncertainty about a date. It's rare to come across a question marked date, relative to a circa date. So it might be tempting to ignore question marked dates with the rule: "If you are uncertain about a date just designate it as a circa date".

However, there probably are going to be contexts, albeit rare, in which an author does want to maintain a semantic difference. E.g. For dates they are personally uncertain about, the author might tag with a question mark. For a date that the author knows a community of scholars has established as having a lack of precision, the author might tag with a circa.

On the issue of whether to support output formats "ca." or "c." as the abbreviation for circa, I'm undecided. Personally, I mostly recall seeing c. 1815 and probably prefer this look, but the Chicago Manual of Style promotes 'ca.'. Perhaps both need to be supported, so that style authors can in turn provide options for users (with relevant defaults for a chosen style).

simifilm commented 8 years ago

With the latest additions, something is severely broken. I get

! Undefined control sequence.
l.1230 \DeclareLabelalphaNameTemplate
                                   {
? 
! Undefined control sequence.
l.1231   \namepart
                [use=true, base=true, strwidth=1]{prefix}
? 

! LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.1231   \namepart[
                 use=true, base=true, strwidth=1]{prefix}
? 
! Undefined control sequence.
l.1232   \namepart
                [base=true]{family}
? 
! Undefined control sequence.
l.1233   \namepart
                {given}
? 
njbart commented 8 years ago

Will it allow [a] date like “13th century”?

If you are asking about output:

plk commented 8 years ago

@simifilm - I'm in the middle of quite large changes to the labelalpha mechanism - the github source isn't guaranteed to be stable but you can just comment out the \DeclareLabelalphaNameTemplate declaration in biblatex.def for now.

JohnLukeBentley commented 8 years ago

After sleeping on it I think, in my previous large post, rather than trying to illustrate:

Human readability and understandability being distinct.

E.g. date = {-0279} is quite human readable, in that the number can be read to be minus two seventy nine. But it's not readily understandable, especially by those not familiar with date standards, as being equivalent to the year two hundred and eighty before the common era. At least, I mean, this is a plausible way of speaking about the example.

By contrast datetime = {2004-01-01T10:10:10+05:00} is understandable, but not (relative to alternatives) human readable. That is, even folk not familiar with datetime standards could understand, through making some assumptions, what this string means. However, reading it is a bit of a strain given the lack of space delimiters.

So it's with the criteria of human readability and understandability that we might weigh candidate datetime standards. That is, in conjunction with other criteria.

For the next post, or next few posts, I'd like to put aside the issue of "Should there be a colloquial format?" by addressing:

Whether or not there is to be a colloquial input format, which should be the strict format: EDTF or iso8601:2004?

And given the renewed enthusiasm, from Nick and Philip, of EDTF: I'll look at the matter with EDTF as the leading candidate. That'll entail revisiting some of the issues I've previously mentioned, as well as raising new issues.

Again, this is just an issue of input formats. ...

Allow "+" sign (for years)?

I'm revisiting this issue partly because I want to make sure I get the two different standards right.

ISO8601:2004 allows the "+" sign for years. I previously quoted "3.4.2 Characters used in place of digits or signs". So ISO8601:2004 allows both of the following formats:

+0001
+0000
-0001

0001
0000
-0001

EDTF. I previously didn't provide evidence of what EDTF says on this issue. So ...

From http://www.loc.gov/standards/datetime/pre-submission.html#bnf

date =  year | yearMonth | yearMonthDay
...
year = positiveYear | negativeYear | "0000"

positiveYear =
       positiveDigit digit digit digit
     | digit positiveDigit digit digit
     | digit digit positiveDigit digit
     | digit digit digit positiveDigit

negativeYear = "-" positiveYear
...
digit = positiveDigit | "0"
positiveDigit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

So EDTF doesn't allow "+". It enforces a scheme like:

0001
0000
-0001
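
As a rough illustration (not part of the EDTF draft or of biblatex), the year production in the BNF quoted above can be checked with a small Python regular expression. The function name `is_edtf_year` is made up for this sketch.

```python
import re

# Mirrors the quoted BNF: year = positiveYear | negativeYear | "0000".
# positiveYear is four digits with at least one non-zero digit, so "-0000"
# and any "+" prefix are rejected.
_EDTF_YEAR = re.compile(r"(?:-?(?!0000)[0-9]{4}|0000)")


def is_edtf_year(value: str) -> bool:
    """Return True if value matches the draft EDTF year production."""
    return _EDTF_YEAR.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["0001", "0000", "-0001", "+0001", "-0000", "321"]:
        print(candidate, is_edtf_year(candidate))
```

Under this reading of the BNF, a three-digit year such as 321 and a signed +0001 are both rejected, which is the single-scheme behaviour described above.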

As I previously argued, allowing the "+" sign for years would be helpful in a scenario where you wanted to line up years (and dates) in a column (as previously exemplified when using Zotero and Zotero-better-bibtex, which export to biblatex).

On the other hand, that EDTF enforces only one scheme, with respect to allowing "+" (it doesn't), may make it attractive.

Therefore I would agree that this issue causes no significant impediment to using EDTF over ISO8601 for biblatex.

Allow changeable precision times?

On the presumption that Biblatex will support datetimes to some degree (as Philip had indicated it would), or at least there's a strong possibility it might in the future ...

What the standards say

In ISO8601:2004 changeable precision times are allowed:

4.2.2.2 Complete representations

...

Basic format: hhmmss Example: 232050
Extended format: hh:mm:ss Example: 23:20:50

4.2.2.3 Representations with reduced accuracy

If the degree of accuracy required permits, either two or four digits may be omitted from the representation in 4.2.2.2.

a) A specific hour and minute
    Basic format:       hhmm    Example: 2320
    Extended format:    hh:mm   Example: 23:20

b) A specific hour
    Basic format:       hh      Example: 23
    Extended format:    not applicable

In EDTF changeable precision times are not allowed:

5.1.2 Date and Time

A date/time string MUST be composed according to one of three representations as illustrated in the following three examples:

2001-02-03T09:30:01
2004-01-01T10:10:10Z
2004-01-01T10:10:10+05:00
8. BNF

time = baseTime zoneOffset?
baseTime = hour ":" minute ":" second | "24:00:00"

It is somewhat surprising to find EDTF allowing changeable precision in dates, even going so far as to provide several ways to express that variable precision, while affording no such flexibility for times.

Is this a problem?

An example way in which bibliographic data might be fed into biblatex is via reference management software (like Zotero) that, in turn, extracts metadata from a website. There are a plethora of (X)HTML embedded metadata schemes. Two of the more popular ones are Dublin Core and (the emerging) JSON-LD with Schema.org.

Dublin Core

There are many (overly complex) ways to express Dublin Core metadata in (X)HTML5, but one (https://wiki.whatwg.org/wiki/MetaExtensions conforming) paradigmatic example is:

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms" />
<meta name="DCTERMS.title" content="Services to Government" />
<meta name="DCTERMS.modified" scheme="DCTERMS.W3CDTF" content="2016-06-10T20:00:09+1000" />

Note the W3CDTF standard that Dublin Core often uses for datetimes, https://www.w3.org/TR/NOTE-datetime (See http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#terms-W3CDTF). This is W3C's Date and Time Formats Note of 1998-08-27 (which EDTF mentions).

The W3CDTF subsets ("profiles") ISO8601, as does EDTF. But unlike EDTF, W3CDTF allows for times that drop seconds. That is, W3CDTF allows:

YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00; or 1997-07-16T18:20Z)

So metadata coming into Biblatex that ultimately derives from a Dublin Core Metadata element like ...

<meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="1997-07-16T19:20+01:00" />

... will break if Biblatex uses (and enforces) EDTF as its strict format.
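
One way a tool upstream of Biblatex could cope with this is to pad the missing seconds before the value ever reaches a strict EDTF parser. The following Python sketch shows the idea only; `pad_seconds` is a hypothetical helper, not something biblatex, biber or Zotero provides, and padding with ":00" asserts a precision the source did not state.

```python
import re

# W3CDTF allows "YYYY-MM-DDThh:mmTZD"; the EDTF draft requires hh:mm:ss.
_SHORT_TIME = re.compile(
    r"^(?P<date>-?[0-9]{4}-[0-9]{2}-[0-9]{2})T"
    r"(?P<hm>[0-9]{2}:[0-9]{2})"
    r"(?P<zone>Z|[+-][0-9]{2}:[0-9]{2})?$"
)


def pad_seconds(value: str) -> str:
    """Append ':00' seconds to a short W3CDTF datetime; leave others alone."""
    m = _SHORT_TIME.match(value)
    if m is None:
        # Already in long form, or not a datetime this sketch understands.
        return value
    return f"{m['date']}T{m['hm']}:00{m['zone'] or ''}"


if __name__ == "__main__":
    print(pad_seconds("1997-07-16T19:20+01:00"))
    print(pad_seconds("2004-01-01T10:10:10+05:00"))
```

Values already in the long form pass through unchanged, so the helper could be applied to every incoming datetime.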

However, looking at an almost randomly chosen production website, a national news site which appears to implement Dublin Core Metadata well, they use times in the long form. From http://www.abc.net.au/news/2016-06-10/barnaby-joyce-denies-telling-woman-to-piss-off-in-tamworth-pub/7501090 ...

<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF" content="2016-06-10T20:00:09+1000"/>
<meta name="DCTERMS.modified" scheme="DCTERMS.W3CDTF" content="2016-06-10T23:44:31+1000"/>

... There is presumably a major website that uses the short time format in Dublin Core, but I can't find one on a quick search.

JSON-LD with schema.org

Google is promoting JSON-LD for metadata in (X)HTML.

You provide structured data markup in your HTML ... pages... JSON-LD is the recommended format. Google is in the process of adding JSON-LD support for all markup-powered features. The table below lists the exceptions to this. We recommend using JSON-LD where possible. https://developers.google.com/search/docs/guides/intro-structured-data

Google notes that when using JSON-LD you typically use:

the schema.org vocabulary — an open community effort to promote standard structured data in a variety of online applications.

Schema.org appears to enforce times in long form only:

A combination of date and time of day in the form [-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm](see Chapter 5.4 of ISO 8601). https://schema.org/DateTime

And here's another almost randomly chosen production example, http://www.bbc.com/news/science-environment-36505748 , that implements JSON-LD with a schema.org time in long form :

"datePublished": "2016-06-11T09:05:50+01:00"

So that's an example of a standard, and use of a standard, in the wild that conforms to EDTF in terms of time precision (the long time is enforced).

Other meta datetime production practices

I'll show a random sampling of the top newspapers in the US, looking at their datetime metadata (regardless of what scheme it conforms to).

Wall Street Journal. A full time format. http://www.wsj.com/articles/imf-warns-china-of-risks-of-mounting-corporate-debt-1465613146

<meta name="article.published" content="2016-06-11T02:45:00.000Z" />

New York Times. They are all over the shop ... http://www.nytimes.com/2016/06/12/magazine/what-if-ptsd-is-more-physical-than-psychological.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=photo-spot-region&region=top-news&WT.nav=top-news

<meta name="pdate" content="20160610" />
<meta name="utime" content="20160610235833" />
<meta name="ptime" content="20160610050026" />
<meta name="DISPLAYDATE" content="June 10, 2016" />

LA Times. Another full time format. http://www.latimes.com/politics/la-na-pol-democrats-unity-20160611-snap-story.html

<meta itemprop="datePublished" content="2016-06-11T03:00:00-0700" data-meta-updatable />

Allow changeable precision times? Conclusion.

A summary of the facts:

On this issue the options for Biblatex seem to be:

Do I get it right that if EDTF was used in biblatex it must WARN, not produce a fatal error, for short datetime (e.g. "1997-07-16T19:20+01:00") formats? Would the need for a WARN, rather than fatal error, for short datetime formats critically count against using EDTF in Biblatex?

Allow space between date and time; and a space between time and timezone?

Overview

Should we have an input standard that allow a space between date and time, as in

2004-01-01 10:10:10+05:00

And even a space between time and time zone?, as in ...

2004-01-01 10:10:10 +05:00

I'm revisiting this issue partly to ensure I'm referencing the standards right; and it might count as the most critical impediment to EDTF adoption.

What the standards say

As mentioned ISO8601:2004 allows a space between date and time ...

By mutual agreement of the partners in information interchange, the character [T] may be omitted in applications where there is no risk of confusing a date and time of day representation with others defined in this International Standard. (Under "4.3 Date and time of day > 4.3.2 Complete representations")

ISO8601:2004 forbids a space between time and timezone ...

4.2.4 UTC of day

To express UTC of day the representations specified in 4.2.2.2 through 4.2.2.4 shall be used, followed immediately, without space, by the UTC designator [Z]

4.2.5.2 Local time and the difference from UTC

When it is required to indicate local time and the difference between the time scale of local time and UTC, the representation of the difference shall be appended to the representation of the local time following immediately, without space, the lowest order (extreme right-hand) ...

EDTF forbids a space between date and time (From EDTF "8. BNF"), and forbids a space between time and timezone.

dateAndTime = date "T" time
time = baseTime zoneOffset?
baseTime = hour ":" minute ":" second | "24:00:00"
zoneOffset = "Z"
    | ("+" | "-")
      (zoneOffsetHour (":" minute)?
      | "14:00"
      | "00:" oneThru59 )

Given that both standards forbid a space between time and timezone, we don't need to consider that factor.

So ISO8601:2004 allows, and EDTF forbids:

2004-01-01 10:10:10+05:00

I imagine all of us would agree that with the space the above format is more human readable than ...

2004-01-01T10:10:10+05:00

Either format is equally "understandable": even folk unfamiliar with datetime standards could guess at the meaning of "T". So "understandability" is not a factor.

So I think the relevant question is:

Is the relative lack of human readability in an EDTF enforced format like 2004-01-01T10:10:10+05:00 critical enough to dismiss EDTF as the strict format for biblatex (and especially in the light of my prior emphasis of the possible future importance of human readability)?
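
If the strict format is EDTF, the more readable spaced form could still be accepted somewhere in front of the parser and rewritten. A minimal Python sketch of that rewrite follows; `to_strict` is a hypothetical name, and the optional space before the zone offset is an extra leniency of this sketch that neither standard permits.

```python
import re

# Spaced form: "YYYY-MM-DD hh:mm:ss", optionally followed by a zone offset
# with or without a space in front of it.
_SPACED = re.compile(
    r"^(-?[0-9]{4}-[0-9]{2}-[0-9]{2}) "
    r"([0-9]{2}:[0-9]{2}:[0-9]{2})"
    r" ?(Z|[+-][0-9]{2}:[0-9]{2})?$"
)


def to_strict(value: str) -> str:
    """Rewrite a space-delimited datetime into the 'T' form EDTF requires."""
    m = _SPACED.match(value)
    if m is None:
        # Already uses "T", or is not a datetime this sketch understands.
        return value
    date, time, zone = m.groups()
    return f"{date}T{time}{zone or ''}"


if __name__ == "__main__":
    print(to_strict("2004-01-01 10:10:10+05:00"))
    print(to_strict("2004-01-01 10:10:10 +05:00"))
```

The conversion is lossless in this direction, so human readability on input need not depend on which standard is chosen as the strict one.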

Summary

In summary it would be great if you all, Philip, Nick, Simon (if interested), or anyone else would address the question ...

Whether or not there is to be a colloquial input format, which should be the strict format: EDTF or iso8601:2004?

... by answering ...

On allowing "+" sign (for years):

On changeable precision times:

On allowing a space between date and time.

I mean, for Nick, that may well essentially entail a repetition of previous answers. But I hope, at least, such repetition could be made with an increased confidence, or a new willingness to bear previously unseen difficulties, having taken into account the issues I raise.

plk commented 8 years ago

I don't see any particular problem with any of these against EDTF. I would prefer to implement strict EDTF and no colloquial support in the core. Having said that, of course \DeclareSourcemap can essentially massage anything into strict EDTF and so I think this is a nice solution. Whether or not there are any driver level mappings to do this (that is, ones which come with biblatex) is another matter. I am more of a mind to leave this to style level mappings for areas which want to coerce date formats for their users just as we only support in core general style concerns and not domain-specific ones.

I will have a think about the 5.2.2 level 1 things - this might become very complicated and not worth the bother.

simifilm commented 8 years ago

Not sure whether this is still work in progress, but at the moment, ifdateera only works if datelabel=edtf is set, although this option doesn't seem to exist (anymore) according to the manual.

plk commented 8 years ago

@simifilm - should be stable now and is uploaded. That issue is fixed.

plk commented 8 years ago

I am not convinced about EDTF 5.2.2 as it says "Precision for a date whose string includes the 'u' syntax assumes that the unspecified portion will eventually be supplied." and, apart from it not being clear what that means, it's arguably meaningless in a bibliography context which is usually a publishing context where nothing will be "eventually supplied". The current dev version implements strict EDTF without 5.2.2 and parses times. Most of the internals for time support are also complete. We have to decide on the core output formats for times as for dates ("long", "short", "comp" etc.).

On the other hand, I'm open to parsing 5.2.2 level 1 and putting suitable fields in the .bbl to indicate the "unspecified" status but I doubt it makes any sense to do anything with this information in standard styles as such information applies to more specialist applications such as, perhaps, archival materials.

simifilm commented 8 years ago

@plk AFAICS this still needs the option datelabel=edtf which currently is not documented. The manual only mentions iso8601, but according to biblatex.sty iso8601 is deprecated. And I am probably missing something, but as I said earlier, I think two things get mixed up here. ifdateera is only available when datelabel is set to iso8601/edtf. But in my understanding, datelabel defines the output. So this means that I can't use ifdateera if I want date=long for example.

JohnLukeBentley commented 8 years ago

EDTF as strict format.

Philip:

I don't see any particular problem with any of these against EDTF.

Yes I'm inclined to agree. Or, in other words, the problems don't outweigh the advantages of EDTF for biblatex purposes (providing a convention for approximate and uncertain dates). Specifically, ...

On the human readability of 2004-01-01T10:10:10+05:00 - well I hate reading it. But in virtue of it being "understandable" that's no great impediment in the scenario I hope biblatex can find use (as a front end format).

On handling EDTF illegal datetimes, because the time is too short, e.g. 1997-07-16T19:20+01:00 ... I'll be curious to see what you come up with. But there are a few options. Probably a matter left to decide once you have your hands on the code.

EDTF conformance level

I would prefer to implement strict EDTF and no colloquial support in the core.

Noted. I will press the argument for colloquial support in the core, in addition to the strict (now EDTF) format, in a subsequent post. But I'll hold off for now in order to give Nick a chance to catch up. Except to say that everything I want to be expressed by a colloquial format should be expressed by the EDTF implementation. That is, to give folk the ability to ignore the colloquial format if they want.

On the issue of the conformance levels to implement for EDTF. The spec states

The specification defines three levels:

  • Level 0: Features supported by 8601
  • Level 1: Level 0 plus level 1 extensions
  • Level 2: Level 1 plus level 2 extensions

An implementation of this specification MUST support Level 0, and MUST state which (if either) additional level (1 or 2) is supported.

So there doesn't seem to be formal scope for a partial implementation of a level.

However, I too am not sure what we need in Level 1 beyond "5.2.1 Uncertain/Approximate". And if we don't, then I don't think there's a particular problem with claiming a conformance like "EDTF level 0 plus Level 1:5.2.1 Uncertain/Approximate".
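
To make concrete what "5.2.1 Uncertain/Approximate" would cover, here is a small Python sketch that reads the draft's "?" (uncertain), "~" (approximate) and "?~" (both) suffixes, and also a two-sided interval such as the da Vinci circa range from the opening post (EDTF separates interval endpoints with "/"). The function names and the returned dictionary shape are assumptions for illustration; this is not how biber parses dates.

```python
import re

# A level 0 date (year, year-month or year-month-day) with an optional
# level 1 suffix: "?" uncertain, "~" approximate, "?~" both.
_FLAGGED = re.compile(
    r"(?P<date>-?[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?)(?P<flag>\?~|\?|~)?"
)


def parse_flagged(value: str) -> dict:
    """Split a date string into its date part and uncertain/circa flags."""
    m = _FLAGGED.fullmatch(value)
    if m is None:
        raise ValueError(f"not a recognised date: {value!r}")
    flag = m["flag"] or ""
    return {"date": m["date"], "uncertain": "?" in flag, "circa": "~" in flag}


def parse_interval(value: str) -> tuple:
    """Parse 'start/end', where each endpoint may carry its own flags."""
    start, sep, end = value.partition("/")
    if not sep:
        raise ValueError(f"not an interval: {value!r}")
    return parse_flagged(start), parse_flagged(end)


if __name__ == "__main__":
    print(parse_flagged("1750?"))        # Edinburgh, [1750?]
    print(parse_flagged("0125~"))        # Epictetus c. 125 CE
    print(parse_interval("1487~/1490~")) # da Vinci c. 1487-1490
```

A style could then decide independently how to print the two flags, which keeps the circa and question-mark cases semantically distinct as requested in the opening post.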

5.2.2 Unspecified

If a work was published "some time in the 13th century" it could be encoded as 12uu as Nick suggests (from "5.2.2 Unspecified"). But that would seem to violate the rule "the 'u' syntax assumes that the unspecified portion will eventually be supplied", as you suggest. That is, if we imagine a work where the lack of precision about the date of publication was established by the scholarship: there may be no expectation that the unspecified portion will be "eventually supplied".

5.2.3. Extended Interval (L1)

However, "5.2.3. Extended Interval (L1)" seems able to encode "some time in the 13th century" as 1200/1299.

"5.2.3. Extended Interval (L1)" would seem to be necessary for taking care of one of the cases I first mentioned: Da Vinci, Leonardo. c. 1487–1490. Codex Trivulzianus. That is, as 1487~/1490~.

5.2.4 Year Exceeding Four Digits (L1)

This seems unnecessary. The earliest writing occurred in 3200 BCE (-3199). We have a long way to go before authors need to reference works with 5 digit positive years (10000). So it seems we can get away entirely with 4 digit years. Are bibliographies sometimes exploited for listing specific fossils?

5.2.5 Season

I'm not sure that seasons are necessary. Journals sometimes have a "Spring" edition but always (?) can be referenced via year, optional volume number, and optional issue number.

Even if seasons do need to be expressed, a format like "2001-21" looks like it might have the potential to be confused as expressing an ordinal day or week in the year.

Workload

As a matter of workload it might be easier to implement "EDTF level 0 plus Level 1:5.2.1 Uncertain/Approximate" and a colloquial format (if I can convince you of this and after I suggest some modifications to the colloquial format) as a first iteration. Then do some debugging. Then have a look at the other sections in level 1, in a subsequent iteration.

Roof issues

Simon. In my judgement the issues you raise are roof implementation details which can't properly be attended to until the foundations are sorted out. The foundations are currently in flux.

For example it's unclear what level of EDTF ought be supported (I've expressed a view above). When that becomes clear that might have an impact on the names for values. E.g. datelabel=edtflevel1.

On the other hand Philip seems generally open to addressing the sorts of issues you raise - and this is entirely a matter for Philip - the person who's coding it. So until Philip says otherwise I'd say keep those sorts of suggestions coming. I just thought I'd let you know why I am not responding to them.

njbart commented 8 years ago

On EDTF 5.2.2: I read that passage differently. The main definition appears in EDTF 4.: “Unspecified: The value is unstated. It could be because the date (or part of the date) has not (yet) been assigned (it might be assigned in the future), or because it is classified, or unknown, or for any other reason.” (my emph.)

The passage “Precision for a date whose string includes the 'u' syntax assumes that the unspecified portion will eventually be supplied. Thus 199u and 19uu have year precision, 1999-uu has month precision, and 1999-01-uu and 1999-uu-uu have day precision.” on the other hand merely seems to focus on the precision that is to be ascribed to a date that contains one or more ‘u’s.

Hence I continue to feel that 19uu and 199u could very well be used as shorthands for century and decade, both of which have their role in bibliographies.

But that’s a minor issue. Apart from that: Great news.

plk commented 8 years ago

Ok, I will look at the 5.2.2 things, there is a case for it.

@simifilm - I can't reproduce what you're seeing here - I have a test doc using dateera with date=long etc. and it's all fine. The doc should also have that option correctly - perhaps you have an out of date version? The latest pushed git and bundled DEV versions should have all of this - if not, let me know. Don't forget that authoryear* citations only use labelyear and so are controlled by datelabel.

plk commented 8 years ago

Note on current state - 3.5/2.6 currently implement all of EDTF level 1 apart from 5.2.2 - see 96-dates.tex example file and PDF doc.

simifilm commented 8 years ago

@plk I see that my .docs were out of date, but the rest should be ok.

This example does not give me negative dates:

\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{buch,
  author = {Wurm, Tom},
  title = {Das Buch},
  date = {-2988},
  location = {Die Stadt},
  publisher = {Der Verlag}}
\end{filecontents}
\usepackage[style=authoryear,%
datelabel=long,
%dateuncertain=true,%
%datecirca=true,
backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{buch}
\printbibliography
\end{document}

EDIT: If I change datelabel to edtf, it works.

simifilm commented 8 years ago

Something else I noticed: with negative dates, biblatex seems to insist on 4-digits years. Something like year=-321 is not accepted.

JohnLukeBentley commented 8 years ago

On 5.2.2 Unspecified

Nick, I read the two passages you quoted as identifying a contradiction in the EDTF spec. One can't both stipulate "unspecified" to mean:

However, we could just ignore the contradiction and choose to interpret "u" for unspecified according to the first passage (yet to be assigned, "classified, or unknown, or for any other reason").

Essentially agreeing with you that ...

Hence ... 19uu and 199u could very well be used as shorthands for century and decade, [in addition to the other range of imprecisions specified under "5.2.2 Unspecified"]

One or more of us might find it valuable to participate in the EDTF listserve.

Nick, any thoughts on my post containing "In summary it would be great if you all, Philip, Nick, Simon (if interested), or anyone else would address the question ..."?

plk commented 8 years ago

I propose that 5.2.2 is dealt with by expanding such strings into the appropriate date range and also marking it with a field like Xdateunspecified{Y} where X is the date type (event, orig etc) and Y is the unspecified granularity (day, month, year, decade).

plk commented 8 years ago

@simifilm - EDTF mandates 4 digit dates. The example you give is correct at the moment as only EDTF output has an "era neutral" output format. Other output needs to specify the era style - do you think that there should be a default?

simifilm commented 8 years ago

I think a negative date should never be just simply printed as a positive date without any indication that it's actually something else.

JohnLukeBentley commented 8 years ago

Philip, for outputting 5.2.2 you mean something like ...

author = {Conradus Saxo}
origdate = {12uu}
title={Speculum Beatæ Mariæ Virginis}
...
=>

 (Saxo 1200/1299)

% ... and so ...

199u => 1990/1999
19uu => 1900/1999
1999-uu => 1999-01/1999-12
1999-01-uu => 1999-01-01/1999-01-31
1999-uu-uu =>  1999-01-01/1999-12-31

?

That looks like a consistent output scheme. If there were to be other output schemes, at least the one you are suggesting looks like one that might be highly desirable. So at least yours seems worth implementing.
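
The expansion listed above is mechanical enough to sketch in a few lines of Python. This is only an illustration of the mapping, under the assumption that the forms shown are the only ones to handle; `expand_unspecified` is a made-up name and the month lengths come from Python's proleptic Gregorian calendar.

```python
import calendar
import re


def expand_unspecified(value: str) -> str:
    """Expand a date containing 'u' placeholders into a 'start/end' range."""
    m = re.fullmatch(r"([0-9]{3})u", value)
    if m:  # unspecified final year digit, e.g. 199u
        return f"{m[1]}0/{m[1]}9"
    m = re.fullmatch(r"([0-9]{2})uu", value)
    if m:  # unspecified final two year digits, e.g. 19uu
        return f"{m[1]}00/{m[1]}99"
    m = re.fullmatch(r"([0-9]{4})-uu", value)
    if m:  # unspecified month
        return f"{m[1]}-01/{m[1]}-12"
    m = re.fullmatch(r"([0-9]{4})-uu-uu", value)
    if m:  # unspecified month and day
        return f"{m[1]}-01-01/{m[1]}-12-31"
    m = re.fullmatch(r"([0-9]{4})-(0[1-9]|1[0-2])-uu", value)
    if m:  # unspecified day: the last day depends on month and leap year
        last = calendar.monthrange(int(m[1]), int(m[2]))[1]
        return f"{m[1]}-{m[2]}-01/{m[1]}-{m[2]}-{last:02d}"
    raise ValueError(f"no unspecified portion recognised in {value!r}")


if __name__ == "__main__":
    for example in ["199u", "19uu", "1999-uu", "1999-01-uu", "1999-uu-uu", "12uu"]:
        print(example, "=>", expand_unspecified(example))
```

A real implementation would also need to record that the range came from a "u" string, so that it can be told apart from a range the user typed explicitly.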

On outputting negative dates. I think we'd want to allow ...

-0279 (e.g. for `alldates=edtf`; and `alldates=iso8601`)

0280 BCE (for `alldates=colloquial`, or whatever the value would be). 
          That is, leading zero to make for 4 digits.

280 BCE (with an option for those who hate leading zeros).

Default: iso8601.
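
For the era-style outputs, the conversion from the signed year to a BCE/CE label is a one-year shift, because year 0000 in this numbering is 1 BCE. A short Python sketch follows; the function name and the `pad` switch are assumptions for illustration, not biblatex options.

```python
def format_year_with_era(signed_year: str, pad: bool = False) -> str:
    """Render a signed year such as '-0279' as '280 BCE' (or '0280 BCE')."""
    year = int(signed_year)
    if year <= 0:
        # 0000 is 1 BCE, -0001 is 2 BCE, and so on.
        number, era = 1 - year, "BCE"
    else:
        number, era = year, "CE"
    digits = f"{number:04d}" if pad else str(number)
    return f"{digits} {era}"


if __name__ == "__main__":
    print(format_year_with_era("-0279"))
    print(format_year_with_era("-0279", pad=True))
    print(format_year_with_era("0125"))
```

The `pad` flag corresponds to the choice above between the leading-zero form and the form for those who dislike leading zeros.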

I think of the iso8601 output format as supporting space delimiters, while the EDTF output format would not. Something that only comes into play with datetimes, but might be selected for bibliographies containing both date times and BCE (negative) dates. For example someone writing a paper on Plato might reference a blog post as well as Plato's Republic.

They therefore may want to pass the option alldates=iso8601 to biblatex in order to output 2016-02-07 03:30:20 +10:00, as in ...


Plato (-0279). Republic. Trans. by C. D. C. Reeve. 3rd edition. Indianapolis: Hackett Publishing   
    Company, Inc. 392 pp. isbn: 0-87220-737-4.
Priest, John (2016). 2016-02-07 03:30:20 +10:00. Philosopher of the month: Plato. url: http : / /

If they chose alldates=edtf, by contrast, then they might want ...


Plato (-0279). Republic. Trans. by C. D. C. Reeve. 3rd edition. Indianapolis: Hackett Publishing   
    Company, Inc. 392 pp. isbn: 0-87220-737-4.
Priest, John (2016). 2016-02-07T03:30:20+10:00. Philosopher of the month: Plato. url: http : / /
    blog.oup.com/2016/02/philosopher- of- the- month- plato/ (visited on 2016-06-13).

I mean I haven't thought much about how and where a datetime ought fit into the bibliographic entries' output. I'm not sure what the style guides say, if anything. But if we are supporting datetimes then iso8601 and edtf output choices can express something different: whether space delimiters are used. Alternatively, or in addition, you may want a space delimiter option for output datetime strings.

In terms of negative years, when outputting my own documents I'd be going for 0280 BCE, but I realize I might be freakish here. That's also why I recommend the ISO8601 default.

All that's not very thought through. It's offered as something for you to push against.

Edit: "though" to "thought". Grammar.

simifilm commented 8 years ago

Is there a way besides DeclareBibliographyOption to test whether a dateera option was set?

njbart commented 8 years ago

Nick, I read the two passages you quoted as identifying a contradiction in the EDTF spec.

No, I don’t think it’s a contradiction. 4. provides the definition of “unspecified”; 5.2.2 merely contains a definition of the precision of strings with unspecified elements. I’d paraphrase the passage from 5.2.2 as “in order not to leave the precision of any EDTF date/time string undefined, we treat strings containing ‘u’s as if the ‘u’s had been replaced by actual digits”.

Nick, any thoughts on my post containing “In summary it would be great if you all, Philip, Nick, Simon (if interested), or anyone else would address the question …”?

No, nothing new for now, I’m afraid.

plk commented 8 years ago

@simifilm - yes - there is a \ifdateera test and similar tests for uncertain and circa.

simifilm commented 8 years ago

@plk I think there is a misunderstanding. \ifdateera tests whether a certain date has an era set. But I am asking about the biblatex option: can I test whether the dateera option of the whole document is set to secular, christian or not at all?

plk commented 8 years ago

@simifilm Ah, there is no particular test at the moment but if we default dateera to say, secular anyway, there would be no point as it would always be true.

plk commented 8 years ago

@JohnLukeBentley - yes, that would be the idea with 5.2.2 and ranges. They would also set internal field markers to differentiate from ranges which are the same but were explicitly set. This would enable a style to use, say 19uu as "20th century" or to just use the resulting range if preferred.
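
As a rough illustration of that idea (not the actual biber implementation), the Python sketch below expands a year with unspecified trailing digits into a range and returns a flag playing the role of the internal marker. The function name and the tuple layout are assumptions made for the example, and only the year-only forms such as 199u and 19uu are handled.

```python
import re


def expand_unspecified_year(value: str):
    """Expand e.g. "19uu" into (start, end, was_unspecified).

    Only four-character years with up to two trailing 'u' digits are
    accepted in this sketch.
    """
    if not re.fullmatch(r"\d{4}|\d{3}u|\d{2}uu", value):
        raise ValueError(f"unsupported year: {value!r}")
    start = value.replace("u", "0")  # earliest year the pattern can mean
    end = value.replace("u", "9")    # latest year the pattern can mean
    return start, end, "u" in value


print(expand_unspecified_year("19uu"))  # ('1900', '1999', True)
print(expand_unspecified_year("1984"))  # ('1984', '1984', False)
```

A style could then check the flag: print "20th century" when it is set, or fall back to the plain range when it is not.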

moewew commented 8 years ago

I know that the implementation of this whole feature is still by no means finalised, but I noted that the changes introduced quite some machinery into the .cbx files and more specifically the cite bibmacros (with all the \let\ifdateera\iflabeldateera and friends).

I'd find it conceptually neater if the date printing thingy could be dealt with further upstream in the date printing macros (maybe even in \printfield{labelyear} and friends). That would allow for the date format to be specified at one place and would then not require changes to many basic macros. (I'm thinking about the usability of all this for custom styles as well that would have to take over the machinery as well.)

plk commented 8 years ago

It's not finalised yet. What you see there is only for citations and only for authoryear styles. I'm not clear yet about the final organisation for citations.

plk commented 8 years ago

@moewew - this has been redone and there are no longer any changes in the .cbx files necessary.

plk commented 8 years ago

Dear all - I would like some feedback on the time formats to support by default. I think perhaps:

am/pm format

24h format

only? We also have to think about time format localisation and to this end, the separator and "am", "pm" strings will be localisation strings. There will also be an option to determine timezone output format.

simifilm commented 8 years ago

@plk Have you pushed the latest changes to GitHub already? ATM, I see strange things happening (for example, the era is always printed, no matter whether dateera is set or not).

plk commented 8 years ago

@simifilm - yes isn't that what you mentioned? So that negative dates are never made positive? Currently, dateera defaults to secular and therefore is always "on" but only prints something for negative dates (unless dateeraauto is used to force it for AD/CE dates below a certain threshold).

simifilm commented 8 years ago

Ah, ok. I think it would make much more sense to print -1123 as -1123 by default – without any addition. Just print the content of the field. Also something seems to have changed about the four digits rule: Years still need to have four digits, but something like 0123 is now printed as 0123 – including the zero. This did not happen before and is a mistake IMO.

plk commented 8 years ago

While looking at this, I realised that the datezeros package option, the default value of which is true, was not really working properly - it suggested that it enforced leading zeros but it didn't. Now it does, for all date parts which need them. Of course, this means that we might change the default to false but then most months would be single digits which is less expectable than four digit years ...

Currently, "-" before years only happens in "edtf" output format (ex-iso8601). I suppose it could be a default but negative dates never worked at all before so there isn't much expectation at this point.

simifilm commented 8 years ago

As you said, since negative dates never worked, there's nothing to break here. But I guess my expectation would be that without any option specifically activated, fields get printed more or less unaltered.

As for the datezeros option – that explains things.

JohnLukeBentley commented 8 years ago

So now that we have a strict input format settled upon, EDTF, I'll revisit the argument for an additional colloquial input format.

This is only an argument about input formats, not output formats.

Firstly, indulge me specifying an example colloquial input format again, now that we have EDTF to build upon. My example will modify slightly the existing suggestions (and prior implementation).

The colloquial input format

Overview

Biblatex inputs datetime fields according to a strict format with optional colloquial alternatives.

The strict format is an EDTF string, conforming to level 1. [Explanation and examples] ....

The colloquial format offers some alternatives. There is no need to learn the colloquial format, for everything you can express in a colloquial format can be expressed as an EDTF string. However, you might find working in a colloquial format easier for some purposes.

The colloquial format alternatives are only for: negative, BCE/BC years; and approximate (circa) dates.

BCE/BC years

To express an EDTF negative year in a colloquial format: subtract 1; take the absolute value; optionally add a space; then add a "BCE" or "BC" suffix. Keep the years as four digits.

-0379 => 0380 BCE
-0025 => 0026 BC
-1234-10-11 => 1235-10-11BCE

Using a negative sign with a "BC" or "BCE" suffix is illegal. E.g. -0379 BCE will throw a fatal error.

Using "CE" or "AD" is illegal and will result in a fatal error.
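
The reverse direction, from the colloquial form back to EDTF, could look roughly like the following Python sketch. It is only an illustration of the rules just stated; the function name is hypothetical, and strings without an era suffix are assumed to be EDTF already and are passed through untouched.

```python
import re

# Optional minus sign, four-digit year, optional -MM and -DD, optional space, era suffix
_SUFFIXED = re.compile(r"(-?)(\d{4})((?:-\d{2}){0,2}) ?(BCE|BC|CE|AD)")


def colloquial_to_edtf(value: str) -> str:
    """Convert e.g. "0380 BCE" to "-0379"; reject the illegal combinations."""
    match = _SUFFIXED.fullmatch(value)
    if match is None:
        return value  # no era suffix: assume the value is already EDTF
    sign, year, rest, era = match.groups()
    if era in ("CE", "AD"):
        raise ValueError(f'"{era}" is not allowed: {value!r}')
    if sign:
        raise ValueError(f"negative sign combined with {era}: {value!r}")
    bce_year = int(year)
    if bce_year == 0:
        raise ValueError("there is no year 0 BCE")
    astronomical = 1 - bce_year  # 380 BCE -> -379, 1 BCE -> 0
    prefix = "-" if astronomical < 0 else ""
    return f"{prefix}{abs(astronomical):04d}{rest}"


print(colloquial_to_edtf("0380 BCE"))       # -0379
print(colloquial_to_edtf("0026 BC"))        # -0025
print(colloquial_to_edtf("1235-10-11BCE"))  # -1234-10-11
```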

Approximate dates

In EDTF approximate dates are expressed with tilde ~. The colloquial alternative is to prefix the date(time) with a "c", optionally with a space delimiter. E.g.

c -0379
c0380 BCE
c 1487/c1490

Using a tilde "~" with a circa "c" prefix is illegal. E.g. c 1230~ will throw a fatal error.
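
A comparable Python sketch for the circa prefix, again with a hypothetical function name. It handles only the "c" prefix, applied to each side of a range separately; a date part that also carries a BCE/BC suffix would still need the era conversion applied before the tilde is appended.

```python
import re


def colloquial_circa_to_edtf(value: str) -> str:
    """Convert e.g. "c 1487/c1490" to "1487~/1490~"."""
    converted = []
    for part in value.split("/"):
        match = re.fullmatch(r"c ?(.+)", part)
        if match is None:
            converted.append(part)  # no circa prefix: leave the part alone
            continue
        date = match.group(1)
        if date.endswith("~"):
            raise ValueError(f"tilde combined with circa prefix: {part!r}")
        converted.append(date + "~")
    return "/".join(converted)


print(colloquial_circa_to_edtf("c -0379"))       # -0379~
print(colloquial_circa_to_edtf("c 1487/c1490"))  # 1487~/1490~
```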

About this colloquial input format

You'll have observed that some of the modifications to the previous suggestions are for the sake of simplifying and tightening up the colloquial format. Specifically in:

For @nickbart1980 is right to set store by "clarity and elegance" ...

Rejoinders to colloquial input format opposition

@nickbart1980 gave a couple of arguments against a colloquial input format.

Firstly,

Allowing “colloquial” input formats would only water down biblatex’s clarity and elegance.

Not if we carefully distinguish the strict (now EDTF) format from the colloquial. Moreover, ensure that there is nothing in the colloquial format that can't also be expressed by the strict format. That allows anyone who doesn't care for the colloquial format, to ignore it. This would require being clear about the distinction in the documentation. "Here is the strict EDTF format ... that's all you need, but here are some colloquial alternatives if you find them handy ...".

Secondly,

Also, if we allow “colloquial” formats here, we’d also have to accept other “colloquial” formats like “23 Apr 2016”, “23/04/2016” and many others. I’d be strongly opposed to any of this.

That doesn't follow. You can allow some colloquial formats without allowing all colloquial formats. That is, if the basis for allowing a colloquial format is to judge each proposal on its own merits. I too would be strongly opposed to input formats like “23 Apr 2016”, “23/04/2016”.

@plk gave a slightly different argument ...

after we throw open the doors to colloquial formats, it can never be closed

It is true that if you draw the line at a strict format (like EDTF) it is easier to point to a principle like "We don't allow colloquial input formats" as a clear rule that might dissuade others from trying. However, I'd suggest there's no special difficulty in holding fast to a rule like "We don't allow colloquial input formats that have no good reason for being".

Reasons for the colloquial input format

Again, I'm thinking of biblatex's potential use as a front end format, as in a single markdown + biblatex document. A context, that is, where human readability and understandability would be required. But even when biblatex entries are kept in their own separate file, as they usually are, I'd suggest human readability and understandability is worth a great deal.

I'm less wedded to the "c" prefix approximate alternative. But the BCE/BC alternative seems important as that's how we traditionally date writing and objects in the ancient world (below year 0000/1 BCE).

An argument could be made that we ought promote the better date format, a format like EDTF that uses negative years (with a calendar that uses the year zero), and try to usurp the traditional BCE/BC scheme altogether. That is, in order that one day professors giving lectures will reference to their students Plato's Republic as being "... written about minus three seventy nine".

But even if one was committed to such a view then a transition period would be necessary as people move from their traditional way of dating, to the modern.

It is in virtue of the minus-one-take-the-absolute conversion, as from origdate={-0379} to "380 BCE", that one would have to do when reading a biblatex file and correlating it with what one sees on the copyright page of one's copy of Plato's republic ... that makes the EDTF/ISO negative year not readily understandable - in the sense that there's an additional cognitive burden one has to take on every time you are reading the biblatex file.

Allowing origdate={0380 BCE}, by contrast, entails that one can forgo the cognitive burden of the minus-one-take-the-absolute conversion when reading the biblatex file. And, again, this is not an argument for having the colloquial format instead of the EDTF format.

I highly recommend having the colloquial input format in addition to the EDTF input format. For negative, BCE/BC, years at least.

simifilm commented 8 years ago

With the latest changes something is severely broken. No matter what I do, I get the following fatal error:

(/Users/simi/Library/texmf/tex/latex/biblatex/blx-dm.def)
! TeX capacity exceeded, sorry [input stack size=5000].
\def
l.12643 bibwarn=false}

plk commented 8 years ago

I know, it was a push just for insurance due to travel. Looking at it now.

plk commented 8 years ago

All updated now and this draft has complete EDTF level 1 implemented. See the PDF doc for how 5.2.2 unspecified date parts work. Time format output is not done yet.

plk commented 8 years ago

The only argument for colloquial input formats in core is one of readability (since all colloquial input can be handled by regexp sourcemaps if necessary) but I'm not sure this is enough because so many people are moving to using .bib GUI front ends which would be able to do this presentation independently of the source data. I am not sure which ones do/can as I don't use them myself but I am keen to promote the data/presentation separation which latex/biblatex encourages at all levels, including bib data/presentation.

simifilm commented 8 years ago

There currently is a problem with dateera=christian: it produces strange errors. dateera=secular seems to work though.

plk commented 8 years ago

Should be fixed now.

simifilm commented 8 years ago

Yeah, this is fixed now, thank you. Sorry for bringing these things up again, but now negative dates are always printed with a minus sign. If dateera=christian is set, the result is something like -1234 BC which AFAIU is not valid in any case.

plk commented 8 years ago

It's fine - useful to have some feedback. This should be fixed now too.

simifilm commented 8 years ago

Now I get another error. This is the MWE

\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{buch,
  author = {Wurm, Tom},
  title = {Das Buch},
  date = {-2988},
  origdate = {-1988},
  location = {Die Stadt},
  publisher = {Der Verlag}}
\end{filecontents}
\usepackage[style=authoryear,%
backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{buch}

\printbibliography
\end{document}

And it gives me:

! Undefined control sequence.
\edef \blx@tempa {\blx@dateera@bce }
l.23 \cite{buch}
?
Package biblatex Warning: Bibliography string '' undefined
(biblatex) at entry 'buch' on input line 23.
(compiling luc: /usr/local/texlive/2016/texmf-var/luatex-cache/generic/fonts/otl/lmroman12-regular.luc)
(load luc: /Users/simi/Library/texlive/2016/texmf-var/luatex-cache/generic/fonts/otl/lmroman12-regular.luc)
(compiling luc: /usr/local/texlive/2016/texmf-var/luatex-cache/generic/fonts/otl/lmroman12-bold.luc)
(load luc: /Users/simi/Library/texlive/2016/texmf-var/luatex-cache/generic/fonts/otl/lmroman12-bold.luc)
! Undefined control sequence.
\edef \blx@tempa {\blx@dateera@bce }
l.26 \end {document}
?

JohnLukeBentley commented 8 years ago

@plk Well the chief argument for the colloquial input format was understandability, rather than (human) readability, as distinguished, but I trust we mean to reference the same thing.

(And it wasn't the only argument. There were also the arguments that: it doesn't prevent someone from using EDTF exclusively; and it doesn't entail that all colloquial inputs, that all individuals might suggest, must be permitted).

It is good of you to mention the data/presentation separation issue. I agree this is worth preserving.

Observe that my latest colloquial input format suggestion, unlike my initial suggestion, affords such a complete data/presentation separation. I've gotten rid of "CE"/"AD" altogether (although you may prefer to allow it, if a colloquial input format is allowed), and once you've parsed something like 0380 BCE into the core it can be indistinguishable from a date parsed from -0379. That is, I agree "BCE/BC" (or "CE"/"AD") in an input date ought not have any significance for the output date (beyond signifying the ordinal date). All decisions about the output format can, and ought, be made at the option stage, or style stage.

I'd also agree that most folk will continue both: to use GUI front ends to generate .bib files; and avoid reading .bib files directly. However, it is this potentially important and popular future use (and the particular reason why I'm looking at biblatex at all) for biblatex being embedded in a lightweight markup (e.g. markdown) document that would require understandability.

To be clear about the use case: when writing your markdown + biblatex document you'd still want to generate the biblatex from some GUI front end. But when you send that document as is, without transformation into pdf/html/etc, to someone who has no idea about document formats and biblatex (or even someone who does) they'll be better able to understand the document if it contains 0380 BCE dates. (But as document author you'd be still free to use EDTF, e.g. -0379, if you wanted to promote that format).

Recall that even if markdown + (embedded) biblatex does not quite meet Gruber's standard of "without looking like it’s been marked up", it nevertheless would meet his standard of being as "readable [understandable] as possible" and "publishable as-is, as plain text".

There's lots of reasons for working in this plain text format without, or before, transforming into a polished output format. For example you could upload your plain text document to git and invite others to download it and edit it, before you pull those edits into the master and review. Or you could just email it and receive back edits in between email quoting.

On the particulars of the colloquial input format: I am not particularly wedded to the optional spaces. I'd prefer them but if that was part of your opposition then I'd be happy for you to get rid of them. So too for the "c" approximate prefix. The main thing is the "BCE" or "BC" suffix and the minus-one-absolute calculation that would entail.

In the end it's your prerogative and entirely within the realm of reasonableness for you to decide against what I suggest. However, I hope these latest points persuade you to add the colloquial input format (modified as you see fit).

plk commented 8 years ago

@simifilm - should be fixed.

plk commented 8 years ago

At the moment, I won't add any colloquial input formats in the core and if there is a call for this, they'll be added via a sourcemap, potentially only for certain styles which want it (and therefore potentially not in core).

simifilm commented 8 years ago

@plk Works now, and I think the current behavior makes sense. Thanks a lot.

JohnLukeBentley commented 8 years ago

OK Philip.

Note that the potential intermediate solution you mention would offer no advantage for the use case I had in mind (markdown + biblatex), above EDTF only. For my idea is about promoting biblatex as a universal bibliographic format, in the context of that use case. That would require (output) style independence.

I mean with (markdown + biblatex (EDTF only)) I can still promote biblatex as a universal bibliographic format, for it is (output) style independent. But I predict the lack of a BCE/BC input year in the core will dissuade some potential users.

I will console myself with what you have so far achieved: the ability to handle negative years, approximates, and uncertains. And EDTF does have elegance (thanks to Nick for bringing it to our attention). So all that is marvelous.

I'll download the latest and start testing. Thanks for all your work on it so far.

Edit: added "in the core".