plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Circa dates, circa date ranges, and question marked dates. Plus Eras. #427

Closed JohnLukeBentley closed 7 years ago

JohnLukeBentley commented 8 years ago

I request support for:

It would be desirable for these kinds of values to be permissible in all date fields. I use origdate as the more likely example. There might be reasons for choosing different symbols for these kinds of dates, to make parsing easier. There might be reasons for disallowing spaces.

Edit: This issue thread also incorporates issue #422, "Before the common era (BCE/BC) and common era (CE/AD) date support".

Kinds

When writing a scholarly piece there are several kinds of date ambiguity and uncertainty; these are listed below.

(Part of the power of Biblatex is in providing support for many and any type of style guide. But in the cases below I sometimes borrow from the ...

University of Chicago. 2010. The Chicago Manual of Style. 16th ed. Chicago: University of Chicago. http://www.chicagomanualofstyle.org/16/contents.html.

... because it sheds some light on these issues. I could have chosen another style guide.)

  1. Circa dates. Where the scholarship is only able to fix a date approximately. Generally this is beyond the precision of a year, as in "c. 125 CE" (rather than at the precision of a day: we rarely see, in publishing, something like "c. 0125-02-20").

    ca. or c.

    circa, about, approximately (ca. preferred for greater clarity)

    (University of Chicago 2010, under "10.43 Scholarly abbreviations", http://www.chicagomanualofstyle.org/16/ch10/ch10_sec043.html)

    // These are my examples (derived but not quoted from (University of Chicago 2010))
    (Epictetus ca. 125 CE)
    (Epictetus c. 125 CE)
  2. Circa date ranges.

    Citation: (da Vinci c. 1487–1490)
    
    Reference Entry: 
    Da Vinci, Leonardo. c. 1487–1490. Codex Trivulzianus.
  3. No dates.

    "When the publication date of a printed work cannot be ascertained, the abbreviation n.d."

    Boston, n.d.

    (University of Chicago 2010, under "14.152 'No date'", http://www.chicagomanualofstyle.org/16/ch14/ch14_sec152.html)

  4. Question marked dates.

    A guessed-at date may either be substituted (in brackets) or added. Edinburgh, [1750?] or Edinburgh, n.d., ca. 1750

    (University of Chicago 2010, under "14.152 'No date'", http://www.chicagomanualofstyle.org/16/ch14/ch14_sec152.html)

  5. Ambiguities over which known date is relevant. For example, we might have the following reference entry:

    Hume, David. 1751. “An Enquiry Concerning the Principles of Morals.” In Enquiries Concerning Human Understanding and Concerning the Principles of Morals, 3rd ed., edited by L. A. Selby-Bigge and P. H. Nidditch. New York: Oxford University Press, 1975-06-12. isbn: 978-0-19-824536-0.

    ... but there is ambiguity around whether 1751 is the relevant original date, given ...

    Selby-Bigge and Nidditch’s 1975 edition is based on a collection of Hume’s essays posthumously published in 1777.

    We could have chosen 1777 as the original date. But in this case we have reasons for choosing one date (1751) over the other (1777). So in the reference entry we can add an annotation that explains all this. The annotation thereby handles the ambiguity ...

    Selby-Bigge and Nidditch’s 1975 edition is based on a collection of Hume’s essays posthumously published in 1777. However, Hume’s “An Enquiry Concerning the Principles of Morals” was first published in 1751, and that's the date we use here.

    Feature requests for Biblatex

Biblatex already handles:

Biblatex also provides for date ranges.

So I request the additional functionality for:

Often enough there might be no semantic difference between a circa date and a question-marked date. Both can be used to express uncertainty about a date. Question-marked dates are rare relative to circa dates. So it might be tempting to ignore question-marked dates with the rule: "If you are uncertain about a date, just designate it as a circa date".

However, there probably are going to be contexts, albeit rare, in which an author does want to maintain a semantic difference. E.g., an author might tag dates they are personally uncertain about with a question mark, and tag with circa a date that a community of scholars has established as lacking precision.

On the issue of whether to support "ca." or "c." as the output abbreviation for circa, I'm undecided. Personally, I mostly recall seeing "c. 1815" and probably prefer that look, but the Chicago Manual of Style promotes "ca.". Perhaps both possibilities need to be supported for style authors who, in turn, provide options for users (with relevant defaults for a chosen style).

plk commented 8 years ago

Please try biblatex 3.5/biber 2.6, both on SF in their respective development folders. All of these requirements are now implemented. "circa" is a localisation string, and there are tests for authors to determine whether a date came with a circa or uncertainty marker. See the PDF manual, section 2.3.8.

njbart commented 8 years ago

The new date specifications are great, but I am deeply worried about one aspect:

While it’s clear that -03430203 should be parsed as 344-02-03 BCE, it’s extremely confusing to find 343-02-03BC being parsed as 344-02-03 BCE.

343-02-03BC should never be parsed in any other way than 343-02-03 BCE.
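To make the two readings concrete, here is a minimal Python sketch (an illustration only, not biber's parser; the function names are hypothetical and only 4-digit years are handled). An ISO 8601 year is astronomical and has a year 0, so -0343 is 344 BCE, whereas a year written with an era marker already carries the traditional number and should stay as written.

```python
import re

def parse_basic_date(text):
    """Parse a signed ISO 8601 basic-format date such as '-03430203'."""
    m = re.fullmatch(r"([+-]?)(\d{4})(\d{2})(\d{2})", text)
    if m is None:
        raise ValueError(f"not a basic-format date: {text!r}")
    sign, year, month, day = m.groups()
    year = int(year)
    return (-year if sign == "-" else year), int(month), int(day)

def to_traditional(year):
    """Astronomical year (which has a year 0) -> (traditional year, era label)."""
    return (1 - year, "BCE") if year <= 0 else (year, "CE")

year, month, day = parse_basic_date("-03430203")
trad, era = to_traditional(year)
print(f"{trad}-{month:02d}-{day:02d} {era}")   # 344-02-03 BCE

# A colloquial '343-02-03BC' already carries the traditional number, so it
# stays 343 BCE; its astronomical year is 1 - 343 = -342.
print(to_traditional(1 - 343))                # (343, 'BCE')
```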

Please reconsider.

plk commented 8 years ago

That looks like a date parsing bug, I'll look into it.

plk commented 8 years ago

Actually, this is just a typo in the documentation ...

njbart commented 8 years ago

Indeed, my comment was based on the documentation only. Thank you for looking into this.

njbart commented 8 years ago

https://github.com/plk/biblatex/blob/dev/doc/latex/biblatex/biblatex.tex#L1672: circa. 1723 BC& c. 1723-02 BC Typo?

plk commented 8 years ago

PDF looks ok - that's just a table column sep with a space missing before it - shouldn't make any difference, just looks a little odd in the source. I will correct it anyway.

njbart commented 8 years ago

No, I mean, shouldn’t this be circa. 1723-02 BC& c. 1723-02 BC?

plk commented 8 years ago

Ah, no, the "?" is only for uncertainty, not circa. To get something like "c. 1723-02 BC?", you would have to have had both circa and uncertainty markers: "circa. 1723-02 BC?". Of course, this is style-dependent. A style could choose to put an uncertainty marker in the output if it only found a circa marker, but they are semantically separate in case one doesn't want to do this.

njbart commented 8 years ago

Sorry, again: isn’t a -02 missing in the first column? I.e.,

circa. 1723-02 BC & c. 1723-02 BC

rather than

circa. 1723 BC & c. 1723-02 BC

?

plk commented 8 years ago

Ah, sorry, in the middle of something else and not really reading properly. You are quite right. Corrected in git in a few minutes.

njbart commented 8 years ago

Come to think of it, it seems you just implemented, more or less straightforwardly, the OP’s suggestions.

I’d like to propose using a different format that’s much closer to a generally accepted standard instead, i.e., “Extended Date/Time Format (EDTF) 1.0”, an extension of ISO8601 (https://www.loc.gov/standards/datetime/pre-submission.html); or more specifically, its Levels 0 and 1 – rather than contributing once more to the proliferation of mutually incompatible date formats.

All of what has just been introduced to biblatex can be expressed by EDTF just as well.

Some differences:

However, if there is a general will to keep these new private biblatex extensions, fine (though I personally wouldn’t do so), but at least it would be good if biblatex would understand the EDTF uncertain/approximate formats like 1984?~, too.
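As a rough illustration of how both marker styles could coexist on input, here is a Python sketch. It is hypothetical and not biber's actual grammar: it assumes circa is written as a prefix ("c", "ca" or "circa", optional dot, optional space), that "?" marks uncertainty, and that the EDTF "~" (approximate) and "?~" (both) appear as suffixes.

```python
import re

PREFIX = re.compile(r"(?:circa|ca|c)\.?\s*")   # circa markers, longest first
SUFFIX = re.compile(r"[?~]+$")                  # '?' uncertain, '~' approximate

def split_markers(text):
    """Return (date, circa, uncertain) for strings like 'circa. 1723-02 BC?' or '1984?~'."""
    s = text.strip()
    circa = uncertain = False
    m = PREFIX.match(s)
    if m:
        circa = True
        s = s[m.end():]
    m = SUFFIX.search(s)
    if m:
        circa = circa or "~" in m.group()       # EDTF '~' treated as circa
        uncertain = "?" in m.group()
        s = s[:m.start()]
    return s.strip(), circa, uncertain

print(split_markers("circa. 1723-02 BC?"))     # ('1723-02 BC', True, True)
print(split_markers("1984?~"))                 # ('1984', True, True)
```

Treating "~" as equivalent to a circa prefix is a design choice made for this sketch; a real implementation might want to keep the two distinguishable.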

JohnLukeBentley commented 8 years ago

Philip, awesome. I have 96-dates.tex working.

I'm now commenting on both Circa and Era issues here, having closed Before the common era (BCE/BC) and common era (CE/AD) date support. #422 for the sake of consolidating discussion.

I haven't yet resolved why my package manager returns version 3.4 for the biblatex package (when it was the 3.5 package I downloaded). Indeed I'll probably, as suggested, post on Tex Stack Exchange to get this resolved. And yes it feels like I may have copied the package files to the wrong place (or failed to delete old packages). However since I have 96-dates.tex working I'll ignore the discrepancy for now.

Another part of my problem, now fixed, was a misconfiguration (highly particular to my setup) of my output directories in TeXStudio, plus using the wrong LaTeX drivers. Thanks to your tip-off I'm now using xelatex and biber.

So the pdf I compile (with xelatex and biber) from 96-dates.tex is identical to your enclosed 96-dates.pdf.

Broadly I think you've taken my initial suggestions and molded them into a robust and document-author accommodating design. I like that, for inputs, you've been permissive. Namely, optional spaces, optional dot after circa markers, all three kinds of circa markers ('c', 'ca', and 'circa'), all four era markers ('BC', 'BCE', 'AD', 'CE').

@nickbart1980 good catch on the missing -02 in the documentation.

And Nick I do think you raise a worthy counterpoint, a case to be made for not being so permissive and to more tightly force users to input dates that conform to ISO 8601. That's a useful link to "Extended Date/Time Format (EDTF) 1.0". In general I too prefer a convergence on one standard rather than a proliferation of standards.

I'm going to take some time (a day or three) to think about a few things before disagreeing or agreeing. I'll also use a modified 96-dates.tex to see if I can break Philip's design/implementation with edge cases.

You, Nick, wrote (looking at Page 73 of the docs)

While it’s clear that -03430203 should be parsed as 344-02-03 BCE, it’s extremely confusing to find 343-02-03BC being parsed as 344-02-03 BCE.

343-02-03BC should never be parsed in any other way than 343-02-03 BCE.

Philip thought this might be parsing bug.

I take it, Nick, that you were referring to the issue of year conversion: whether you subtract one before taking the absolute value.

It looks to me like this was a documentation bug. Taking 96-dates.tex (which uses \usepackage[style=authoryear,alldates=short,backend=biber]{biblatex}), I added 343-02-03BC and this parses to 02/03/343BCE. Ignoring the "/" separators and the ordering of year, month and day, that's right, and in accord with what you suggest: the year is kept at 343 and not changed to 344.

Philip, having done no style authoring before, I'm beginning to appreciate the flexibility on that side of things. For example, I see in 96-dates.tex how easy it is to take

\ifdateera{bce}{\bibstring{beforecommonera}}{}

... which produces something like "878BCE" and change it to ..

\ifdateera{bce}{{ }\bibstring{beforecommonera}}{}

... to get "878 BCE".

Would I be right in thinking that, with what you've currently implemented, it would be fairly easy for a style author to create a rule that said "For dates less than 500 return an era suffix; otherwise return no suffix" (for indeed that sort of rule seems like it might be a common want)? E.g. to output ...

1066
877
400 CE
380 BCE

?

One trivial suggestion with respect to your PDF documentation: it might help if you numbered your bookmarks, to make it easier to jump to referenced sections (as when they are referenced here on GitHub). Passing the option bookmarksnumbered to the hyperref package does this.

Nick, I think you'll agree that whatever ought to be, and will be, the final form of all this, Philip has come up with a superb first go, in short time.

Bloody marvelous effort Philip!

plk commented 8 years ago

Please pull 3.5/2.6 again. This is all now integrated into the standard styles with options (dateera, dateuncertain, datecirca) to control what is output along with formatting commands and context-aware delimiters. By default, nothing new is output to maintain backwards compatibility. The 96-dates.tex example file is updated to use these options and contains some commented examples to demonstrate things like your "<500" only scenario.

@nickbart1980 - I do prefer to keep things clean but the problem with biblatex is always that people are using more and more various methods for automating collection of bibliographic data and we have no control over the format and have to be quite flexible in what we accept. I went by ISO8601v2004 on the whole but allowed some more flexibility in the parsing. I have added support for the EDTF circa marker as requested. If in this UAT phase for 3.5 there is an outcry for reduction of date formats, it's relatively easy to do.

JohnLukeBentley commented 8 years ago

Philip and Nick: I've pulled the latest version (not long after Philip's last post) and am working through various standards issues. I'll need some more time on this but believe I'll be able to provide useful feedback when done.

JohnLukeBentley commented 8 years ago

Which input format(s)?

Before doing anything else it might be wise to get conceptually clearer on the date/time formats we are aiming at (I use the royal "we" given you, Philip, are the one doing the work and, appropriately and therefore, making the final calls on this). That is, as part of evaluating @nickbart1980's worthy suggestion to be less permissive and follow existing standards.

I take it this is principally an issue of the input format. That is, none of us has a problem with biblatex supporting all sorts of wild date/time formats on output, which will include all of, or many of, the language localisations and style guides (Chicago, APA, etc.). Notwithstanding, we'll want a few sensible defaults and standard output formats.

Our chief question for the moment is therefore: Which input format(s)?

I've now skimmed:

The current (as under development) input format.

So at the moment we have Philip's design/implementation: derived from my initial suggestions, ISO8601:2004, and nickbart1980's (well suggested) additional approximately/circa suffix symbol ~ (as found in EDTF).

So, looking at this current specification, as exemplified in "Table 4: Enhanced Date Specifications" (page 37 of biblatex.pdf), I think we'd do well to conceptually divide this into (adding some examples of my own):

Spaces are optional.

So at the moment an input date is required to be formatted as either:

... or ...

Sliced and diced this way we might be in a better position to argue over what the input format(s) should be.

Additionally, I think this conceptualisation, or something like it, would assist users in understanding what's going on.

Recommendation: In the documentation include the conceptual scheme (or something like it) that distinguishes between: strict; colloquial and additional qualifying symbols.

Allow "+" sign?

ISO8601:2004, under "3.4.2 Characters used in place of digits or signs".

[±] represents a plus sign [+] if in combination with the following element a positive value or zero needs to be represented (in this case, unless explicitly stated otherwise, the plus sign shall not be omitted), or a minus sign [−] if in combination with the following element a negative value needs to be represented. ...

4.1.3.3 Expanded representations

If, by agreement, expanded representations are used, the formats shall be as specified below. The interchange parties shall agree the additional number of digits in the time element year. In the examples below it has been agreed to expand the time element year with two digits.

A specific day
Basic format: ±YYYYYMMDD
Extended format: ±YYYYY-MM-DD

So the basic form, if a sign [±] is used, is:

+0001 +0000 -0001

But for positive dates and zero, ISO8601:2004 also allows for omitting the sign:

4.1.2.2 Complete representations ...

Basic format: YYYYMMDD ... Extended format: YYYY-MM-DD ...

So the following is permissible:

0001 0000 -0001

Neither format, with or without plus signs, has any sorting advantage. If a list of dates that includes both negative and positive years is sorted as plain strings, with or without a plus sign, the result cannot begin with the earliest years and end with the latest.
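A quick Python check of that claim, assuming nothing more than a plain character-by-character string sort (the year lists are illustrative):

```python
unsigned = ["1066", "0001", "0000", "-0001", "-0379"]
signed = ["+1066", "+0001", "+0000", "-0001", "-0379"]

# '-' sorts before digits, so negatives come first but in reverse order.
print(sorted(unsigned))          # ['-0001', '-0379', '0000', '0001', '1066']
# '+' sorts before '-', so positives now come first and negatives last.
print(sorted(signed))            # ['+0000', '+0001', '+1066', '-0001', '-0379']
# Only a numeric sort gives chronological (astronomical) order.
print(sorted(signed, key=int))   # ['-0379', '-0001', '+0000', '+0001', '+1066']
```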

So the only reason left to support plus signs is:

EDTF does not allow for "+YYYY".

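Recommendation:

  • (Recommended option) If we want to support ISO8601:2004 we'd want to allow a plus sign "+" for positive years and zero; in addition to allowing positive years and zero without a sign.
  • If we want to support, rather, EDTF we need to disallow a plus sign "+" for positive years and zero.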

How many digits for years?

Part of the motivation for keeping all years to 4 digits (or more) in a standard like ISO8601:2004 is sorting and removing ambiguity. I suggest that kind of motivation ought to apply even in the colloquial input format.

On ambiguity, consider 343-02 BC. If you are allowing 3-digit years, are you also allowing 2- and 1-digit years?

Consider 11-02 BC. Is that February of 11 BC, or the second day of November in some unspecified BC year? It's not intuitively clear just from reading it.

Also, under ISO8601:2004 three digits are used to represent the ordinal day in a year; e.g. 032 represents the 1st of February. So 032 BC could be read as the 1st of February in 1 BC rather than the year 32 BC. It is better, therefore, to avoid 3-digit years even in the colloquial input format.

Recommendation: All years, whether under the strict or colloquial input format, be 4 digits. (There'd be no problem supporting a year with fewer than 4 digits in an output format).

Assuming, that is, that 3 digit years aren't already supported for positive years in production biblatex. If they are I'd suggest deprecating them (allow them through with a WARN message).
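Both points can be sketched in Python. The first line shows an ISO 8601 ordinal date (year plus 3-digit day of year) being read with the standard library; the validator below it is a hypothetical illustration of the "4-digit years only" rule, not anything biber currently does.

```python
import re
from datetime import datetime

# ISO 8601 ordinal date: YYYY-DDD, where DDD is the day of the year.
print(datetime.strptime("2016-032", "%Y-%j").date())   # 2016-02-01

# Hypothetical validator for the recommendation: exactly four year digits
# (optionally signed), then optional -MM and -DD.
FOUR_DIGIT_YEAR = re.compile(r"[+-]?\d{4}(?:-\d{2}(?:-\d{2})?)?")

def has_four_digit_year(text):
    return FOUR_DIGIT_YEAR.fullmatch(text) is not None

for s in ["0032", "032", "1723-02", "11-02"]:
    print(s, has_four_digit_year(s))   # True, False, True, False
```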

In this respect I'm agreeing with Nick here.

Allow time?

Increasingly, stuff (articles, books, etc.) gets published on the internet. In this context the date of publication alone is increasingly too imprecise. We want, rather, the precision of time and date, chiefly because the time and date serve as a version number.

Recommendation: Support times, with dates. Support optional time zone information. (Ramification: Perhaps, in that case, we'd need a rule: if there is a time, it must come with a date).

Allow space between date and time?

A space is more human readable.

ISO8601:2004 allows it ...

By mutual agreement of the partners in information interchange, the character [T] may be omitted in applications where there is no risk of confusing a date and time of day representation with others defined in this International Standard. (Under "4.3 Date and time of day > 4.3.2 Complete representations")

Recommendation: Allow for space between date and time, as well as the character "T".
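A small Python sketch of what accepting both separators could look like, using only the standard library (`%z` accepts "+1000", "+10:00" and "Z" from Python 3.7 on). The function name is hypothetical, and this is not a proposal for biber's internals.

```python
from datetime import datetime

def parse_datetime(text):
    """Accept 'YYYY-MM-DDThh:mm:ss' or 'YYYY-MM-DD hh:mm:ss', with an optional UTC offset."""
    for sep in ("T", " "):
        for fmt in (f"%Y-%m-%d{sep}%H:%M:%S%z", f"%Y-%m-%d{sep}%H:%M:%S"):
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                pass
    raise ValueError(f"unrecognised date-time: {text!r}")

# The datetime string Zotero captured, with 'T' and a basic-format offset:
print(parse_datetime("2016-06-03T07:28:33+1000").isoformat())   # 2016-06-03T07:28:33+10:00
# The same instant written with a space and an extended-format offset:
print(parse_datetime("2016-06-03 07:28:33+10:00").isoformat())  # 2016-06-03T07:28:33+10:00
```

Note that `strptime` matches literal letters case-insensitively, so this sketch would also accept a lower-case "t".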

Which strict format?

ISO8601:2004 and EDTF

Nick is correct that the colloquial format, for inputting, could be removed without losing expressive power. But before addressing that, let's think about which strict format we want.

The candidates seem to be:

Note that under my way of slicing and dicing it, the EDTF approximate suffix (~) is available as an additional and optional "qualifying symbol". So agreeing that ISO8601:2004 ought to be the strict format doesn't preclude borrowing from EDTF for "qualifying symbols".

But here I want to mention a few things about ISO8601:2004, EDTF, and the relationship between the two.

Firstly, EDTF is accessible from an intro page at https://www.loc.gov/standards/datetime/ which says

This website describes the current effort to develop a reasonably comprehensive date/time definition for the bibliographic community, as well as other interested communities, and submitting it for standardization or some other mode of formalization, for example a W3C note or an amendment to ISO 8601.

That EDTF was worked out after ISO 8601:2004 and, especially, that it was formulated specifically for "the bibliographic community" makes it impressively relevant to our current biblatex issue. It makes EDTF worth some serious consideration here.

On the other hand, EDTF is merely a draft and in a state of flux. So I wouldn't necessarily favour it merely because it is a standard. I'd suggest the best course is to conform to it, or borrow from it, if we find there's good reason to do so in our particular context: the use of dates (and possibly times) in biblatex.

Secondly, EDTF is inconsistent with ISO 8601:2004 in some important respects, even though it derives from it. As EDTF mentions

8601 ... describes a large number of date/time formats, and in many cases provides multiple options for a given format. Thus a second aim of this specification is to restrict the supported formats to a smaller set. This specification therefore profiles 8601 in the sense that it discards many redundant or less-useful features.

So the datetime format (ignoring the date-only format for the moment) is more restricted than in ISO 8601:2004. Namely, in EDTF (at Level 0)

time string MUST be composed according to one of three representations as illustrated in the following three examples:

2001-02-03T09:30:01 2004-01-01T10:10:10Z 2004-01-01T10:10:10+05:00

Note: 'T' separating date and time must be upper case.

The date/time string MUST use 8601 extended form, i.e. date with hyphen, time with colon. Zone-offset may be omitted or included. 8601 extended format time zone designation consists of either a 'Z' to indicate UTC, or a '+' or '-' to indicate "ahead of UTC" or "behind UTC", followed by a 2-digit hour, followed optionally by a colon and the 2-digit minutes.

The time part is also optional under EDTF level 0 (as, of course, for ISO 8601:2004).

Under EDTF level 0, therefore, a date like 20010203 is not permitted. Under ISO8601:2004 it is permitted (and is currently supported by Philip's implementation). That counts in favour of EDTF in my view.

However, under EDTF level 0 a space between the date and time is not permitted. In my view, that fatally sinks that standard.

Recommendation: For the strict format use ISO8601:2004, not EDTF level 0.
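For comparison, here is a Python regular expression that sketches the EDTF level 0 rules quoted above. It is limited to full dates (EDTF also allows YYYY and YYYY-MM), and it reads the zone rule as "Z, or +/- then hh, optionally followed by :mm". It illustrates the quoted text and is not a complete EDTF validator.

```python
import re

# Extended format only, optional time after an upper-case 'T',
# optional zone: 'Z', or '+'/'-' then hh, optionally ':mm'.
EDTF0 = re.compile(
    r"\d{4}-\d{2}-\d{2}"
    r"(?:T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}(?::\d{2})?)?)?"
)

def is_edtf0(text):
    return EDTF0.fullmatch(text) is not None

for s in ["2001-02-03T09:30:01", "2004-01-01T10:10:10Z",
          "2004-01-01T10:10:10+05:00", "20010203", "2001-02-03 09:30:01"]:
    print(s, is_edtf0(s))   # True, True, True, False, False
```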

Should there be a colloquial input format?

Recall the colloquial format, which entails era suffixes and BCE years that are nominally one less, or "more" depending on how you want to speak about it (e.g. the "280" in "280 BCE"), than their ISO equivalent (e.g. the "279" in "-0279"):

  877 BCE
  124BC/122BC
  2004 CE
  343-02-03BC
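A minimal Python sketch of converting these colloquial examples into the strict ISO form. The grammar is a hypothetical one covering only the examples above (year, optional -MM and -DD, optional space, optional BC/BCE/AD/CE, and "/" for ranges). It accepts 1 to 4 year digits because the examples use 3-digit years.

```python
import re

COLLOQUIAL = re.compile(
    r"(?P<year>\d{1,4})"
    r"(?:-(?P<month>\d{2})(?:-(?P<day>\d{2}))?)?"
    r"\s*(?P<era>BCE|BC|CE|AD)?"
)

def colloquial_to_iso(text):
    """Convert e.g. '343-02-03BC' or '124BC/122BC' to ISO 8601 ('-0342-02-03')."""
    converted = []
    for part in text.split("/"):                 # date ranges use '/'
        m = COLLOQUIAL.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"unrecognised date: {part!r}")
        year = int(m["year"])
        if m["era"] in ("BC", "BCE"):
            year = 1 - year   # traditional years have no year zero: 1 BCE -> 0, 280 BCE -> -279
        sign = "-" if year < 0 else ""
        fields = [f"{sign}{abs(year):04d}"]
        fields += [f for f in (m["month"], m["day"]) if f]
        converted.append("-".join(fields))
    return "/".join(converted)

for s in ["877 BCE", "124BC/122BC", "2004 CE", "343-02-03BC"]:
    print(s, "->", colloquial_to_iso(s))
# 877 BCE -> -0876
# 124BC/122BC -> -0123/-0121
# 2004 CE -> 2004
# 343-02-03BC -> -0342-02-03
```

This sketch silently reads "11-02 BC" as February of 11 BC, which is exactly the kind of ambiguity that a 4-digit-year rule would rule out.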

My motivations in suggesting this input format were twofold:

  1. To support a colloquial output format where some dates in a list have the era suffix. E.g.

    1066
    0877
    0402 CE
    0382 BCE
  2. To preserve *.bib files as being readable by non (computer) technical folk. That is, to allow for something like date = {0380 BCE} rather than forcing everyone to recognize date = {-0379}

The second motivation is the chief reason for the recommendation. Under ISO8601:2004, years greater than zero are immediately easy to interpret. If you weren't familiar with the standard, you could guess that 1505-05-10 probably means YYYY-MM-DD.

However, dates less than zero are unwieldy even when you are familiar with the standard. What does -0379 mean? You have to recall that Dionysius Exiguus, when creating the Anno Domini system in 0525, afforded no room for a zero year (it goes 2 BC, 1 BC, 1 AD, 2 AD). You therefore have to mentally subtract a year and take the absolute value in order to convert to the colloquial (and traditional) 0380 BCE. In six months' time I'm likely to forget whether you add to or take away from -0379 to make the conversion.

It is right that ISO8601/EDTF express years less than zero as negatives like -0379, and take into account a year zero. For that solves many, even if not all, sorting issues. But, alas, it is nevertheless jarring against the traditional BCE/BC dating scheme.

So supporting both the strict ISO8601/EDTF (-0379) and colloquial (0380 BCE or 0380 BC) is a good move, to allow for those folk that want a strict format and those folks that prefer something more human readable.

Recommendation: Keep the colloquial input format, in addition to the strict format; but have 4 digit years under the colloquial format (as previously covered).

How about a biber/biblatex option for outputting the era names below X?

Philip, your commented out code in 96-dates.tex for displaying the era name for dates less than 500 did not work. However, you thereby enabled me to modify it ...

\ifdateera{bce}{
 \ifnumcomp{\thefield{#1}}{<}{500}
  {\printdelim{dateeradelim}\bibstring{beforecommonera}}{}
}{
 % E.g. Is year between 1 and 500.
 \IsInRange{\thefield{#1}}{\printdelim{dateeradelim}\bibstring{commonera}}{}
}

... With the addition of utility code from Peter Grill, "xstring test for numbers within a range" http://tex.stackexchange.com/a/159865/105123

My result was as desired ...

Era format restricted to date range
labeldate = 1066
date = 1066
origdate = 877
eventdate = 402 CE
urldate = 383 BCE

So I see that this works well enough at the style author level. But I'd suggest this would be very handy as an option (in addition to datecirca, dateuncertain etc.).

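Recommendation: Add "show era name less than X" functionality through an option. Either by:

  • Adding another option (e.g.) dateerashowbelow= positive integer | all. (default = 0); or
  • Adding values to the current option dateera = secularX, christianX (where X is: a positive integer; all; or missing). E.g. secular500 displays the era name below 500; secular and secular0 display the era name below 0; christianall displays the era name for all dates.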

Summary of Recommendations

Documentation

Recommendation: In the documentation include the conceptual scheme (or something like it) that distinguishes between: strict; colloquial and additional qualifying symbols.

Allow "+" sign?

Recommendation:

  • (Recommended option) If we want to support ISO8601:2004 we'd want to allow a plus sign "+" for positive years and zero; in addition to allowing positive years and zero without a sign.
  • If we want to support, rather, EDTF we need to disallow a plus sign "+" for positive years and zero.

How many digits for years?

Recommendation: All years, whether under the strict or colloquial input format, be 4 digits. (There'd be no problem supporting a year with fewer than 4 digits in an output format). Assuming, that is, that 3-digit years aren't already supported for positive years in production biblatex. If they are, I'd suggest deprecating them (allow them through with a WARN message).

Allow time?

Recommendation: Support times, with dates. Support optional time zone information. (Ramification: Perhaps, in that case, we'd need a rule: if there is a time, it must come with a date).

Allow space between date and time?

Recommendation: Allow for space between date and time, as well as the character "T".

Which strict format?

Recommendation: For the strict format use ISO8601:2004, not EDTF level 0.

(But again, borrowing from EDTF, rather than conforming to it, is fine. It was a good idea to add support for ~, as Nick suggested).

Should there be a colloquial input format?

Recommendation: Keep the colloquial input format, in addition to the strict format; but have 4 digit years under the colloquial format (as previously covered).

How about a biber/biblatex option for outputting the era names below X?

Recommendation: Add "show era name less than X" functionality through an option. Either by:

  • Adding another option (e.g.) dateerashowbelow= positive integer | all. (default = 0); or
  • Adding values to the current option dateera = secularX, christianX (where X is: a positive integer; all; or missing). E.g. secular500 displays the era name below 500; secular and secular0 display the era name below 0; christianall displays the era name for all dates.
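To pin down the semantics of the second bullet, here is a Python sketch. The secularX/christianX values are only a proposal, not an existing biblatex option value, so everything here is hypothetical. It assumes secular means BCE/CE, christian means BC/AD, years are astronomical (0 is 1 BCE), BCE years are always labelled, and a missing suffix means 0.

```python
import re

ERA_LABELS = {"secular": ("BCE", "CE"), "christian": ("BC", "AD")}

def parse_dateera(value):
    """Split a proposed value like 'secular500' into (labels, threshold).

    threshold None means 'label every year' (the 'all' suffix);
    a missing suffix is treated like 0.
    """
    m = re.fullmatch(r"(secular|christian)(\d+|all)?", value)
    if m is None:
        raise ValueError(f"unknown dateera value: {value!r}")
    name, suffix = m.groups()
    threshold = None if suffix == "all" else int(suffix or 0)
    return ERA_LABELS[name], threshold

def format_year(year, value):
    (bce, ce), threshold = parse_dateera(value)
    if year <= 0:
        return f"{1 - year} {bce}"          # BCE years are always labelled
    if threshold is None or year < threshold:
        return f"{year} {ce}"
    return str(year)

for y in (1066, 877, 400, -379):
    print(format_year(y, "secular500"))   # 1066, 877, 400 CE, 380 BCE
```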

Edit: added "in addition to allowing positive years and zero without a sign."

plk commented 8 years ago

The first thing to note is that sorting is not relevant here because the sorting system takes care of normalising dateparts for sorting. This means that the 3 vs 4 digit year issue is not really an issue.

Times - hmm. It is easy to support the input formats for this but the scaffolding internally to support times and time parts would be quite significant, akin to that needed for dates. I've had no request for this in years and no questions about this on TSE so I'm not really convinced that this is really needed by anyone. I haven't seen any style that requires time of access for URLs etc.

Also with the "era less than" thing - is there a style that actually needs this? The general policy is not to implement in core things which are fringe style requirements since this can be done easily in a style which requires it.

JohnLukeBentley commented 8 years ago

Thanks Philip.

Number of digits in a year and sorting

Biblatex does indeed handle sorting properly, in virtue of date parts, but, as you pointed out before, these dates import from, and export to, all sorts of programs. If an external program, some reference management software chiefly, treats a year value as a string then sorting years, some of which will be 3 digits, will come out as:

125 1300 200

But as I say, there is no possible sorting, even if all the years are expressed with 4 digits and are treated as strings, that properly orders dates which range over negative and positive years.
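The string-sorting behaviour described above can be checked directly in Python; the year values are the ones from the example:

```python
raw = ["125", "1300", "200"]
print(sorted(raw))        # ['125', '1300', '200']

# Zero-padding to four digits makes string order match numeric order,
# but only for non-negative years.
padded = [y.zfill(4) for y in raw]
print(sorted(padded))     # ['0125', '0200', '1300']
```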

And software shouldn't be treating dates, or date parts, as strings.

So I agree that sorting is a less significant reason for enforcing 4 digit years.

The more important reasons are, firstly, that it aids looking at dates in a column. For example, I'm using the Zotero reference management software with the add-in "Zotero Better Bib(La)Tex" in order to export custom fields to biblatex. The add-in stuffs fields as a delimited string into Zotero's native "Extra" field. In that field I might have these three separate entries:

biblatex[origdate=0125,shorthand=MoE] biblatex[origdate=0200] biblatex[origdate=1300]

And when I sort on my entries by the "Extra" column, visually scanning down that column is aided by having those year values line up vertically.

Secondly, and most importantly, are the ambiguities in interpretation that I mentioned.

For my own purposes a lack of enforcement of 4-digit years in biblatex is less of a problem. Since you allow 4-digit years (although I've yet to test something like "0025"), I can just enforce this workflow for myself.

But I make my argument trying to think of future users who might inadvertently create problems for themselves that could otherwise be prevented by enforcing this discipline on them from the start.

Times

In terms of the need, I'll make three arguments.

Indulge me pressing the prior argument first up. The argument is that the need may increase as internet publishing matures. Specifically, there'll be some kinds of documents that change rapidly, and it becomes important when citing such documents to identify the right version. Think of a newspaper article that might get updated 3 or 4 times over several days as new facts come to light. You might wonder why an author doesn't take into account some particular fact in the article. If you follow their citation, with a datetime stamp, you'll be able to see they were referencing an earlier version.

Secondly, in addition to its utility for tracking versions of a single document, it is sometimes significant to compare datetimes between documents. I'll use Zotero and newspapers again as an example. If I surf to the newspaper article http://www.abc.net.au/news/2016-06-03/weather-warning-issued-for-nsw/7473294 and use the Zotero browser plugin to "Save to Zotero using embedded metadata", then the Date is captured in Zotero with a datetime string of 2016-06-03T07:28:33+1000.

I'm not sure which of the webpage metadata fields it comes from, but it could be coming from a field like ...

<meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2016-06-03T07:28:33+1000"/>

But all that time info will be lost when exporting to biblatex (unimportant for a local weather event but potentially significant for an international event like an unfolding war, toppling of a leader, etc. when checking the up-to-dateness of different news sources).

Thirdly, the rarity of requests for this might be more a product of people being used to being unable to exercise a power (storing times with their citation entries) and therefore dropping the desire for it. If you enable the power, folk might come to learn how it is useful. Quite a general phenomenon, I'd suggest.

But yes, I suspected it might be quite a large body of effort to implement times. For that reason alone how about we drop it as a present requirement? I could then raise it in another ticket, at some future time. It's personally less important than other features of date support (most of which you've already implemented).

Era less than, as a biblatex option.

I'm not very well versed in style guides. I've only recently subscribed to Chicago and there doesn't appear to be any specification with respect to this.

My specific needs come from the context of publishing academic philosophy papers. In this context, frequently enough, the history of an issue will be traversed from the ancients to contemporaries. In that case you'll end up with a set of dates like:

1066
0877
0402 CE
0382 BCE

Indulge me in repeating a previous argument. In this context, for positive or negative dates close to zero that are expressed in era terms, when an author cites "0382 BCE" the date is unambiguous. When an author cites "0402" without the era term, the reader wonders whether the author truly means a positive ("CE") year, or whether it was a negative ("BCE") year and they accidentally omitted "BCE". If the author writes "0402 CE", the reader can be more confident the author intends it.

As positive dates move further away from zero, a reader can be more confident that a year without an era suffix is as intended. The context of the paper is much more likely to make dates cited as 1066, 1266, 1786, 2012, etc. unambiguously positive ("CE") years.

Fringe need or not, it's a feature, era-less-than as a biblatex option, that I'd personally find very useful. It would make me personally very happy if you implemented it (and it was not too much work on your side). That would help keep my documents (or custom styles) relatively lightweight.

Alternatively, we could settle on making the style-level example in 96-dates.tex clean and robust.
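To make the requested rule concrete, here is a plain LaTeX sketch of an "era less than" test. It is not a biblatex option or macro: `\eralessthan`, `\ceyear` and `\bceyear` are hypothetical names, and the threshold of 500 is chosen only because it reproduces the list of dates given above (1066 and 0877 printed bare, 0402 with CE, 0382 with BCE).

```latex
% Hypothetical "era less than" rule (not a biblatex option or macro).
\newcommand*{\eralessthan}{500}% hypothetical threshold
% #1: a CE year as written in the source, possibly zero-padded (e.g. 0402).
% Years below \eralessthan get an explicit " CE"; larger years print bare.
\newcommand*{\ceyear}[1]{%
  #1\ifnum#1<\eralessthan\relax\ CE\fi}
% BCE years always carry their era marker, so no threshold test is needed.
\newcommand*{\bceyear}[1]{#1\ BCE}
```

Typesetting `\ceyear{1066}, \ceyear{0877}, \ceyear{0402}, \bceyear{0382}` gives "1066, 0877, 0402 CE, 0382 BCE". Because `#1` is typeset as written, zero-padding is preserved. A real implementation would sit in a style and take the year and era from biblatex's parsed date data rather than from a macro argument.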

In any case, all that you've so far done is awesome.

njbart commented 8 years ago

I haven't seen any style that requires time of access for URLs etc.

Well, the Chicago Manual of Style, 16e, 14.246 “Citations of blog entries”, does list one example:

  1. AC, July 1, 2008 (10:18 a.m.), comment on Rhian Ellis, “Squatters’ Rights,” Ward Six (blog), June 30, 2008, http://wardsix.blogspot.com/2008/06/squatters-rights.html.

I’d agree there seems to be no widespread demand for including time yet (not on the Zotero and CSL forums either), but at the very least it would be useful, for now, if biblatex could parse date fields that include a time without complaining.
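Until times are parsed, one stopgap is to drop the time part before Biber parses the field. The sketch below uses biblatex's source-mapping interface (`\DeclareSourcemap`, placed in the preamble). It assumes the datetime sits in the `date` field in the form Zotero captured earlier in this thread (2016-06-03T07:28:33+1000). It is untested, and it discards the time rather than preserving it.

```latex
% Stopgap sketch (untested): keep only the YYYY-MM-DD part of a datetime
% in the date field, e.g. 2016-06-03T07:28:33+1000 -> 2016-06-03.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      \step[fieldsource=date,
            match=\regexp{^([0-9]{4}-[0-9]{2}-[0-9]{2})T.*$},
            replace=\regexp{$1}]
    }
  }
}
```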

simifilm commented 8 years ago

Just to make sure – is it correct that the cite commands in the standard styles do not (yet) take account of this?

See this MWE

%
% This file demonstrates various date formats and tests which apply to them
% for output
%
\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{buch,
author= {Wurm, Tom},
title = {Das Buch},
date = {1982/1988},
location = {Die Stadt},
publisher = {Der Verlag}}
@book{buch2,
author= {Wurm, Tom},
title = {Das Tuch},
date = {-1992/-1988},
location = {Die Stadt},
publisher = {Der Verlag}}
\end{filecontents}
\usepackage[style=authoryear,%
            alldates=short,%
            dateera=secular,%
            dateuncertain=true,%
            datecirca=true,%
            backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{buch,buch2}

\printbibliography
\end{document}

The negative dates are printed in the bibliography but are ignored in the in-text citations.

plk commented 8 years ago

That's true, I haven't looked at citations yet.

simifilm commented 8 years ago

I'm probably missing something here, but AFAICS it is not possible to output negative values as such, i.e. print origdate = {-331} simply as -331. And is it really intended that negative values are printed as positive when dateera is not set?

plk commented 8 years ago

The style can format negative dates however it wants, but the default styles will use the dateera information with the relevant bibstring. The other question is tricky. dateera just adds era information to the .bbl, which styles can then use, so I suppose any style which handles negative dates should always use dateera and do something sensible on output.

JohnLukeBentley commented 8 years ago

Outputting negative years as negative years (origdate = {-331} printed simply as -331, or, as I've been suggesting, -0331) ought, I think, to be possible in the default styles through options. Otherwise the default styles would break whichever "strict" format (ISO8601:2004 vs. EDTF 1.0) was chosen.

I hadn't previously mentioned this issue, along with some others, because of the state of flux of the broad target date format standards: ISO 8601; EDTF 1.0; "strict"; "colloquial". For example, if EDTF 1.0 were chosen as the strict format, then settings like alldates=edtf1 might need to be promoted over a deprecated alldates=iso8601.

So sorting out those overarching standards issues first would ease Philip's workload when it comes to deciding how to implement options.

I have also been holding off commenting because I wanted to do some testing around @nickbart1980's last issue: what happens when a date field with a datetime value is parsed (but I haven't yet got around to this).

Nick, it would be great to have your thoughts on each of my Recommendations (e.g. as listed under "Summary of Recommendations"). Even if, for example, that'll entail a reassertion of your promotion of EDTF.

Philip may have already been privately persuaded on the issues I raised, for or against individual recommendations. Indeed Philip could be part way through implementing some of them. But it would be valuable to hear some other voices weigh in on this, especially any contrary voices. I have been supposing that Philip was holding off on further work, in order to afford more time for you, Nick, to make some further comment.

So, simifilm, it is good of you to raise this detail of formatting negative years. It helps ensure it's on the list as an issue. But there are many ways in which the current, under-development implementation can be made to break, and so sorting out what we want needs to take priority over how the current implementation happens to fail. So if you have any particular views on what ought to be, as you play with the current development implementation or review this thread, that'd be great.

I hope none of what I've written creates any sense of being rushed in anyone. There's nothing wrong with this project ticking over slowly (and necessarily and finally at Philip's discretion).

simifilm commented 8 years ago

I just spent quite some time before I understood that the \ifdateera tests (or rather the data they need) are only available when the dateera option is set. I am not really sure whether this is a good choice. In my understanding, dateera defines how negative values will be typeset, not whether this information is actually available. I also think that the manual is at least ambiguous here.

plk commented 8 years ago

I think you are right, I will change this.

plk commented 8 years ago

Can you try 3.5/2.6 now? I have added code to obey the date meta-data information for citations which print dates in the standard styles (authoryear styles). ISO8601 output also uses a prefix "-" for negative years instead of a "BC/BCE" postfix as this is not ISO8601.

plk commented 8 years ago

By the way @JohnLukeBentley - no point trying times yet - they are not parsed at all at present. It's on the list.

simifilm commented 8 years ago

Hmm, I don't get this to work. I guess something is wrong about the following MWE:

\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{buch,
author= {Wurm, Tom},
title = {Das Buch},
date = {-1988},
location = {Die Stadt},
publisher = {Der Verlag}}
\end{filecontents}
\usepackage[style=authoryear,%
            date=iso8601,
            backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{buch}
\printbibliography
\end{document}

JohnLukeBentley commented 8 years ago

I downloaded the latest versions.

Philip am I right that you are wanting to wait for Nick (or any others) to make further comment before deciding on my prior recommendations?

A request on a trivial matter: in 96-dates.tex, throw in a sorting=none biblatex option, so that output items are listed in the same order as in the source. This would help when I add my own test cases to 96-dates.tex and compare the output against your 96-dates.pdf.


plk commented 8 years ago

Ok, I'll do that for the test file. I am currently looking at some other changes to do with name parsing but will get this afterwards.

njbart commented 8 years ago

Nick, it would be great to have your thoughts on each of my Recommendations […]

In brief: as an input format, use EDTF 1.0 level 0 and 1; nothing else – I’ve become even more skeptical of “colloquial” extensions than I used to be.

My case for recommending using EDTF exclusively:

Should there be a colloquial input format?

I can see why some might find it useful; still, after careful consideration, I really don’t think so. One thing I admired about biblatex right from its start was the simplicity of its date format: one clearly defined subset of ISO 8601, and nothing else. Allowing “colloquial” input formats would only water down biblatex’s clarity and elegance. Also, if we allow “colloquial” formats here, we’d also have to accept other “colloquial” formats like “23 Apr 2016”, “23/04/2016” and many others. I’d be strongly opposed to any of this. biblatex’s date format has served us very well for many years, and if we extend it, we should go for no less than an equally clearly defined and widely accepted standard.

the problem with biblatex is always that people are using more and more various methods for automating collection of bibliographic data and we have no control over the format and have to be quite flexible in what we accept

I see parsing “colloquial” input formats as a job for frontends, not biber/biblatex itself. What’s more, most (meta-)data sources like journal publishers do not offer biblatex data anyway, it’s bibtex at best, and if anyone wants to convince publishers to start providing biblatex data at all, we’d better insist on a very clearly standardised format such as EDTF.

[…] thoughts on each of my Recommendations […]

In more detail:

plk commented 8 years ago

I'm veering towards @nickbart1980 here. Since we currently have a very restrictive date format, the time to consolidate this is now, because after we throw open the doors to colloquial formats, they can never be closed again and maintaining backwards compat is not amusing ...

plk commented 8 years ago

@simifilm - You'd need the options:

\usepackage[style=authoryear,%
datelabel=iso8601,%
dateera=secular,%
backend=biber]{biblatex}

but there was also a problem which meant this wasn't working for citations - please get the latest 3.5/2.6.

plk commented 8 years ago

I plan to add time support and implement EDTF dates as per @nickbart1980's comments. An important consideration here is biblatex acceptance by journals which is becoming more of an issue as the user-base increases. I will not implement 5.2.2, as I think it is of marginal use: it specifies information "to be completed later", which does not make much sense to me.

JohnLukeBentley commented 8 years ago

Nick, excellent comment. That was the kind of retort I was hoping for.

Philip, if you could hold off on doing work, I'd like a bit of time to respond. There is a potential for these decisions to have large ramifications, so it would be worth taking the time to think it through (via discussion), I'd suggest.

plk commented 8 years ago

This can all be addressed in two levels I think. Normalisation to EDTF can be handled, as @nickbart1980 suggests, in earlier stage processes, leaving the biblatex interface and internals relatively clean. Luckily, we have such a pre-processing stage already with the sourcemapping feature which will easily handle normalising into EDTF form with no changes to the source data for users. This, I think, should satisfy everyone.
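As an illustration of that pre-processing stage, here is a sketch of a source map that rewrites a colloquial circa year in `origdate` into EDTF level 1 form (a trailing `~` marks an approximate date), e.g. `c. 1487` becomes `1487~`. The `c.` input convention is an assumption chosen for illustration, and the map is untested; the escaping rules for `\regexp` are described in the biblatex manual.

```latex
% Sketch (untested): map colloquial circa years in origdate to EDTF,
% e.g. "c. 1487" -> "1487~" and "c. 125" -> "0125~" (EDTF years have 4 digits).
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      % Four-digit years: append the EDTF "approximate" marker
      \step[fieldsource=origdate,
            match=\regexp{^c\.\s*([0-9]{4})$},
            replace=\regexp{$1~}]
      % Three-digit years: zero-pad to four digits, then mark as approximate
      \step[fieldsource=origdate,
            match=\regexp{^c\.\s*([0-9]{3})$},
            replace=\regexp{0$1~}]
    }
  }
}
```

The steps run in order, so a value already rewritten by the first step no longer matches the second. The .bib file itself can keep origdate = {c. 1487} unchanged, which is the point of doing the normalisation in the source-mapping stage.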

simifilm commented 8 years ago

Slowly getting there, but there is still a glitch with negative dates in citations: authoryear currently prints a minus sign before (or the respective negative era description after) the year.

plk commented 8 years ago

It should only print a minus if iso8601 output is requested, description afterwards otherwise (iso8601 doesn't have things like 345BCE, only -345) etc.

simifilm commented 8 years ago

Sorry if I was unclear. The problem is that the \ifdateera test seems to fail. It always prints a negative (or a BCE), even if it is an AD date.

plk commented 8 years ago

It works for me, I will upload the latest version shortly.

njbart commented 8 years ago

This can all be addressed in two levels I think. Normalisation to EDTF can be handled, as @nickbart1980 suggests, in earlier stage processes, leaving the biblatex interface and internals relatively clean. Luckily, we have such a pre-processing stage already with the sourcemapping feature which will easily handle normalising into EDTF form with no changes to the source data for users. This, I think, should satisfy everyone.

I agree, excellent plan.

simifilm commented 8 years ago

@plk There's definitely something wrong. Have a look at the following MWE:

\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{buch,
author= {Wurm, Tom},
title = {Das Buch},
date = {-1988},
location = {Die Stadt},
publisher = {Der Verlag}}
@book{buch2,
author= {Panther, Paul},
title = {Das Much},
date = {2012},
location = {Die Stadt},
publisher = {Der Verlag}}
\end{filecontents}
\usepackage[style=authoryear,%
            datelabel=iso8601,
            dateuncertain=true,%
            datecirca=true,%
            backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{buch,buch2}
\printbibliography
\end{document}

This gives me negative years in both in-text citations.

maieul commented 8 years ago

Will it allow broader dates like "13th century"?

plk commented 8 years ago

Probably not - it's a nightmare parsing and tracking such formats. Of course you could write a sourcemap to convert it to EDTF 1200/1300 or 1200~ etc.
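Following that suggestion, a one-off mapping could look like the sketch below. It assumes the value sits in `origdate` and is spelled exactly `13th century`. A general rule is harder, because a regular-expression replacement cannot subtract one from the century number, so each century has to be mapped explicitly. The target interval `1200/1300` is the one suggested above; adjust the endpoints to whichever convention your style guide uses. The map is untested.

```latex
% Sketch (untested): map a written-out century to an EDTF interval.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      % "[ ]" matches the literal space between the two words
      \step[fieldsource=origdate,
            match=\regexp{^13th[ ]century$},
            replace={1200/1300}]
    }
  }
}
```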

maieul commented 8 years ago

ok. thanks.

plk commented 8 years ago

@simifilm - it's a scope issue somewhere, with separate cites, it's ok. Looking into it.

JohnLukeBentley commented 8 years ago

Intro

Philip:

An important consideration here is biblatex acceptance by journals which is becoming more of an issue as the user-base increases.

This is a shared premise. I very much have that sort of thing in mind as I think about possible alternatives and make recommendations.

Nick, you pointed to EDTF: being adopted by the Library of Congress; being adopted by various universities; and having several tool implementations. That does help show it carries more weight than I would otherwise have thought. It's a draft, but not so much a draft that it's merely a spec without any adoption. Those facts do count in its favour.

Nick:

I see parsing “colloquial” input formats as a job for frontends, not biber/biblatex itself.

That might go to the heart of a difference in perspectives. That is, we may have a difference of views about the roles we see the biblatex input format serving (in general, not only to do with datetimes) and, significantly, could be serving in the future.

That is, you are perhaps seeing biblatex merely in its current role as a back-end/intermediary format, while I'm thinking about it in a future role as a front-end format.

Biblatex's current role

Currently, and as far as I know, most will be using biblatex input formats as a back-end/intermediary process in their workflow. Namely, most users of latex, who are interested in generating PDFs and are using biblatex, will:

When biblatex input formats are used in this back-end/intermediary manner, it becomes less important that the format be human readable, because the only humans needing to read it are software developers (like ourselves) who can understand technical formats and take them in their stride (like datetime = {2016-06-06T07:27:49Z} or title = {The {{Chicago Manual}} of {{Style}}}).

And if end users don't need to read the back-end/intermediary format, then having a stricter input format makes sense for many reasons.

That is not to say that human readability plays no role when only devs are looking at a format. Human-readable code helps developers too. Only that, I'd concede, input format human readability is less important, and stricter formats are more important, when end users never look at a back-end/intermediary format.

I imagine we (all) probably agree, more or less, on all those matters.

Biblatex's possible future role

The current problem with writing and publishing

So let me tell you about what's brought me into this obscure (but excellent) part of the writing/publishing toolset ecosystem (biblatex) ...

An exciting workflow is emerging to make the lives of writers and publishers better, especially as they collaborate.

Currently, and generally, (academic) writers use either: some WYSIWYG thing, like MS Word; or Tex/Latex. Publishers (e.g. journals) may even insist, when receiving submissions, on one source document type or the other.

But there are major problems with either source document type.

If a writer uses a WYSIWYG tool like Word, they are liable to use any old styling. They might use no styles at all (as when hard formatting in lieu of using Word's built-in styles). Even when they use Word styles they may do so arbitrarily and inconsistently (e.g. using a "heading 3" style for their first-level headers).

Moreover, even if a publisher could, magically, get their writers to use Word Styles in a consistent manner: parsing MS Word documents at the publisher's end is going to be a nightmare.

These considerations are well understood by latex users, as anyone reading this thread will be. The major motivation for using latex, by contrast, is to force on oneself (if one is a writer), or others (if one is a publisher), the discipline of separating semantics from presentation.

By marking up a document using tex one works on the semantics and leaves fiddling with the presentation as a separate body of work. Writers, indeed, need not worry about the presentation if they are handing off tex files to a publisher.

But writing in latex also has problems. It's very heavy in markup terms. And there's a great deal to learn in order to work in tex/latex. Non computer/non mathematics academics are generally going to have a hard time with it and be turned off by it. Even if you are tex competent, writing your magnum opus is like writing in syrup. With tex/latex, during the writing task, one has dropped the fiddling with presentation only to replace it with the fiddling of markup codes. Writing directly in html is similar.

There is emerging, however, an alternative workflow and toolset to address that problem: the use of lightweight markup.

The solution: lightweight markup

The goal is: with the right lightweight markup language and toolset a relatively plain textish looking source document can be transformed into one or more of the main output formats: html, pdf, epub, and mobi (for the Amazon Kindle).

(In that scenario latex can still have a role to play, and I think should have a role to play, as part of the pipeline transforming the lightweight markup into pdf.)

That has advantages ...

(As you can see, I'm not at all against the imposition of restricted formatting on end users: there are ways in which that imposition makes life easier, once end users have been exposed to the discipline.)

As far as I know, the two main lightweight markup languages and toolsets being developed for this purpose are markdown (multimarkdown looks like the most promising flavour here) and asciidoc.

I'm in the process of evaluating these two. Neither is production-ready for all four formats, but there is ongoing development. I have an interest as a writer, publisher (of sorts), and reader.

As part of the evaluation I'm thinking about whether I'll join the development effort, fork existing code, or begin coding my own parser and toolset from scratch. Either way, for a lightweight language and toolset to be successful for the mentioned goal, it'll need to handle bibliographies and citations.

But the idea would be to allow a writer to handle their bibliographies and citations in keeping with the lightweight source document syntax and philosophy.

Recall Gruber's philosophy for (original) markdown:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

https://daringfireball.net/projects/markdown/

Lightweight markup plus biblatex

It is in this context that I've been thinking on using biblatex as an extension to a lightweight markup language, I'm favouring markdown, for the purposes of handling bibliographies and citation.

And it's quite possible the thing to do would be to use the biblatex syntax as is. For example, to end up with something like (using markdown + biblatex) ...

My Magnum Opus
==============

Donec ultrices
--------------

Lorem ipsum dolor sit amet consectetuer Donec ultrices Lorem Vestibulum massa.
At nibh tincidunt pede libero malesuada elit eu pellentesque hendrerit tellus.
Id id id vitae leo Lorem lorem sed tincidunt nibh urna. Fusce sed eget eu id
quis convallis In velit ac urna. Ut pellentesque id tempor cursus pretium
Phasellus arcu montes dui nisl \autocite[see, e.g.,
][135]{russell_1914_nature}. Diam Sed Morbi Vestibulum.

> Nunc et et adipiscing eu et euismod ipsum Sed pellentesque Vivamus. Natoque
Morbi neque Nam fermentum arcu pretium interdum amet quis consectetuer.
Curabitur augue mattis ipsum sagittis adipiscing Nam pede lorem sagittis orci.
Quis et dui fringilla justo sem eu et eu Nullam Pellentesque. Phasellus dui
leo ac facilisi aliquam neque lacus sapien. \autocite{foot_1997_virtues}

Natoque Morbi
-------------

Justo sed orci Nulla dolor lobortis et consequat ut ut est. Metus Nullam orci
Sed sodales Suspendisse odio id enim ante Nullam. Mollis tortor tincidunt
gravida orci velit rutrum vel euismod congue mollis. Vel turpis faucibus
sociis justo In et Maecenas massa Duis auctor. Vel nulla vestibulum Sed elit
Vestibulum Maecenas id sollicitudin urna et. Mauris tellus dui convallis quis.

References
----------

@incollection{foot_1997_virtues,
  author = {Foot, Philippa},
  date = {1997},
  title = {Virtues and vices},
  pages = {163--177},
  editor = {Statman, Daniel},
  publisher = {{Georgetown University Press}},
  booktitle = {Noûs},
  url = {http://example.org/VirtureViceChap1.pdf},
  volume = {17}
}

@article{russell_1914_nature,
  author = {Russell, Bertrand},
  date = {1914},
  shorthand = {NA},
  title = {On the {{Nature}} of {{Acquaintance}}},
  pages = {1--16,161,435--453},
  journaltitle = {Monist},
...

There's the potential to make citations in the source document more lightweight by using them as they'd normally be output, e.g. ...

dui nisl (see, e.g., Russell 1914, 135). Diam Sed Morbi Vestibulum.

... and have them, behind the scenes, map to the right biblatex commands, e.g. \autocite[see, e.g.,][135]{russell_1914_nature}, to aid transformations down the pipeline into chosen outputs (e.g. when choosing a footnote output style even though the input style is author-date).

But, at this stage of swirling design ideas, the biblatex entries themselves seem ripe for being left alone in their native format. That is, as ...

@incollection{foot_1997_virtues,
  author = {Foot, Philippa},
  date = {1997},
  title = {Virtues and vices},
  ...

Conclusion: weight biblatex human readability

It is in this context that biblatex's current human readability has impressed me.

Even if such a format does not quite meet Gruber's standard of "without looking like it’s been marked up", it nevertheless meets his standard of being as "readable as possible" and "publishable as-is, as plain text".

The fieldname = value layout arguably makes scanning for the relevant data even more readable than in a traditional Style Guide (APA, Chicago, etc) bibliographic format, where the fieldnames are implied by the ordering.

That biblatex is human readable makes it an excellent choice as the bibliographic format for use in a lightweight markup language and toolset. A markdown + biblatex plain text file could be sent to anyone who has no idea what markdown and biblatex are. They'd still be able to read and understand it.

If this idea is sound, it has the potential to drive a significant increase in the biblatex user base and make journals even more keen to adopt biblatex.

However, that would be to use biblatex as a front-end format.

This is why it might be a good idea to evaluate candidate biblatex datetime input formats (expressing eras, approximations, uncertainties, adding times, etc.) with a view to keeping biblatex human readable. Or, at least, keeping human readability as a weighty criterion.

I haven't yet specifically responded to the two issues, in light of Nick's last post, that I hope are still up for debate:

I'll address those in a subsequent post. I just thought I'd, first, provide the reasons for weighting human readability in biblatex.

So nobody need respond to the above, but they'd be welcome to if they wished.


njbart commented 8 years ago

Lightweight markup plus biblatex

Try pandoc, which does almost exactly what you’ve been describing.

JohnLukeBentley commented 8 years ago

On a quick glance: It appears so. Thanks Nick!

It appears more mature than asciidoc or (native) multimarkdown. And, importantly, its citation handling appears mature; and it explicitly handles biblatex (at least as a file external to the main document). It accommodates various flavours of markdown. It looks great.

It looks like it doesn't handle embedded biblatex, it uses YAML instead. But that's incidental.

That'll go to the top of my list as the toolset to evaluate.

My further post on the biblatex date issues pending ...