plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Circa dates, circa date ranges, and question marked dates. Plus Eras. #427

Closed JohnLukeBentley closed 8 years ago

JohnLukeBentley commented 8 years ago

I request support for:

It would be desirable for these kinds of values to be permissible in all date fields. I use origdate as the more likely example. There might be reasons for choosing different symbols for these kinds of dates, to make parsing easier. There might be reasons for disallowing spaces.

Edit: This issue thread also incorporates issue #422, "Before the common era (BCE/BC) and common era (CE/AD) date support".

Kinds

When writing a scholarly piece there are several kinds of date ambiguities and uncertainties. These are listed below.

(Part of the power of Biblatex is in providing support for many and any type of style guide. But in the cases below I sometimes borrow from the ...

University of Chicago. 2010. The Chicago Manual of Style. 16th ed. Chicago: University of Chicago. http://www.chicagomanualofstyle.org/16/contents.html.

... because it sheds some light on these issues. I could have chosen another style guide.)

  1. Circa dates. Where the scholarship is only able to fix a date approximately. Generally the approximation is at the precision of a year, as in "c. 125 CE" (rather than at the precision of a day: we rarely see, in publishing, something like "c. 0125-02-20").

    ca. or c.

    circa, about, approximately (ca. preferred for greater clarity)

    (University of Chicago 2010, under "10.43 Scholarly abbreviations", http://www.chicagomanualofstyle.org/16/ch10/ch10_sec043.html)

    // These are my examples (derived but not quoted from (University of Chicago 2010))
    (Epictetus ca. 125 CE)
    (Epictetus c. 125 CE)
  2. Circa date ranges.

    Citation: (da Vinci c. 1487–1490)
    
    Reference Entry: 
    Da Vinci, Leonardo. c. 1487–1490. Codex Trivulzianus.
  3. No dates.

    "When the publication date of a printed work cannot be ascertained, the abbreviation n.d."

    Boston, n.d.

    (University of Chicago 2010, under "14.152 'No date'", http://www.chicagomanualofstyle.org/16/ch14/ch14_sec152.html)

  4. Question marked dates.

    A guessed-at date may either be substituted (in brackets) or added. Edinburgh, [1750?] or Edinburgh, n.d., ca. 1750

    (University of Chicago 2010, under "14.152 'No date'", http://www.chicagomanualofstyle.org/16/ch14/ch14_sec152.html)

  5. Ambiguities over which known date is relevant. For example we might have the following reference entry:

    Hume, David. 1751. “An Enquiry Concerning the Principles of Morals.” In Enquiries Concerning Human Understanding and Concerning the Principles of Morals, 3rd ed., edited by L. A. Selby-Bigge and P. H. Nidditch. New York: Oxford University Press, 1975-06-12. isbn: 978-0-19-824536-0.

    ... but there is ambiguity around whether 1751 is the relevant original date, given ...

    Selby-Bigge and Nidditch’s 1975 edition is based off a collection of Hume’s essays posthumously published in 1777.

    We could have chosen 1777 as the original date. But, in the case, we have reasons for choosing one date (1751) over another (1777). So in the reference entry we can add an annotation that explains all this. The annotation thereby handles the ambiguity ...

    Selby-Bigge and Nidditch’s 1975 edition is based off a collection of Hume’s essays posthumously published in 1777. However, Hume’s “An Enquiry Concerning the Principles of Morals” was first published in 1751, and that's the date we use here.

Feature requests for Biblatex

Biblatex already handles:

Biblatex also provides for date ranges.

So I request the additional functionality for:

Often enough there might be no semantic difference between a circa date and a question marked date. Both can be used to express uncertainty about a date. It's rare to come across a question marked date, relative to a circa date. So it might be tempting to ignore question marked dates with the rule: "If you are uncertain about a date just designate it as a circa date".

However, there probably are going to be contexts, albeit rare, in which an author does want to maintain a semantic difference. E.g. For dates they are personally uncertain about, the author might tag with a question mark. For a date that the author knows a community of scholars has established as having a lack of precision, the author might tag with a circa.

On the issue of whether to support output formats "ca." or "c." as the abbreviation for circa, I'm undecided. Personally, I mostly recall seeing "c. 1815" and probably prefer this look, but the Chicago Manual of Style promotes "ca.". Perhaps both possibilities need to be supported for style authors who, in turn, provide options for users (with relevant defaults for a chosen style).
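A minimal sketch of how the kinds of dates listed above could be entered, assuming the EDTF level 1 markers that come up later in this thread (~ for circa, ? for uncertain) and assuming the datecirca and dateuncertain package options as described in the biblatex manual for 3.5 and later. The file name, entry keys, authors and titles other than the Codex Trivulzianus are placeholders invented for the illustration, and biber is assumed as the backend.

```latex
% Dates used below:
%   0125~        circa 125 CE
%   1487~/1490~  circa date range
%   1750?        guessed-at (question marked) date
\begin{filecontents}{circa-demo.bib}
@book{epictetus,
  author = {Epictetus},
  title  = {A Placeholder Title},
  date   = {0125~},
}
@book{davinci,
  author = {da Vinci, Leonardo},
  title  = {Codex Trivulzianus},
  date   = {1487~/1490~},
}
@book{undated,
  author   = {Author, Some},
  title    = {Another Placeholder Title},
  location = {Edinburgh},
  date     = {1750?},
}
\end{filecontents}
\documentclass{article}
% Assumption: both markers are only printed when these options are enabled.
\usepackage[backend=biber, datecirca=true, dateuncertain=true]{biblatex}
\addbibresource{circa-demo.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

Whether the circa marker prints as "ca." or "c." would then be a matter of the localisation string rather than of the input data.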

plk commented 8 years ago

Time support is now implemented and with this, everything is now done in dev. 3.5/2.6 binaries and packages are uploaded. Time support is fairly full-featured and supports localisation and three default formats: "HH" (24-hour), "HHcomp" (24-hour with compressed ranges) and "hh" (a 12-hour format with localised "AM/PM"). All time separators are localised and customisable and there are options for suppressing timezones, seconds and leading zeros. No default styles print any time information but see the PDF doc and the 96-dates.tex example file. Please report bugs here.

njbart commented 8 years ago

Great!

It seems there are two typos in https://github.com/plk/biblatex/blob/dev/doc/latex/biblatex/biblatex.tex#L1678:

~ should be \textasciitilde

ifdateirca should be ifdatecirca

EDIT:

(Also, why “c.” in one row and “circa.” in another? And “circa.” should not be followed by a dot anyway.)

Also L1675

0343-02-03   343-02-03 BCE

seems incorrect; should be either

-0343-02-03   344-02-03 BCE

or

0343-02-03   343-02-03 CE

plk commented 8 years ago

All fixed, thanks.

njbart commented 8 years ago

One more thing: isn’t a ? missing in L1678? It currently reads:

1723\textasciitilde & circa 1723? & using \cmd{ifdateuncertain} and \cmd{ifdatecirca} tests\\

Shouldn't it be:

1723?\textasciitilde & circa 1723? & using \cmd{ifdateuncertain} and \cmd{ifdatecirca} tests\\

plk commented 8 years ago

Should be fixed.

JohnLukeBentley commented 8 years ago

Using the version after @plk's post beginning "Time support is now implemented and with this..."

"You" below is addressed to @plk, but I mean everyone to feel welcome to weigh in on anything.

biblatex.pdf corrections

Date Specification | Output Format, in doc | Output format, expected | Date Specification, expected
0343-02-03 | 343-02-03 BCE | 343-02-03 CE [using dateeraauto set to ‘1000’ and commonera localisation string; previously mentioned by nick and corrected] |
1723? | circa. 1723? | | 1723~? [previously mentioned by nick and corrected]
2004-22 | 2004 | 2004 summer [or some other summer string] |
2004-24 | 2004 | 2004 winter [or some other winter string] |

Bugs

Documentation and Code change suggestions

Issues

96-dates.tex suggestions

Great Stuff

simifilm commented 8 years ago

A question which has probably more to do with citation standards than with the technical issues discussed here: Let's say I have a text which was published in the second century AD. How would I write this in a bibliography and can I do it with biblatex? Something like {date=0150~} does not seem right to me.

plk commented 8 years ago

Use the "unspecified" format 01uu and then format accordingly - see the example in 96-dates.tex.

plk commented 8 years ago

Hopefully all bugs are fixed now and the 96-dates.tex test file has more visibility for switching between options to see the differences.

JohnLukeBentley commented 8 years ago

Thanks! I'll download and test.

simifilm commented 8 years ago

There is currently a bug with BCE dates. If you look at the first example in 96-dates.tex, date = {-0477}, is printed as 478 BCE. For some reason, the year has been increased (or rather decreased) by 1.

njbart commented 8 years ago

This isn’t a bug. In ISO 8601, EDTF and astronomical year numbering, there is a year zero, and the year “1 BCE” is numbered “0”, the year “2 BCE” is numbered “−1”, and any year “n BCE” is numbered “−(n − 1)”.

If you want “477 BCE”, you have to use date = {-476}.

Maybe a brief explanation should be included in the manual.
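The numbering rule above can be written out as a small worked equation. Here n is a year count in the BCE era and a is the corresponding astronomical (ISO 8601/EDTF) year number; both symbols are introduced only for this illustration.

```latex
\[
  a = -(n - 1) = 1 - n
  \qquad\Longleftrightarrow\qquad
  n = 1 - a \quad (a \le 0)
\]
% Worked cases from this thread:
%   date = {-0477}:  n = 1 - (-477) = 478,  i.e. 478 BCE
%   477 BCE:         a = 1 - 477    = -476, i.e. date = {-0476}
```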

simifilm commented 8 years ago

@nickbart1980 I am still trying to get my head around this. So date = {-0476} is, depending on the dateera setting, either printed as "-476" or as "477 BCE". While this may be correct according to the standard, I find this highly confusing.

simifilm commented 8 years ago

Or in other words, -0476 according to EDTF does not mean what most people probably would think it means. That's really strange.

njbart commented 8 years ago

The trick is not to mentally equate the minus sign with “BCE”.

JohnLukeBentley commented 8 years ago

Simon, you might be confused (and your confusion has at least some chance of being representative, as you suggest), in two ways:

  1. Having to adjust the year ordinal, by 1, when converting from a negative astronomical/EDTF/ISO8601:2004(extended) "Calendar era [dating system]" (https://en.wikipedia.org/wiki/Calendar_era) to a colloquial calendar era, either "secular" or "christian".
  2. The relationship between biblatex <datetype>date and dateera options, and the calendar eras they express.

Adjusting years

A large discussion about this has already taken place in this thread. E.g.

... and responding posts from Nick and Philip.

In lieu of trawling through those posts the summary is:

As an input format it's a really good idea to support a modern, computer system enabling, "Calendar era [dating system]" that includes a year zero. Our mathematics generally includes a zero, as in ... -2, -1, 0, 1, 2. This makes, for example, calculating differences between years straightforward. This is probably one of the reasons why the modern "Calendar era dating systems", astronomical, ISO8601:2004 (extended by agreement), and EDTF include a year zero.

Philip, Nick, and I haven't explicitly talked about that point, but I think the three of us have had this sort of thing in the back of our minds.

We have explicitly arrived at a consensus view that EDTF (to level 1) ought be supported as the strict, modern, input format.

The colloquial/traditional calendar era dating systems, now referenced in biblatex as "secular" or "christian", inherit an unfortunate legacy from Dionysius Exiguus who, when creating the Anno Domini system in 0525, afforded no room for a zero year (2 BC, 1 BC, 1 AD, 2 AD).

Because we want to support both strict (EDTF) and colloquial output formats, adjustments will necessarily have to be made to negative years when converting from the strict to the colloquial.

All that is a consensus view between the three of us.

So as a person in the world, quite apart from being a user of biblatex, to some extent these sorts of tricky conversions are unavoidable if you want to switch between the strict and the colloquial system. Or at least this will become increasingly unavoidable as people start to communicate using a strict system, and reference years as negatives.

The strangeness ultimately derives from Exiguus' decision. And when walking down the street you ought spit and curse his name.

Where there has been disagreement is over whether the biblatex input format should support the colloquial system in addition to the strict EDTF system. E.g. To allow a user the option of inputting origdate={0380 BCE} or origdate={-0379}, at the user's discretion (to be parsed internally into one EDTF date format before output options and styles are applied).

I was for this and Nick and Philip were against it. For the arguments for and against you'd need to revisit the linked posts for details. In the end I think it fair to say that Nick and Philip are impressed by the simplicity and elegance of having as lean an input format as possible.

Although I still maintain my view I do think their view has weight. There is a part of me that is happy that the view that the whole of me maintains, has been defeated.

Biblatex calendar era options

Currently, as Philip has it, the output calendar era is determined by a combination of

date=year, short, long, terse, comp, edtf (default: comp)
[with the same options for <datetype>date (e.g. origdate) and alldates].

dateera=astronomical (formerly "simple"), secular, christian

The issue of how the biblatex options ought work to express output formats is (mostly) independent of the previous issue. E.g. Adopting my prior proposal doesn't make the current issue disappear.

Although there is scope for a redesign of these, Philip probably has it right.

When thinking about your desired output format you need to think, first, of which of the two categories of "calendar era date systems" you want: modern (including a year zero); or traditional (excluding a year zero). If you want a traditional format you then choose between the "secular" and "christian" systems. That determines your dateera setting, which should be the first option you set.

Then you move on to the date option (or <datetype>date; or alldates) and refine the look. However, if you select the edtf value this overrides dateera (and datezeros), conceptually setting dateera=astronomical.

Maybe something like ...

dateera=modastronomical, tradsecular, tradchristian 

(I lament latex doesn't have a camelCase or PascalCase option value convention).

... would help in this regard.
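A minimal sketch of that two-step choice, using only option values listed above (dateera with astronomical, secular or christian; then alldates with one of the date format values). The bib file name, entry key, author and title are placeholders invented for the illustration, and biber is assumed as the backend.

```latex
\begin{filecontents}{era-demo.bib}
@book{demo,
  author = {Author, Some},
  title  = {A Placeholder Title},
  date   = {-0476},
}
\end{filecontents}
\documentclass{article}
% Step 1: pick the era system (astronomical, secular or christian).
% Step 2: refine the look with date, <datetype>date or alldates.
\usepackage[backend=biber, dateera=secular, alldates=comp]{biblatex}
\addbibresource{era-demo.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

Going by the explanation earlier in the thread, -0476 with dateera=secular should come out as 477 BCE, while dateera=astronomical should keep the negative year number.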

simifilm commented 8 years ago

I'm afraid, I don't have much of use to add. I completely understand why it is desirable to use a standardized format for things like dates, and I am not arguing against this. And, of course, I have no idea how many people will actually make use of these features. To be honest, there probably aren't that many fields where you quote sources with exact BCE dates, so the whole discussion might be moot (in my case, I have a few titles in my database which were originally published BCE – Aristotle, Plato and the like – but none of them have a precise year of publication). But my prediction is that the majority of people who use this feature will make the same mistake I did and equate "-" with "BCE". But I don't have a good solution for this, since I see the problem of colloquial formats.

plk commented 8 years ago

For some reason, I can’t see this comment in Github web version or app … That is strange about the versions - CTAN is supposed to be on version 3.4 and I haven’t uploaded 3.5 at all there. You can get biber 2.6 binary from Sourceforge without having to build from git (OSX, Windows and Linux only until the official release).

njbart commented 8 years ago

Ok, got it sorted, thank you.

simifilm commented 8 years ago

I see that \bibdateeraprefix is set to \textendash in german.lbx. This is almost certainly wrong; it must be a hyphen.

plk commented 8 years ago

Ok, it was a rather global replace. Fixed in DEV.

njbart commented 8 years ago

This is almost certainly wrong; it must be a hyphen.

I disagree. – As I said in #447, ISO 8601 clearly distinguishes minus signs from hyphens, so a negative date has to be composed of “minus YYYY hyphen MM hyphen DD”, e.g., “–0062-09-21”, and I don’t think German minus signs are any different from English ones.

njbart commented 8 years ago

Also, why don’t we use \textminus rather than \textendash?

plk commented 8 years ago

No particular reason - I've put that in to see what it looks like.

njbart commented 8 years ago

Looks good – and improves accessibility for visually impaired users, too.

simifilm commented 8 years ago

@nickbart1980 An en-dash is not a minus sign.

plk commented 8 years ago

Currently it's a \textminus but I'm open to suggestions for the default.

njbart commented 8 years ago

I’ll still argue that \textminus is the preferable and the only ISO 8601-conformant option here.

JohnLukeBentley commented 8 years ago

I haven't been able to dedicate much time to testing lately, so I thought I'd better post what I have ...

96-dates.tex suggestions

96-dates.tex bugs

Design Suggestions

plk commented 8 years ago

Bugs and test file are fixed. Thinking about the strong/weak issue. I agree that the format you want should be possible out of the box.

JohnLukeBentley commented 8 years ago

Yeah, there might be a better way to get the format I want out of the box. Feel free to discuss half-baked ideas, if that helps.

plk commented 8 years ago

The question is - would you expect this to be the same as EDTF in respect of the way circa information is printed, and in respect of the range separator? I am inclining more to the view that this is specialised enough to warrant your own style modifications. In reality it is not much more than a copy and re-write of \mkdaterangeedtf, \mkdaterangeedtfextra, \blx@edtfdate, \blx@edtfenddate but probably too specialised for the core, which has grown a lot already with the new date features ...

JohnLukeBentley commented 8 years ago

My inclinations ...

Summary

would you expect this to be the same as EDTF in respect of the way circa information is printed, and in respect of the range separator?

Yes.

I am inclining more to the view that this is specialised enough to warrant your own style modifications.

I'm inclined to think these will be common enough to warrant being supported out of the box.

The following expresses this in detail. Note this is half-baked at my end. There's plenty of scope for other considerations or alternative designs.

Example Output

Example outputs available out of the box:

Biblatex options

In the second example output the relevant biblatex options include ...

The existing options you have:

Additional options ?:

Contiguous datetimes in datetime ranges

We want some way for a style author to readily achieve ...

2016-06-22T15:26:00Z/2016-10-03T07:15:00Z  [or] 
2016-06-22 15:26:00 Z / 2016-10-03 07:15:00 Z.     

... rather than ...

2016-06-22/2016-10-03 15:26:00-07:15:00

Add \printdatetime (and \printlabeldatetime, \printorigdatetime, etc.), as previously suggested?

Possible alternatives for achieving the above

<datetype>date, <datetype>time, <datetype>datetime (and other similar options) values either:

  1. Include "edtfstrong" versus "edtfweak", as previously described.
  2. Have only "edtf" (as is current) but let other options (datezeros, timezeros, seconds etc.) override it (as is not current). This effectively makes "edtf" a "weak" choice. An EDTF "strong" output is then achieved by setting the right combination of other options (you manually ensure the other options don't override, and thereby produce an EDTF-conforming output).

Edit: 0279~? to -0279~? in example.

plk commented 8 years ago

Please pull 3.5/2.6. New things which address your comments:

96-dates.tex file is updated to demonstrate the new time integration and tracks more options.

JohnLukeBentley commented 8 years ago

All that looks clever.

In particular edtf/ymd is probably better than edtfstrong/edtfweak.

Would there be a better string instead of ymd, given that the format can output a datetime? iso8601 (although "ca." is not iso) or ymdhnsz (although too hard to remember)?

plk commented 8 years ago

I'm not sure the string matters much - they are just mnemonic. The other date format strings are not very descriptive either. Perhaps you can see if you can get the format you want out of the box now?

JohnLukeBentley commented 8 years ago

As a general matter, in life and when coding, I'd suggest getting the names for things right makes life easier. Conversely, misnomers can propagate all sorts of problems. In life, for example, there's a detailed story to tell about the misnomers of "objectifying", "homophobia", "pin check [in skydiving]" and the conceptual and practical problems they propagate.

Here, for example, there are arguments to be made against edtfstrong/edtfweak: that could mislead folk into thinking that "strong" and "weak" are conformance levels defined in EDTF.

But, yes, let me download and test the latest iteration and I'll give another set of feedback. And, then, I might try to come up with something more clearly superior to the string for ymd.

Most of the strings you've chosen for variables (and their values) get things spot on.

plk commented 8 years ago

Yes, I'm well aware of the misery of having to re-engineer bad names for functions/variables and the like as changes make the initial choices clearly inadequate ...

JohnLukeBentley commented 8 years ago

96-dates.tex suggestions

Design suggestions

Code Bugs

Design endorsements

Authoring style for date ranges

From dipping my toe into authoring styles, thanks to your prompts and biblatex.pdf documentation, I have seen how some level of customization is relatively easy. For example I've thrown the following into my 96-dates.tex, to good effect:

\DefineBibliographyExtras{english}{
    \renewcommand*{\bibdatedash}{\space--\space}
}
% => ca. 1934 – ca. 1936

\DefineBibliographyStrings{english}{
    % Not that I'd do this in production, "ca." is the chicago recommendation -
    % as you have it.
    circa = {c\adddot} 
}
% => c. 1723

\DefineBibliographyStrings{english}{
    spring           = {spring},
    summer           = {summer},
    autumn           = {autumn},
    winter           = {winter},  
}
% => summer 1934/autumn 1934 (although, as above, I suggest that 
 % seasons should be suffixes).

Tests passed

plk commented 8 years ago

Thank you for the feedback:

  1. 96-dates.tex is fixed
  2. UTC TZ string is now controlled by \mkbibtimezone and \bibutctimezone macros, defaults to "Z"
  3. Dash for negative dates is still open for discussion but a \textendash seems wrong. I agree that \textminus looks a bit long but I suppose that depends on the font.
  4. I'd say that "Spr. 1856" reads better than "1856 Spr." and since these are colloquial, not standards-based formats, that's probably better. This can be redefined anyway (see english.lbx). They are sequential in EDTF output (see 7 below).
  5. I'm not so bothered about the era names as they tend to end up as mere noises or arbitrary strings to users, as long as they know what they do. I think it's not so bad but this can be revisited.
  6. Julian options. I did have a "julianstart" option which allowed the conversion to be unbounded in both directions but @nickbart1980, who knows about this topic, convinced me that it was unnecessary.
  7. EDTF output now prints the correct numeric values for seasons
  8. Time component separator issue in EDTF output was a bug, now fixed.
  9. \ifdateera takes only one argument which is either "bce" or "ce" and so your tests will always return false. It's used a lot internally, particularly in the \if*dateera forms.

Only biblatex update needed.

njbart commented 8 years ago

Just a few comments:

  1. I prefer “Spr. 1856” as default, too. The Chicago Manual of Style, 16e, e.g., gives the season first throughout (14.180, 14.189, 14.221, 14.271). The APA Manual, 6e, 6.28, on the other hand, recommends, “If the date is given as a season, give the year and the season, separated by a comma and enclosed in parentheses.”

  2. Julian options. No country has ever switched from Gregorian to Julian, and virtually no country uses Julian any longer (very few exceptions: Berbers, Mount Athos). So I don’t expect much demand for an all-Julian calendar, but if it’s really needed, gregorianstart=9999-12-31 or so should work.

the assumption that in most bibliographic contexts the gregorian calendar is used for all dates, including those before 1582-10-15 (is this true?)

Not quite. Most historians and astronomers use Julian for events before 1582-10-15 G. (And most British historians use Julian [or dual dating] for events in Britain etc. before 1752-09-14 G.)
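As a sketch of the British case mentioned above: assuming the julian and gregorianstart package options behave as described in the biblatex manual (Julian conversion is off unless julian=true, and gregorianstart takes a YYYY-MM-DD date), a document could move the cut-over date like this. The bib file name, entry key, author, title and the sample date are placeholders invented for the illustration, and biber is assumed as the backend.

```latex
\begin{filecontents}{julian-demo.bib}
@book{old,
  author = {Author, Some},
  title  = {A Placeholder Title},
  date   = {1700-03-10},
}
\end{filecontents}
\documentclass{article}
% Assumption: dates before gregorianstart are converted to Julian on output.
% A full year-month-day date is used, since a bare year has no unambiguous
% Julian counterpart.
\usepackage[backend=biber, julian=true, gregorianstart=1752-09-14,
            alldates=long]{biblatex}
\addbibresource{julian-demo.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

Setting gregorianstart to a far-future date, as suggested above, would be the same mechanism pushed to its limit.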

plk commented 8 years ago

I am leaning now towards using a hyphen for negative date prefixes in relevant formats - it really does look odd when the negative year prefix looks as long or longer than the date range character. It should, to my mind, be like this:

-0045-- -0050

\textminus and \textendash look too long and confusing when they are almost the same as the range character.

njbart commented 8 years ago

Well, I for one feel the hyphen looks odd, and much too short.

But more importantly, ISO 8601:2004 (on which EDTF is based) specifies (in 3.4.1): “The representations specified in this International Standard make use of graphic characters as specified in 3.4. Note that, except for “hyphen”, “minus” and “plus-minus”, these characters are part of the ISO/IEC 646 [7-bit coded] repertoire. In an environment where use is made of a character repertoire based on ISO/IEC 646, “hyphen” and “minus” are both mapped onto “hyphen-minus”. […]” – which seems to imply that whenever a separate “minus” character is available, “minus” should be used rather than “hyphen”.

Also, the ISO 8601:2004 pdf itself displays the minus sign (3.4.2): it’s long, and it’s the U+2212 “MINUS SIGN” character.

So for latex, \textminus would seem to be the correct choice, and for UTF-8, the “MINUS SIGN”, U+2212.

plk commented 8 years ago

That seems like a good motivation. In fact, biber currently maps \textminus to and from U+2212 when recoding. Then, if that's settled, what about the date range character? My feeling is that it looks confusing when the date range character (when it is a dash and not a slash) looks shorter than \textminus - an en-dash looks about the same.

njbart commented 8 years ago

I’m not sure a fully satisfactory solution exists since it seems there simply is no UTF-8 char longer than a hyphen but shorter than an en-dash, but then again it’s really a niche problem that only crops up when using ymd with negative/astronomical years and redefining the date range character to be something other than the default forward slash.

That being said, I still feel that adding a little space before and after the range dash if followed by a minus sign – like in the output of a math formula $-100 - -44$ – is likely to be the aesthetically most pleasing option.

Since in most fonts I tried \textminus and -- differ visually, it might be even better to redefine the range character as \textminus, too, and effectively use something like Plato (\textminus424\,\textminus\,\textminus346).

plk commented 8 years ago

It also occurs in comp/short/long date formats (where the default is astronomical and the range sep is a dash too) and that's probably the majority case isn't it? I agree about the space - I'm looking at it but it's quite tricky due to the order dmy in most formats which gives things like:

23/4/-45 -- 5/23/-22

and you don't want the space before -22 in such cases even though it's a negative year occurring as an end of range year.

njbart commented 8 years ago

It also occurs in comp/short/long date formats […]

You’re right, of course.

Something else to keep in mind is that some style guides discourage using range dashes in combination with minus signs, and prefer “ to ” instead:

Various style guides (including the Guide for the Use of the International System of Units (SI) and the AMA Manual of Style) recommend that when a number range might be misconstrued as subtraction, the word "to" should be used instead of an en dash. For example, "a voltage of 50 V to 100 V" is preferable to using "a voltage of 50–100 V". Relatedly, in ranges that include negative numbers, "to" is used to avoid ambiguity or awkwardness (for example, "temperatures ranged from −18 °C to −34 °C"). (https://en.wikipedia.org/wiki/Dash#En_dash)

plk commented 8 years ago

That's a good point and one I was looking at too but that's quite a big change for people I fear?

plk commented 8 years ago

I have found a reasonable solution to the spacing issue. There is now \bibdateeraendprefix in addition to \bibdateeraprefix. By default, \bibdateeraendprefix adds a thin space first, but this is overridden by ymd and edtf, which use a slash as the range sep and so don't need the space. There is still a bit of a mess with astronomical+negative+month/day standard date formats, which print negative dates in dmy format like:

12/3/-34 -- 14/4/-30

etc., but these are formats which people won't want to use anyway since they are messy. They would more likely choose a dateera other than astronomical, which avoids such ugly formats.
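For anyone who prefers a different prefix, here is a hedged sketch of overriding the two macros named above, following the same \DefineBibliographyExtras pattern used earlier in this thread for \bibdatedash. The plain hyphen is only an example choice, and it is an assumption that \renewcommand* is an acceptable way to override the shipped definitions.

```latex
% Preamble fragment; assumes biblatex is already loaded and that English
% is the active bibliography language.
\DefineBibliographyExtras{english}{%
  % Prefix for a negative start (or single) year: plain hyphen-minus.
  \renewcommand*{\bibdateeraprefix}{-}%
  % Prefix for a negative end-of-range year: thin space, then the same prefix.
  \renewcommand*{\bibdateeraendprefix}{\,\bibdateeraprefix}%
}
```

With ymd or edtf output, where the range separator is a slash, the thin space in the second definition would presumably be unwanted.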

plk commented 8 years ago

@JohnLukeBentley, @nickbart1980, @moewew - I am starting to think about the next release and so can we think about closing this and the other related tickets?