Open aurimasv opened 9 years ago
We might have to allow some sort of free-form field because we can't guarantee that we can interpret dates correctly on import from web or import translators (though for web we can just drop unparsable dates). It's also important to keep unparsable dates during DB upgrade.
We can still go with no free-form entry from the user, but, in case of unparsable date, display the free-form date in a read-only field, highlight it in red (or something else that draws attention) and basically encourage the user to select the correct date from the date picker.
Some questions:
https://www.zotero.org/trac/ticket/888 includes a good amount of discussion about the need for entering dates that may not be parsable. I don't have time to look over it in detail though.
The way these are stored behind the scenes is another PITA, but as long as it works, I don't think the implementation really matters.
@bdarcus has long been a supporter of EDTF (http://www.loc.gov/standards/datetime/pre-submission.html), which might be a good format to interface the Zotero UI and citeproc-js. You'd of course still have to store the unparsed strings.
Two popular JS-based date pickers:
https://github.com/bevacqua/rome https://github.com/ChiperSoft/Kalendae
On the copyright date issue @zuphilip raises, my own book illustrates the problem: it was published in one calendar year, but given a copyright date (which is really the one that should be included in citations) of another (the following) year. I'm guessing for people citing it, they choose one or the other as the issue date.
Putting that info in the rights field doesn't really solve that issue.
On the copyright date issue @zuphilip raises, my own book illustrates the problem: it was published in one calendar year, but given a copyright date (which is really the one that should be included in citations) of another (the following) year. I'm guessing for people citing it, they choose one or the other as the issue date.
I'm not really sure what the suggestion here is. The date field is for the date that should be cited as the issue date. Whether the user/translator should be using the copyright date or the publication date to fill in that field seems like an unrelated topic to me. Feel free to elaborate if I'm missing something here.
"Publishers have been post-dating publication dates and copyright dates for as long as they've been putting them in books." cf. http://www.midwestbookreview.com/bookbiz/advice/copyrigh.htm , but I agree with @aurimasv that it is negligible for Zotero.
@bdarcus has long been a supporter of EDTF (http://www.loc.gov/standards/datetime/pre-submission.html), which might be a good format to interface the Zotero UI and citeproc-js.
That looks pretty good. It also gives us a good list of possible date formats. I don't think we need to worry about anything in Level 2. From Level 1, approximate would be the "circa". The only other thing we would want is seasons. Everything else in Level 1 can probably be ignored. There's some discussion on the forums about "uncertain dates", but best I can tell, all of it actually refers to "circa"/"approximate" dates. I think I recall a discussion about unspecified dates as well (can't find it now), but it's a bit more difficult to implement from UI perspective, so I'd rather not if it's not necessary for 99.9% of cases.
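To make the scope concrete, here is a minimal Python sketch of the subset discussed above: plain year, year-month, and year-month-day dates, plus the Level 1 approximate marker (a trailing `~`) and seasons (codes 21 to 24 in the month position, as I understand the EDTF spec). The names `SimpleDate` and `parse_simple_edtf` are made up for illustration and are not Zotero or citeproc-js APIs.

```python
import re
from dataclasses import dataclass
from typing import Optional

# EDTF Level 1 season codes, carried in the month position.
SEASONS = {21: "spring", 22: "summer", 23: "autumn", 24: "winter"}


@dataclass
class SimpleDate:
    """Hypothetical container for the small subset handled here."""
    year: int
    month: Optional[int] = None
    season: Optional[str] = None
    day: Optional[int] = None
    circa: bool = False


_PATTERN = re.compile(r"^(-?\d{4})(?:-(\d{2})(?:-(\d{2}))?)?(~?)$")


def parse_simple_edtf(text: str) -> Optional[SimpleDate]:
    """Parse YYYY, YYYY-MM or YYYY-MM-DD with an optional trailing '~'.

    Returns None for anything outside this subset. Day values are not
    range-checked in this sketch.
    """
    m = _PATTERN.match(text.strip())
    if not m:
        return None
    year, mm, dd, approx = m.groups()
    date = SimpleDate(year=int(year), circa=(approx == "~"))
    if mm is not None:
        num = int(mm)
        if num in SEASONS:
            if dd is not None:
                return None  # a season cannot carry a day
            date.season = SEASONS[num]
        elif 1 <= num <= 12:
            date.month = num
            if dd is not None:
                date.day = int(dd)
        else:
            return None
    return date
```

Anything the function does not recognise comes back as `None`, which is where the "keep the raw string" question from earlier in the thread would kick in.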
Two popular JS-based date pickers: https://github.com/bevacqua/rome https://github.com/ChiperSoft/Kalendae
I like Kalendae and I think we can extend it to fit our needs.
Thanks @rmzelle!
Some other tricky examples from the Trac ticket:
Also:
My conclusion on the Trac ticket was that, even if we supported more forms, a free-form entry option was unavoidable, since there would always be forms we wouldn't understand/support...
"Winter-Spring 2000-2001" (fine as long as date range supports season selection too)
Yes, just have to figure out how to put this on a date picker
BCE dates
Thought about this too. Not really against this, but it's a bit harder on a date picker. Will figure something out
Dates in other calendar systems?
I think we can do this well by letting the user switch the date picker into other calendar systems, but storing the actual date in the Gregorian calendar. Not sure what the downside of always using a Gregorian calendar for citations would be. Seems like if something needs to be cited in a different calendar it would have to be cited for all sources, so it should be a CSL thing. Or maybe not? Do you have to cite Gregorian and then an additional calendar for only some of the sources that are dated with that calendar system? I can actually think of a pretty neat way of supporting this as long as CSL supports it.
Masked precision (16xx) from the EDTF spec seems important, and I think is distinct from approximate date
Idk, I haven't seen much discussion on this and I certainly never worked with something that has unknown date parts. I guess this sort of just works now as a raw date string. This would be tough with a non-free-form date field
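One way to see how masked precision differs from "circa" is that a mask like 16xx pins down an exact interval of candidate years rather than a fuzzy point. A small Python sketch, assuming four-character years with trailing `x`/`X` placeholders (the helper name `masked_year_bounds` is hypothetical):

```python
import re
from typing import Optional, Tuple


def masked_year_bounds(text: str) -> Optional[Tuple[int, int]]:
    """Return the (earliest, latest) year covered by a mask such as '16xx'.

    Only four-character masks with at least one leading digit and only
    trailing placeholders are accepted; anything else returns None.
    """
    value = text.strip()
    m = re.fullmatch(r"(\d{1,3})([xX]{1,3})", value)
    if not m or len(value) != 4:
        return None
    prefix, mask = m.groups()
    low = int(prefix + "0" * len(mask))
    high = int(prefix + "9" * len(mask))
    return low, high
```

With that, "16xx" covers 1600 through 1699 and "196x" covers 1960 through 1969, which a date picker could present as a decade or century choice.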
My conclusion on the Trac ticket was that, even if we supported more forms, a free-form entry option was unavoidable, since there would always be forms we wouldn't understand/support...
Maybe. I'm trying to figure out what these possible outliers may be (not all obviously, but just one that we might want to but not be able to accommodate). Some of the above points could suggest that we need to support a free-form date field, but if we can't figure out something that breaks the date-picker mechanism altogether, then I'm sort of leaning towards what @adam3smith said in this thread
MG6 said:
May I suggest that if you take the plunge, the exceptions will start to roll in of their own accord? It might be impossible to predict everything people are going to want?
but the problem is that you don't want to adjust things too often - and depending on what you do you might even lock things in to some degree. So you want to have a system where you can accommodate 99% of all people 99% of the time - and then you can tell the remaining 1% of people or the ones hitting 1% of cases: "tough luck this can't be done systematically - edit manually at the end".
Yes, just have to figure out how to put [BCE dates] on a date picker
Let me elaborate on what I'm thinking. My current idea is to have two separate fields for year and month in the date picker (unlike what Kalendae currently has)
The year field would allow free-form entry (picking from a list is not an option for us), where you can only type in numbers. It would then allow you to put a space after the number and type in BC/BCE. If you start the number with a negative sign, it would automatically convert that into "YYYY BCE" format (for display purposes only; behind the scenes it's a negative number). I suppose this would make it possible to allow uncertain years as well.
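A rough Python sketch of the year-field behaviour described above, assuming the simple sign-flip mapping from the proposal (typing -44 or "44 BCE" both store -44). The function names are hypothetical, and whether the stored negative number should follow this mapping or astronomical year numbering is left open here; year 0 gets no special treatment.

```python
import re
from typing import Optional


def parse_year_input(text: str) -> Optional[int]:
    """Turn typed year input into a signed integer, or None if not understood.

    Accepts digits with an optional leading '-' or an optional trailing
    'BC'/'BCE' (case-insensitive), but not both at once.
    """
    m = re.fullmatch(r"\s*(-?)(\d+)(?:\s+(BCE?))?\s*", text, flags=re.IGNORECASE)
    if not m:
        return None
    sign, digits, era = m.groups()
    if sign and era:
        return None  # "-44 BCE" is ambiguous, so reject it
    year = int(digits)
    return -year if (sign or era) else year


def display_year(year: int) -> str:
    """Render the stored value the way the year field would show it."""
    return f"{-year} BCE" if year < 0 else str(year)
```

Rejecting input that has both a minus sign and an era suffix is a design choice for the sketch; a real field might instead normalise it.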
The month field would be a drop-down list of months followed by a separator followed by a list of seasons followed by "unknown"/"uncertain" whatever. The list would be localized based on your UI language and would allow you to select-as-you-type either in your language, in English, or with numbers.
The days would be displayed as usual (not sure if we should worry about days of the week... that's probably going to be done for us already though). If uncertain is selected for the month, then we would just display 1-31. You could also type to select. Keeping with the above, we would also want to provide a way to select "unknown" (if we don't have to worry about days of the week, we can always put it as a day in the bottom right corner of the calendar).
Below the day display you would have a check box to select "circa".
You could tab between year, month, day fields. If you're tabbing through the other fields in item info tab, once the date field gains focus, you can either press enter to open the date picker or start typing a year, which would open up the date picker and focus the year field.
This will be a bit of a (fun) challenge to code up, but I'm hoping that it creates an intuitive date picker that allows quick date entry.
Dates in other calendar systems?
I think we can do this well by allowing to switch the date picker into other calendar systems, but store the actual date in Gregorian calendar.
Dates in lunar calendar systems (pretty much everything but Gregorian) add intercalary months at arbitrary intervals to keep the cycles roughly in sync with a yearly progression. They can't be converted without tabular data that is specific to each system. It's doable, but it would be a big undertaking. There probably isn't a lot of enthusiasm in CSL for building those data sets into the specification (but I could be wrong there).
I guess the main question is whether this needs to be cited in a different calendar system and how would that look. Zotero might be able to do a lot of the heavy lifting there and pass the date formatted in the other system as it has to be displayed in the citation. We can probably live without this at least initially
Yeah, I wasn't suggesting that we need to support different calendars natively at all, just questioning whether it's realistic to get rid of an option for free-form entry with literal pass-through. "Literal" could even be an option, if we don't think that's too gross.
I'm having a hard time finding any documentation on citing dates in non-Gregorian calendar, so for the time being I'm going to assume it's not something we need to worry about.
We can add a literal option, which would have to be selected explicitly (i.e. we shouldn't default to it if we fail to parse a date coming from translators).
Possibly information on them tends to be in less accessible languages. There seems to be conversion data about. Here is a conversion site for Japanese historical dates (Japan shifted to the Gregorian calendar sometime in [Gregorian] January of 1872, IIRC). Enter a yyyy-mm-dd Gregorian date in the second field (グレゴリオ暦), and it will spit out various representations in the others.
There's a lot to be said for a free-text entry option. ;-)
The question is how do you cite these dates. If these are always cited in Gregorian calendar, then we can let the user figure out the conversion. If we need to be able to maintain non-Gregorian calendar information, then that's another question.
Free form gives users the impression that something may work where it simply will not. I don't mind leaving the option in for very edge cases, but the idea is that you shouldn't need to resort to it.
In native-language Japanese historical writing, I'm pretty sure early documents are cited according to the old (lunar/Imperial) calendar system. I'll check with a colleague. If so, then I imagine it would (of course) be very helpful to hold dates in Gregorian with conversion for rendering, so that timelines and the like would work out of the box. (It's not really important to me personally, I'm just guessing what historians with a regional focus would say.) Will try to fetch up more information soon.
[didn't get to this, sorry]
Yeah, I don't think the specifics of citing should determine what we store. We should be able to accommodate dates that people might want to retain in their research.
We can add a literal option, which would have to be selected explicitly (i.e. we shouldn't default to it if we fail to parse a date coming from translators).
So you're saying a failed parse on import would flag the field as a potential error, and if the user wanted to manually approve it as "Literal" so be it but otherwise it would appear as an error that should be fixed?
What qualifies as a failed parse? I've argued previously that having anything left over after parsing known parts should qualify as a failure (though ideally with considerably more comprehensive parsing than we do now).
I don't think the specifics of citing should determine what we store.
That's the whole model behind this software. It's aimed at generating citations. If someone wants to retain some information that does not pertain to citations they can always attach a note.
What qualifies as a failed parse? I've argued previously that having anything left over after parsing known parts should qualify as a failure (though ideally with considerably more comprehensive parsing than we do now).
From web/search translators, whatever is not parsed can be discarded. From import translators, we can keep the unparsable data and flag it for review. The parts that were parsed would serve as the data that could be cited. The extra data would be ignored.
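As a sketch of that policy in Python: parse what can be parsed, then either drop the remainder (web/search) or keep it and flag the field for review (import). The parser here only recognises ISO-style `YYYY[-MM[-DD]]` and is a stand-in for a much more comprehensive one; `ParseResult`, `parse_with_leftover`, and the `source` values are all hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseResult:
    iso: Optional[str]   # the portion that was understood, if any
    leftover: str        # text that was not understood
    needs_review: bool   # True when the UI should flag the field


# Stand-in parser: only ISO-style YYYY, YYYY-MM, YYYY-MM-DD.
_ISO = re.compile(r"\b(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?\b")


def parse_with_leftover(raw: str, source: str) -> ParseResult:
    """source is 'web' (web/search translators) or 'import' (import translators)."""
    m = _ISO.search(raw)
    if m:
        iso = m.group(0)
        rest = (raw[:m.start()] + raw[m.end():]).strip()
    else:
        iso, rest = None, raw.strip()
    if source == "web":
        # Web/search: whatever is not parsed is discarded.
        return ParseResult(iso, "", False)
    # Import: keep the remainder and flag it if there is any.
    return ParseResult(iso, rest, bool(rest))
```

For example, `parse_with_leftover("2001-05-03 (reprint)", "import")` keeps "(reprint)" and sets `needs_review`, while the same string from a web translator yields only the parsed date.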
That's the whole model behind this software. It's aimed at generating citations.
That's one (important) feature, and it provided some structure for early data model choices, which for technical reasons have remained fixed for a long time. It's absolutely not the central purpose of the software — which is to be a powerful research tool — and that will be even more the case as we introduce custom item types/fields and other ways of interacting with Zotero data.
From web/search translators, whatever is not parsed can be discarded. From import translators, we can keep the unparsable data and flag it for review.
Why the difference?
It's absolutely not the central purpose of the software — which is to be a powerful research tool —
Idk what plans you have for the future, but as far as I see it doesn't go beyond collecting, organizing, sharing, and referencing sources (and I don't think it should).
as we introduce custom item types/fields
Don't want to go off topic, but I've never seen a good argument for this.
From web/search translators, whatever is not parsed can be discarded. From import translators, we can keep the unparsable data and flag it for review.
Why the difference?
Because when web scraping, any additional data is probably junk (granted that our extraction algorithms are good). For import, that's also probably the case, but there's a chance that the user specifically put it in there, which means we shouldn't just discard it. The same reason we drop a lot of junk from RIS files that are imported via Web translators, but keep it all (or at least used to) for files imported manually.
This is a discussion for elsewhere, but we've never thought of Zotero as a citation manager — we think of it as a research tool, which has a broader mandate with much more potential. Among other things, it means that data in Zotero can serve a purpose on its own without regard for how it might be cited. (But in this case it sounds like they might be cited anyway.)
Because when web scraping, any additional data is probably junk (granted that our extraction algorithms are good).
It seems like this may vary a lot between translators/sites. If we're saving from the NYT, obviously we should end up with a clean date. If we're saving from, say, the LoC, I don't think we can make that assumption. Just clicking around on the latter, the very first item I clicked on has a MARC date value of "[between 1880 and 1893]", which we're pulling in, erroneously, as "1880", because we run it through /[0-9]+/ for some reason. Ideally we'd be able to parse something like that automatically, but if not, that's still critical information that we'd be throwing away. I'm sure we could find harder-to-parse examples if we looked for a few more minutes.
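To illustrate the failure mode in Python: taking the first run of digits silently turns the MARC value into a single year, whereas a pattern that knows the "between X and Y" form keeps both bounds. This is only a sketch of one bracketed form; `first_number` mimics the /[0-9]+/ behaviour described above and `between_bounds` is a hypothetical helper, not the translator's actual code.

```python
import re
from typing import Optional, Tuple


def first_number(raw: str) -> Optional[str]:
    """Mimics running the value through /[0-9]+/: first digit run only."""
    m = re.search(r"[0-9]+", raw)
    return m.group(0) if m else None


def between_bounds(raw: str) -> Optional[Tuple[int, int]]:
    """Recognise '[between YYYY and YYYY]' (brackets optional) and keep both years."""
    m = re.fullmatch(
        r"\[?\s*between\s+(\d{4})\s+and\s+(\d{4})\s*\]?",
        raw.strip(),
        flags=re.IGNORECASE,
    )
    if not m:
        return None
    start, end = int(m.group(1)), int(m.group(2))
    if start > end:
        return None  # bounds out of order; treat as unparsed
    return start, end
```

How the two bounds should then be stored or cited is a separate question; the point is only that the second year does not have to be lost.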
(And to bring this back to the earlier point, even without the ability to cite date ranges, Zotero should still be keeping that info, because it's important research information. I don't love the heterogeneity of our current date field rendering and think we can improve it, but keeping the full string was a fundamental design choice.)
This is a discussion for elsewhere, but we've never thought of Zotero as a citation manager
@dstillman : Can you provide a space for such a discussion somewhere else?
Wikidata handles a wide range of date formats, so it may be helpful to look at its implementation. Here is an example object where you can try it out as you wish. BTW they support the Julian calendar besides the Gregorian.
IMO the localization is an important and maybe nontrivial point as well. It really matters whether the day comes before or after the month, etc.
"in-press": would be similar to "No Date", but, depending on what the ultimate consensus is, may allow specifying a future date of publication.
I think 'Status' needs to be a separate variable. Journals nowadays frequently publish articles online before assigning volume/issue/page numbers. These articles have both a numerical date and a status (e.g., 'Advance online publication'). The appropriate label for citing items with a status variable varies by style (Chicago uses "forthcoming", APA uses "in press" for items not online and "Advance online publication" for online items), so a checkbox or flag variable would probably work best, rather than a free-entry field.
I could go either way on this: while "in press/forthcoming" is used instead of a date, "Advance online..." is used in addition to a date, so arguably those should be handled by different fields? On the other hand, they all do describe the publication status. Finally, there's no strong reason that they shouldn't all be mapped to status, but could be handled differently GUI-wise.
Putting a status checkbox in the date picker that can be combined with any of the other date format might work well. Formatting of citations could be handled at the style level. I'm not sure how common citations with "Advance online..." or similar are outside of APA, so I'm not sure there is much value in increasing the complexity of the GUI when the formatting could be handled by tests in the citation style.
I'm not sure how common citations with "Advance online..." or similar are outside of APA
Very common. We'll want to be able to handle them (we currently do by assuming that items in a journal without volume numbers are advance online; that works quite well, but may be more fragile than we'd like it to be).
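For reference, the current heuristic amounts to something like the following Python sketch. The dictionary keys (`itemType`, `date`, `volume`) and the returned label are assumptions for illustration, not the actual style or Zotero code, and the fragility is visible: any journal article that simply lacks a volume number gets the status.

```python
from typing import Optional


def infer_status(item: dict) -> Optional[str]:
    """Guess 'advance online' for dated journal articles without a volume.

    Hypothetical field names; an explicit status field or checkbox would
    replace this guesswork.
    """
    is_article = item.get("itemType") == "journalArticle"
    has_date = bool(item.get("date"))
    has_volume = bool(item.get("volume"))
    if is_article and has_date and not has_volume:
        return "advance-online"
    return None
```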
The idea was also to allow free-form entry into the date field, but I'm not sure that given the proposal below this would be necessary as long as we can use the date picker
Please do not remove the ability to simply write or paste a date.
You would be able to write it, since typing would select the appropriate month/day. Pasting would be possible too I suppose, it just wouldn't stay in the pasted format, but would be parsed and the appropriate date selected.
Ok, thanks for the clarification!
(I'll second Dan's point above that crufty input strings should be saved and be accessible to the user via the UI.)
First, for BCE dates, the current version of ISO 8601, and usage among astronomers for the past 150 years or so, equates 1 BCE to the year 0, 2 BCE to the year -1, and so forth. But in the Citation Style Language and earlier versions of ISO 8601, the year 0 was not allowed and the year -1 was equated to 1 BCE. Since the use of -1 = 1 BCE has normally either not been implemented, or hidden from the user's view, and the only widely visible use of negative year numbers has been what astronomers do (2 BCE = -1), I think you should follow the astronomer's convention.
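The astronomer's convention is a one-line offset, sketched here in Python with hypothetical function names: N BCE is stored as 1 - N, so 1 BCE is year 0 and 2 BCE is year -1.

```python
def bce_to_astronomical(bce_year: int) -> int:
    """Convert a BCE year (1, 2, 3, ...) to astronomical year numbering."""
    if bce_year < 1:
        raise ValueError("BCE years start at 1")
    return 1 - bce_year


def astronomical_to_label(year: int) -> str:
    """Render an astronomical year number with an era label for display."""
    return f"{1 - year} BCE" if year <= 0 else f"{year} CE"
```

Under the older "-1 = 1 BCE" mapping the same stored value would be displayed one year off, which is why the two conventions should not be mixed in stored data.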
As for multiple calendars in the same bibliography, I think the biggest goal is to allow the reader who has gained possession of a publication to verify that the publication in her hand (or on his screen) is the same one that was cited. One could imagine an English-language history paper citing both modern newspapers and early American colonial newspapers. American colonial newspapers published before 1752 would bear Julian calendar dates. So the history paper would need to contain dates from both the Gregorian and Julian calendars.
Requiring persons creating citations to convert Julian to Gregorian before entry would be a severe burden, because Europe converted gradually, at dates chosen by various legislatures, kings, dukes, princes, etc. Determining which calendar was in effect at a particular spot in Europe on a particular date is difficult, especially for countries that did not take on their modern borders until the late 19th or 20th century.
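The arithmetic itself is the easy part, as the following Python sketch suggests: it goes from a Julian calendar date to a Julian Day Number and back out as a Gregorian date, using the standard integer formulas. What it cannot do is decide which calendar a given source actually used; that is the hard, jurisdiction-specific question described above. The function names are mine, and the sketch assumes CE dates.

```python
from typing import Tuple


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a date in the Julian calendar (CE dates)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> Tuple[int, int, int]:
    """Gregorian (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> Tuple[int, int, int]:
    """Convert a Julian calendar date to the equivalent Gregorian date."""
    return jdn_to_gregorian(julian_calendar_to_jdn(year, month, day))
```

For instance, 2 September 1752 in the Julian calendar (the last Julian day in the British colonies) comes out as 13 September 1752 Gregorian.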
I'm having a hard time finding any documentation on citing dates in non-Gregorian calendar, so for the time being I'm going to assume it's not something we need to worry about.
We can add a literal option, which would have to be selected explicitly (i.e. we shouldn't default to it if we fail to parse a date coming from translators). @aurimasv
Publications from Taiwan still use their own calendar system. If you combine a publication from Taiwan with any other source, you'll have two calendar formats. For a smooth workflow from library catalogues and online journals I would expect Zotero to not require its users to manually convert date fields.
Here is an example from HOLLIS: "Mingguo57" is the date. Not all catalogues contain a converted date field by default, as HOLLIS does.
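For the Minguo (Republic of China) calendar specifically, the year conversion is a fixed offset: year 1 corresponds to 1912, so the Gregorian year is the Minguo year plus 1911. A Python sketch, with a hypothetical helper name and only a few assumed spellings of the era prefix:

```python
import re
from typing import Optional


def minguo_to_gregorian_year(text: str) -> Optional[int]:
    """Convert strings like 'Mingguo57' or 'Minguo 57' to a Gregorian year.

    Only the year is handled; the accepted prefixes are an assumption
    about what catalogues emit, not an exhaustive list.
    """
    m = re.fullmatch(
        r"\s*(?:Ming?guo|民國|民国)\s*(\d+)\s*", text, flags=re.IGNORECASE
    )
    if not m:
        return None
    roc_year = int(m.group(1))
    if roc_year < 1:
        return None
    return roc_year + 1911
```

So the HOLLIS value "Mingguo57" maps to 1968. Other calendars mentioned in this thread need table-driven conversion and are not this simple.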
Can I echo the suggestions here for an 'original date of publication'? It has been discussed for years on the forums.
No need to echo. Original date and other original publication fields will very likely come in Zotero 5.1. All changes to Zotero item types and fields have been waiting on the long-awaited underlying code changes coming in Zotero 5.0.
@bwiernik Ah, I see, thanks :) I'll be happy to contribute once 5.0 is out!
I am very happy to learn that original publication and date ranges will be coming to Zotero.
I found this issue as I was trying to implement a correct date range for French with the issued in Extra field workaround and could not find a way to replace the default separator (an en-dash): in French, dates should always be separated by a hyphen (hyphen-minus, U+002D). See for example these Wikipedia guidelines and the usual practice in an online typography book.
So, my point is that users should be able to customise the separator in the date range (at the start of this discussion, "an em dash" is mentioned) in the style or locale file.
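In other words, the range delimiter would be looked up per locale, with a style-level override, rather than hard-coded. A Python sketch of that lookup; the table contents, the `overrides` parameter, and the function name are all assumptions for illustration and do not reflect how CSL locale files are actually structured:

```python
from typing import Dict, Optional

# Hypothetical per-language defaults: en-dash for English, hyphen-minus for French.
RANGE_DELIMITERS: Dict[str, str] = {"en": "\u2013", "fr": "-"}


def format_year_range(
    start: int,
    end: int,
    locale: str,
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Join two years with the delimiter for the locale's base language.

    A style could pass `overrides` to replace the default for a language;
    unknown languages fall back to an en-dash.
    """
    delimiters = {**RANGE_DELIMITERS, **(overrides or {})}
    lang = locale.split("-")[0].lower()
    delimiter = delimiters.get(lang, "\u2013")
    return f"{start}{delimiter}{end}"
```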
I'll be looking into the option of adding EDTF support to citeproc-js at some point. No time frame, but there are parsers out there, and it's probably time to make that move. No opinion on UI issues in Zotero.
@retorquere Where is the date parser for BBT located in the code?
https://github.com/retorquere/zotero-better-bibtex/blob/master/content/dateparser.ts
It's not really pretty, but it's been fairly effective for me so far.
With citeproc-rs available for processing, can we move ahead with adding EDTF parsing?
Wasm performance for citeproc-rs on fx60 isn't satisfactory, but we can reevaluate performance on the fx102 branch. I believe there are also still some remaining CSL 1.0.2 issues.
citeproc-js already supports parsing of date ranges in the same syntax as EDTF (1906-08/1910-12), though, and that's already usable via issued in Extra. As a start, we could probably pretty easily start parsing that out of the Date field and passing that to citeproc-js. Not sure whether there are other EDTF features that citeproc-js already accepts in some form.
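As a sketch of what "parsing that out of the Date field" could look like, here is a small Python function that turns a plain date or a slash-separated range into a CSL-JSON style `date-parts` structure. It only covers `YYYY`, `YYYY-MM`, and `YYYY-MM-DD` on either side of the slash, does not validate month or day values, and `edtf_range_to_csl` is a made-up name rather than anything in citeproc-js or Zotero.

```python
import re
from typing import Dict, List, Optional

_PART = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def edtf_range_to_csl(text: str) -> Optional[Dict[str, List[List[int]]]]:
    """Convert 'YYYY[-MM[-DD]]' or 'start/end' into {'date-parts': [...]}.

    Returns None if either side is not in the supported subset.
    """
    pieces = text.strip().split("/")
    if len(pieces) > 2:
        return None
    date_parts: List[List[int]] = []
    for piece in pieces:
        m = _PART.fullmatch(piece)
        if not m:
            return None
        date_parts.append([int(g) for g in m.groups() if g is not None])
    return {"date-parts": date_parts}
```

With the example above, `edtf_range_to_csl("1906-08/1910-12")` gives `{"date-parts": [[1906, 8], [1910, 12]]}`; anything unrecognised returns `None` so the original string can be kept as-is.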
Not sure whether there are other EDTF features that citeproc-js already accepts in some form.
I'm not sure either, but perhaps:
BTW, @retorquere recently put this together to help with JSON validation.
(Proposal in progress. Links to related discussions coming. Feel free to add them)
There are a number of date types that need to be displayed/entered and then distinguished by Zotero or citeproc:
The proposal here is mostly related to UX. The way these are stored behind the scenes is another PITA, but as long as it works, I don't think the implementation really matters. It may have to wait for 5.0 though.
I think the current consensus (at least @dstillman has proposed this and I agree) is to use a date picker for dates. The idea was also to allow free-form entry into the date field, but I'm not sure that given the proposal below this would be necessary as long as we can use the date picker to:
For the rest, I pretty much just want to echo the original proposal by @rmzelle on https://forums.zotero.org/discussion/882/1/other-options-for-date-in-press-etc/#Item_16 The idea is to make the Date label toggleable, just like the Author label. The Date field would produce a drop-down menu that would allow switching to:
Approximate Date: this would mostly set off an internal flag, but could also add a "c." or something similar at the beginning of the date field.
Because it may be necessary to indicate circa for date ranges, or even only a single date in the date range, the circa option should be in the date picker.
Feel free to toss up anything that I missed.
CC @fbennett, @bdarcus, @adam3smith