Closed: atomrab closed this issue 9 years ago.
There is no "date type" in the underlying dataset. All we have is the original label as entered by the curator and the normalized ISO8601 date. Maybe we should show the original with the normalized version in parentheses after it.
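A minimal sketch of that display, assuming the normalized value is stored as an integer astronomical year (ISO 8601 numbering, where year 0 = 1 BC) and rendered as a signed, zero-padded four-digit year. The function name `format_endpoint` is hypothetical and is not part of the PeriodO client.

```python
def format_endpoint(original_label: str, iso_year: int) -> str:
    """Return the curator's original label followed by the normalized ISO 8601 year."""
    # ISO 8601 years are written with at least four digits;
    # negative years carry a leading minus sign.
    sign = "-" if iso_year < 0 else ""
    return f"{original_label} ({sign}{abs(iso_year):04d})"


print(format_endpoint("5300", -3299))   # 5300 (-3299)
print(format_endpoint("647 BC", -646))  # 647 BC (-0646)
```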
But then how do you know that e.g. Fasti is using an idiosyncratic "Before present counting back from 2000" dating system? Or that a Russian period uses the Julian calendar? Isn't this information we want to preserve? We did put this into the spreadsheet for the benefit of the parser. Speaking of which, how do you indicate that you're dealing with a date in the Julian and not the Gregorian calendar? It will still look like it's "AD".
We are preserving it, by keeping the original text and the ISO equivalent. What other information do we have to preserve?
But in this case the original text is simply "5600". The type was BP 2000, but this wasn't stated in the labels themselves. So one might be able to infer from the ISO equivalent that this is what is going on, if one knows how BP dates work (though you'd be confused, since usually they're from 1950, and you might think we'd parsed incorrectly). But the average user certainly wouldn't be able to figure out this information. And how will someone entering a Julian date for pre-Revolutionary Russia be able to indicate to the parser that this is the format he's using?
The parser obviously recognizes the date formats we put in the spreadsheet. I know we want to avoid unnecessary complication, but is there a way we can allow a user to say, for example, "these dates are BP, but with 2000 as a starting point", and then let a viewer know that this was the format?
Having dates that look like "5300" when in fact they're BP with 2000 as a start date is really confusing. Having dates that say "5300 (-3299)" is still pretty confusing. That's my point.
Should the FASTI dates say e.g. "5300 BP" even though that is not how they appear in the source? If they had been entered through the client, they would have to be "5300 BP" because otherwise they would not be parsed correctly. So maybe we just add "BP" to all the FASTI dates, plus an editorial note explaining this. Would "5300 BP (-3299)" be less confusing?
Still confusing, because Fasti used 2000 as the point from which to count back, whereas most people use 1950. Same with Julian dates, which still say BC/AD but mean something different from the Gregorian BC/AD. Or Hegira dates, or Chinese dates, etc., especially if the original label assumed the user knew which dating system was being used from context. Basically, I'm asking if we need a way to show the user context, and allow the data-enterer to provide it.
Do you think that should go in a different place than the editorial note?
Ryan, these dates could not have been entered into the interface and parsed automatically. I had to handle them specially when marking up the dataset. The parser will handle a date with the suffix "BP2000" correctly, but, as Adam says, that suffix was not in the original source. To enter these dates, a user would have to manually enter the start/stop labels and values separately.
The BP dates show up as just the numbers from the label, but this doesn't make any sense to the user without an explanation that the dates are in BP counting back from 2000 (especially confusing because of the Ukrainian BC dates).
This is also an issue the FASTI dataset has itself. For example, this period uses ms:BEG and ms:END to describe dates, which map to strings in the relevant schema. I remember having a difficult time figuring out that they were BP2000 while looking through the site.
Maybe we need to be able to add editorial notes to period collections, not just periods. Then the period collection view would look like:
FASTI - Home http://fastionline.org/ 2004 AIAC; L - P : Archaeology
Note: FASTI uses an idiosyncratic mix of dating schemes in order to confuse its prey. For example, "5300" actually means "5300 years before the year 2000." However if the period is in the Ukraine, they use BC dates, because why not?
Periods:
Архаїчний період | -0646 (647 BC) | -0499 (500 BC) | Ukraine |
Tardoantiguo | 0401 (1599) | 0600 (1400) | Spain |
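A rough sketch of the data behind that view, assuming the editorial note lives on the collection and each endpoint keeps both the source label and the normalized year. The class and field names here are illustrative only and are not PeriodO's actual JSON keys.

```python
from dataclasses import dataclass, field


@dataclass
class Period:
    label: str
    start_label: str   # endpoint text as it appears in the source
    start_iso: int     # normalized astronomical (ISO 8601) year
    stop_label: str
    stop_iso: int
    spatial_coverage: str


@dataclass
class PeriodCollection:
    source: str
    editorial_note: str = ""   # collection-level note explaining the dating schemes
    periods: list[Period] = field(default_factory=list)


def iso_year(year: int) -> str:
    # Signed, zero-padded four-digit year, e.g. -0646 or 0401.
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


def render(collection: PeriodCollection) -> str:
    lines = [collection.source]
    if collection.editorial_note:
        lines.append(f"Note: {collection.editorial_note}")
    lines.append("Periods:")
    for p in collection.periods:
        # Normalized year first, original label in parentheses.
        lines.append(
            f"{p.label} | {iso_year(p.start_iso)} ({p.start_label}) | "
            f"{iso_year(p.stop_iso)} ({p.stop_label}) | {p.spatial_coverage} |"
        )
    return "\n".join(lines)


fasti = PeriodCollection(
    source="FASTI",
    editorial_note='Bare numbers such as "5300" count back from 2000 (BP2000); '
                   "Ukrainian periods use BC dates.",
    periods=[
        Period("Архаїчний період", "647 BC", -646, "500 BC", -499, "Ukraine"),
        Period("Tardoantiguo", "1599", 401, "1400", 600, "Spain"),
    ],
)
print(render(fasti))
```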
I was thinking about it over lunch and came to the same conclusion.
Fasti does have a problem with the date thing, but the Ukrainian part isn't their fault -- it's because their Ukrainian periodization was implemented in a funny way, so we got a more raw stage of their data.
I do think it makes sense to have editorial notes at the collection level, especially since it's not uncommon to have a couple of different systems in a single collection (e.g. a book that gives Greek Neolithic dates as BP but switches to BCE in the historic period, which is actually pretty standard practice). But this still doesn't make it super-simple for the non-expert user, who will be expected to look for that note in order to understand what "5300" means.
What would be the way around that?
I think that we make it easy for non-expert users to interpret dates, because we add an estimation of an ISO8601 year for the given endpoints. If they trust our data and curation process, they can use those values for search/retrieval/broad understanding. The record of how we derived those estimations should be somewhere in the editorial note (either at the Period or Period Collection level).
Yes, exactly. I say we go with this solution (editorial note on period collection) unless Adam can propose an alternative.
I can live with that. But how does it work on the data input side? Going back to that Julian date issue, does the user have to do that conversion himself or herself before putting it in? Will we have a list of approved formats that can be properly parsed (e.g. BP1950, BP2000, BCE/BC/CE/AD Gregorian, BC/AD Julian, AH, etc.)?
Sorry, not clear to me from Patrick's comment whether he's saying we already make it easy, or we should make it easy. I think the latter is true: so right now for Fasti, we get the BP numbers as start/stop in the GUI (the ISO values are only in the json). As long as we show the ISO start/stop to the average user plus we have a collection-level note about the originals, I think we cover the bases (though I can already see the frantic student emails in my mind).
On whether users have to convert Julian dates themselves before entering them: yes. Right now, there is no way to automatically infer whether AD/BC refers to Gregorian or Julian dates. The parser assumes Gregorian. We could make the parser more extensible so it can be set to different preferences, but that's the state of things.
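For a user doing that conversion by hand (or for a future Julian switch), the usual route goes through the Julian Day Number. This is a minimal sketch using the standard integer formulas, valid for dates after roughly 4800 BC in astronomical numbering; the function names are illustrative and not part of the PeriodO parser. As a check, 25 October 1917 in the Julian calendar (the date that gives the October Revolution its name) is 7 November 1917 in the Gregorian calendar.

```python
def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number for a date in the Julian calendar (astronomical years)."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # shift so the year count is positive
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> tuple[int, int, int]:
    """Proleptic Gregorian (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # Gregorian 400-year cycles
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461         # Julian 4-year cycles within the remainder
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> tuple[int, int, int]:
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


print(julian_to_gregorian(1917, 10, 25))  # (1917, 11, 7)
print(julian_to_gregorian(1582, 10, 5))   # (1582, 10, 15)
```

Since the two calendars differ by days (13 days by the 20th century), a year-only endpoint usually keeps the same ISO year; recording which calendar the source used matters mostly for provenance.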
I agree that showing the ISO start/stop to the average user plus a collection-level note about the originals covers the bases. I was saying that we already do make it easy (via the dataset), but you are correct that both the label and derived values should be visible in the interface.
How labor-intensive would it be to extend the parser? We have definitely already encountered AH, BP1950, BP2000, and I would imagine that if I start trying to do stuff with 19th c Russian sources, we'll hit the Julian issue as well. Maybe those four plus the Chinese and Jewish calendars would be enough?
Re: BP, BP1950 is the same as BP, right? The parser understands suffixes to BP.
The difficulty wouldn't be in extending the parser, necessarily. I could just put a switch that says "interpret AD as the Julian calendar." It would be more in the interface design. Would you suggest, say, a source-wide setting in the client to set parser options?
BP with a countback from 1950 is the standard usage (because BP came into use with C14 dating, and C14 dating is screwed up by anything later than 1950 because of nuclear testing). BP with a start of 2000 is a Fasti idiosyncrasy.
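A small sketch of BP parsing with a configurable epoch, assuming labels of the form "5300 BP" (conventional 1950) or "5300 BP2000" (explicit epoch). The regex and `parse_bp` are hypothetical, not the PeriodO parser itself. Straight subtraction gives -3300 for 5300 BP2000, one year off from the -3299 quoted earlier in the thread, so whether the count is taken with or without a year zero is a convention worth spelling out in the editorial note.

```python
import re
from typing import Optional

# A count of years, "BP", and an optional four-digit epoch, e.g. "5300 BP2000".
BP_LABEL = re.compile(r"^\s*(\d+)\s*BP(\d{4})?\s*$", re.IGNORECASE)


def parse_bp(label: str, default_epoch: int = 1950) -> Optional[int]:
    """Return the astronomical (ISO 8601) year for a BP label, or None if it is not one."""
    match = BP_LABEL.match(label)
    if match is None:
        return None
    years_bp = int(match.group(1))
    # A bare "BP" means the conventional 1950 epoch; a suffix such as BP2000 overrides it.
    epoch = int(match.group(2)) if match.group(2) else default_epoch
    # Straight subtraction in astronomical years (which include a year 0).
    return epoch - years_bp


print(parse_bp("5300 BP"))      # -3350
print(parse_bp("5300 BP2000"))  # -3300
print(parse_bp("1599 BP2000"))  # 401
```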
Sorry, missed the latest. I was thinking more of a pulldown with "Select one of the following calendrical systems: Gregorian (default), Julian (Catholic: before 1582; Britain: before 1752; Russia: before 1918), BP (1950), BP (2000), AH (Anno Hegirae), AM (Anno Mundi), however the Chinese calendar works"
Adam, your suggestion would be a great one if we could enumerate all possible dating systems. But as we've learned, there is no such standard list. At the extreme, there could be as many dating systems as there are sources. So I think a smart parser plus fallback to manual override + editorial note is our best option.
Ok, fair enough. But could we at least have a BP that parses with a 1950 start date, not 2000, which is weird? I still think that we could put in that basic list I provided and cover 99% of all cases, with the rest covered by manual override+editorial note.
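One way the pulldown and the manual override could fit together, sketched under loose assumptions: a source-wide calendar setting drives automatic parsing of bare year numbers where the conversion is simple, and anything else falls back to a hand-entered ISO year flagged as manual (to be explained in the editorial note). The enum follows the list above minus the Chinese calendar; every name here is hypothetical, not the PeriodO client's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Calendar(Enum):
    GREGORIAN = "Gregorian"
    JULIAN = "Julian"
    BP1950 = "BP (1950)"
    BP2000 = "BP (2000)"
    AH = "AH (Anno Hegirae)"
    AM = "AM (Anno Mundi)"


@dataclass
class Endpoint:
    label: str                 # text as it appears in the source
    calendar: Calendar         # source-wide setting, kept so viewers can see it
    iso_year: Optional[int]    # None means a curator still has to supply it
    manual: bool = False       # True when the ISO year was entered by hand


def parse_year(label: str, calendar: Calendar) -> Optional[int]:
    """Parse a bare year number under the chosen calendar, if this sketch supports it."""
    text = label.strip()
    if not text.isdigit():
        return None
    n = int(text)
    if calendar is Calendar.BP1950:
        return 1950 - n
    if calendar is Calendar.BP2000:
        return 2000 - n
    if calendar in (Calendar.GREGORIAN, Calendar.JULIAN):
        # Treated as AD years; at year granularity Julian and Gregorian
        # usually agree, and the calendar is still recorded on the Endpoint.
        return n
    return None  # AH, AM: no automatic conversion here


def make_endpoint(label: str, calendar: Calendar,
                  manual_iso_year: Optional[int] = None) -> Endpoint:
    if manual_iso_year is not None:
        return Endpoint(label, calendar, manual_iso_year, manual=True)
    return Endpoint(label, calendar, parse_year(label, calendar))


print(make_endpoint("5300", Calendar.BP2000).iso_year)  # -3300
print(make_endpoint("1400", Calendar.AH).iso_year)      # None: needs manual entry
```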
Oops, maybe Patrick's comment above already clarified the BP situation. Yes, BP1950 is usually what people mean when they write "BP".
The Fasti periodization contains two date types: BC dates from a semi-formal Ukrainian periodization, and BP (2000) dates from the main periodization in use on the site. The BP dates show up as just the numbers from the label, but this doesn't make any sense to the user without an explanation that the dates are in BP counting back from 2000 (especially confusing because of the Ukrainian BC dates). We need a way to display either the date type, so the user knows what he or she is looking at, or the normalized ISO8601 proleptic Gregorian dates we show in the JSON, or both, but we're not showing enough to make sense right now.
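Part of what makes the normalized values hard to read is that ISO 8601 years use astronomical numbering with a year 0, so BC years are shifted by one (647 BC is -0646, 500 BC is -0499). A minimal sketch of the conversion in both directions; the function names are hypothetical.

```python
def bc_ad_to_iso(year: int, era: str) -> int:
    """Convert a BC/AD (or BCE/CE) year to an astronomical ISO 8601 year."""
    if year < 1:
        raise ValueError("BC/AD years start at 1")
    era = era.upper()
    if era in ("BC", "BCE"):
        return 1 - year          # 1 BC -> 0, 2 BC -> -1, ...
    if era in ("AD", "CE"):
        return year
    raise ValueError(f"unknown era: {era}")


def iso_to_bc_ad(iso_year: int) -> str:
    """Convert an astronomical ISO 8601 year back to a BC/AD label."""
    if iso_year <= 0:
        return f"{1 - iso_year} BC"
    return f"AD {iso_year}"


print(bc_ad_to_iso(647, "BC"))  # -646
print(iso_to_bc_ad(-499))       # 500 BC
```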