yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

Figure Out Date Topics #889

Closed dl-maura closed 3 years ago

dl-maura commented 3 years ago

The date_ssim and dateStructured_ssim values are not always the same; e.g. dateStructured_ssim may be 1986 while date_ssim is 1971. In one of our old systems, the facet date, display date, sort date, and search date used different fields, which could produce inconsistent sorting or search results. It is reasonable for both date_ssim and dateStructured_ssim to be used as the search date; I wonder whether the sort date is also sorted by the same values as the search fields. (I know that is not in the scope of this ticket.)

dl-maura commented 3 years ago

Martin believes we should be using dateStructured

martinlovell commented 3 years ago

Sorting and faceting with year_isim based on dateStructured

Sorting and faceting are done by the dateStructured values from metadata cloud, which are stored in dateStructured_ssim. The management app also expands them into an array of integers and stores that in the year_isim field.

dateStructured is only precise to the year.

We do not display dateStructured in Blacklight, which makes the slider results and the sorting confusing for the user.

Ranges are expressed using slashes (following DC year date range format: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/date/)

So, for all sources, dateStructured contains zero or more single years YYYY and ranges YYYY/YYYY. (It may contain multiple values if there is more than one non-contiguous range.)
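To make the expansion concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the management app's code). The function name `expand_date_structured` is hypothetical, and rejecting malformed values with an error is an assumption rather than documented behavior.

```python
import re

# Hypothetical patterns for the two shapes described above: YYYY and YYYY/YYYY.
_YEAR = re.compile(r"\d{4}")
_RANGE = re.compile(r"(\d{4})/(\d{4})")


def expand_date_structured(values):
    """Expand dateStructured values into a sorted list of distinct years.

    Illustrative only: the real expansion into year_isim may differ.
    """
    years = set()
    for value in values:
        value = value.strip()
        range_match = _RANGE.fullmatch(value)
        if _YEAR.fullmatch(value):
            years.add(int(value))
        elif range_match:
            start, end = int(range_match.group(1)), int(range_match.group(2))
            if start > end:
                # Assumption: a reversed range is treated as bad data.
                raise ValueError(f"range start after end: {value!r}")
            years.update(range(start, end + 1))
        else:
            # Assumption: anything else is rejected rather than guessed at.
            raise ValueError(f"unrecognized dateStructured value: {value!r}")
    return sorted(years)
```

For example, `expand_date_structured(["1971/1973", "1980"])` gives the years 1971, 1972, 1973 and 1980, which is the kind of integer array a field like year_isim would hold.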

DateStructured sources:

For Ladybird: dateStructured comes from FDID 280, but is cleaned into years and year ranges only.

280 is pretty ugly, so I doubt we want to display it as is. (Originally there was no formatting for the field; later it was often formatted as YYYY-00-00, which is not correct since months and days are 00 instead of 01.)

For Voyager: dateStructured for Bibs follows Metadata TF to get a year or year range. We are in the process of updating the code to use Item CHRON when available. There are some cataloguing rules for CHRON, but there may be some exceptions, so Maggie is working on extracting the years and year ranges from the values. The BIB fields are not in a human readable format. The CHRON is generally human readable, but not a great format.

For ArchivesSpace: We pull from dates and date ranges in the JSON to generate years and year ranges. The ArchivesSpace date field pulls from similar parts of the JSON, but will also pull from date "expressions" if they are available. Date expressions may be more human readable.

martinlovell commented 3 years ago

Comparing Fdid 280 vs 79

C1: https://app.zenhub.com/files/242185115/251f1054-37a8-46d9-89b0-8c2edf48e2aa/download

C3: https://app.zenhub.com/files/242185115/ce1d09af-e28e-4174-a35a-275a34ae6a33/download

alishaevn commented 3 years ago

(from slack thread)

Martin Lovell: Looking at c6, some examples for 79 are “Qing Qianlong 34 nian [1769]” or “[Ming Wanli 15 nian i.e. 1587]”

Alisha Evans: nian = “year” in Chinese. I did a google search for “qing qianlong” and it appears his reign (according to wikipedia) began in 1735. add 34 years to that and you get 1769.

so it appears that this collection is formatted as “x” amount of years into an emperor’s reign, followed by the actual year. parsing all of these different types under fdid 79 for various collections seems like it would be pretty difficult though.

maybe we could display both?
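a rough sketch of pulling the bracketed year out of values like the two examples above, in Python purely for illustration. the function name `bracketed_year` is made up, and this only covers the "four-digit year right before a closing bracket" pattern; other formats in fdid 79 would come back as None.

```python
import re

# Matches a four-digit year that sits directly before a closing square
# bracket, e.g. "[1769]" or "[Ming Wanli 15 nian i.e. 1587]".
# The lookbehind keeps longer digit runs such as "[17690]" from matching.
_BRACKETED_YEAR = re.compile(r"\[[^\]]*?(?<!\d)(\d{4})\]")


def bracketed_year(display_date):
    """Return the bracketed year as an int, or None if the pattern is absent."""
    match = _BRACKETED_YEAR.search(display_date)
    return int(match.group(1)) if match else None
```

with this, `bracketed_year("Qing Qianlong 34 nian [1769]")` returns 1769 and `bracketed_year("[Ming Wanli 15 nian i.e. 1587]")` returns 1587.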

alishaevn commented 3 years ago

martin and I ran another query comparing 6 different date fields: https://app.zenhub.com/files/242185115/a0c34fc1-3a3f-4a33-8f41-5600db985759/download

79 - Date
80 - DateDepicted
81 - Datekey
280 - DateStructured
308 - IntStartYear
309 - IntEndYear

the field that is most consistently filled out is fdid 79. however, the dates in 79 don't seem to have any type of consistent formatting.

dl-maura commented 3 years ago

Rebecca is going to work with Jay to fix the 308/309 for proper dates in LB data. ASpace and Voyager dates should be better (ASpace gets audits and corrections and Voyager is more tightly controlled about what data can be entered)

dl-maura commented 3 years ago

Need a card to sort ranges by earliest date when ascending (ASC) and by latest date when descending (DESC).
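A small Python sketch of that rule, for illustration only. The `sort_records` name and the `(id, years)` record shape are hypothetical, and putting undated records last in both directions is an assumption that the card would need to confirm.

```python
def sort_records(records, descending=False):
    """Sort (record_id, years) pairs by year.

    Ascending order keys on each record's earliest year; descending order
    keys on its latest year. Records with no years go last either way
    (an assumption, not something decided in this thread).
    """
    dated = [record for record in records if record[1]]
    undated = [record for record in records if not record[1]]
    if descending:
        dated.sort(key=lambda record: max(record[1]), reverse=True)
    else:
        dated.sort(key=lambda record: min(record[1]))
    return dated + undated


# Hypothetical sample data: a short range, a long range, a single year, no date.
records = [
    ("a", [1971, 1972, 1973]),
    ("b", list(range(1950, 1991))),
    ("c", [1980]),
    ("d", []),
]
ascending_ids = [record_id for record_id, _ in sort_records(records)]
descending_ids = [record_id for record_id, _ in sort_records(records, descending=True)]
```

Here the long range "b" comes first in both directions, because it has both the earliest start (1950) and the latest end (1990).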