pkiraly / qa-catalogue-datavis

Data visualisations for library union catalogues
GNU General Public License v3.0

Add visualisation: cataloging timeline #15

Closed: nichtich closed this issue 9 months ago

nichtich commented 10 months ago

Show the date of cataloging instead of the date of publication.

pkiraly commented 10 months ago

I would say: two timelines, one for publication and another one for cataloguing, plus a third one combining the two (this is already part of QA catalogue).

What is the PICA field for the cataloguing date?

nichtich commented 10 months ago

The PICA field is 001A0, but it contains the ILN and a date in DD-MM-YY format separated by `:`, so the Solr indexing would need to be adjusted first.
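A minimal sketch of that parsing step, assuming every 001A0 value has exactly the form `ILN:DD-MM-YY` (for example the hypothetical value `1100:15-10-94`). The names `CatalogingEntry` and `parse_001a0` are illustrative only and not part of QA catalogue; the century of the two-digit year is deliberately left unresolved here.

```python
from typing import NamedTuple


class CatalogingEntry(NamedTuple):
    """Parsed content of a PICA 001A0 value such as '1100:15-10-94'."""
    iln: str    # internal library number, kept as a string
    day: int
    month: int
    yy: int     # two-digit year; the century is still ambiguous


def parse_001a0(value: str) -> CatalogingEntry:
    # Split once on ':' to separate the ILN from the DD-MM-YY date
    iln, sep, raw_date = value.partition(":")
    if not sep:
        raise ValueError(f"missing ':' separator in {value!r}")
    parts = raw_date.strip().split("-")
    # Expect exactly three two-digit numeric parts: DD, MM, YY
    if len(parts) != 3 or not all(p.isdigit() and len(p) == 2 for p in parts):
        raise ValueError(f"expected DD-MM-YY, got {raw_date!r}")
    day, month, yy = (int(p) for p in parts)
    return CatalogingEntry(iln.strip(), day, month, yy)
```

Keeping the two-digit year as-is at this stage means the century rule can be decided separately, without re-parsing the field.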

The cataloging history that is part of QA catalogue requires further processing and generating an image; it cannot be retrieved directly from the Solr index, can it?

I'd first focus on visualisations that only require a few Solr queries against the existing index.
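As a sketch of such a query: a facet request with `rows=0` returns only value counts, which is all a timeline needs. The field name `cataloging_date_ss` is hypothetical and assumes the index has been adjusted to store the cataloging date as a `YYYY-MM-DD` string; the Solr parameters themselves (`facet.field`, `facet.limit`, `facet.mincount`, `facet.sort`) are standard.

```python
from urllib.parse import urlencode


def cataloging_date_facet_url(solr_core_url: str, query: str = "*:*",
                              field: str = "cataloging_date_ss") -> str:
    """Build a Solr select URL that returns only facet counts for `field`.

    `field` is a hypothetical name for a normalized YYYY-MM-DD date field.
    """
    params = {
        "q": query,
        "rows": 0,              # no documents, only the facet counts
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,      # return every distinct value
        "facet.mincount": 1,    # skip values with no matching record
        "facet.sort": "index",  # lexicographic order, chronological for ISO dates
        "wt": "json",
    }
    return f"{solr_core_url.rstrip('/')}/select?{urlencode(params)}"
```

Sorting by index rather than by count keeps the result in date order, so it can be fed into a timeline without re-sorting.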

pkiraly commented 10 months ago

For result sets of several million records it is slow, but for real queries we can extract it in a reasonable amount of time. I have two questions:

  1. What is the rule for the year part of the date?

05-10-21 should logically be interpreted as 2021 and 05-06-86 as 1986, but do you have a clear threshold year that decides whether a date belongs to the 20th or the 21st century? I expect there will be typos, so should a hypothetical 05-10-61 be interpreted as 1961 or 2061? I have the same problem with MARC21, where the standard gives no guidance.

  2. What would be the ideal precision: day, month or year?

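One possible answer to the first question, offered only as an assumption and not as a rule from PICA or MARC21: a sliding window anchored at a reference year, based on the premise that a cataloging date cannot lie in the future. Under this heuristic 05-10-21 becomes 2021, 05-06-86 becomes 1986 and 05-10-61 becomes 1961. It does not catch typos that happen to produce a plausible past year. The function name is hypothetical.

```python
from datetime import date
from typing import Optional


def expand_two_digit_year(yy: int, reference_year: Optional[int] = None) -> int:
    """Expand a two-digit year, assuming a cataloging date is never in the future.

    The year is placed in the century of `reference_year`; if that would put it
    after `reference_year`, it is moved back one century.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"two-digit year out of range: {yy}")
    if reference_year is None:
        reference_year = date.today().year
    candidate = reference_year - reference_year % 100 + yy
    return candidate if candidate <= reference_year else candidate - 100
```

Passing `reference_year` explicitly (for example the year of the data export) keeps the result reproducible instead of depending on the day the script runs.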
pkiraly commented 10 months ago

For the first question: it seems the earliest date is 01-01-71, and it must be interpreted as "unknown date" or "in the very old times", because the earliest terms look like this:

    "1971-01-01": 424354,
    "1972-05-15": 1,
    "1972-06-12": 2,
    "1972-06-24": 1,
    "1972-06-26": 2,
    "1972-07-10": 1,

nichtich commented 9 months ago

I think grouping by year is enough to start with.
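A sketch of grouping by year, assuming the facet returns `YYYY-MM-DD` strings like the terms quoted earlier in this thread, and treating `1971-01-01` as "unknown date" as suggested there. The function and constant names are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple

# Assumption taken from the facet terms above: 1971-01-01 marks an unknown date
UNKNOWN_DATE = "1971-01-01"


def counts_per_year(date_counts: Mapping[str, int]) -> Tuple[Dict[int, int], int]:
    """Sum ISO date (YYYY-MM-DD) facet counts per year.

    Returns the per-year counts sorted by year and the number of records
    whose cataloging date is unknown.
    """
    per_year = Counter()
    unknown = 0
    for iso_date, count in date_counts.items():
        if iso_date == UNKNOWN_DATE:
            unknown += count      # keep the placeholder out of the timeline
        else:
            per_year[int(iso_date[:4])] += count
    return dict(sorted(per_year.items())), unknown
```

Applied to the six terms quoted above, this yields `{1972: 7}` plus 424354 records with an unknown date, so the placeholder does not swamp the first bar of the timeline.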

pkiraly commented 9 months ago

I'm closing it now. We should improve it later by