Closed by gegic 6 months ago
Dear @gegic,
I just checked 700, 701, 702, 710, 711, 712 and they have $2 that could be used:
"2": {
"label": "Source",
"repeatable": false
},
In some other fields (5xx) there is something else, which needs investigation:
"2": {
"label": "System Code",
"repeatable": false
},
In several cases it contains "SIPOR", which is a Portuguese classification/authority dictionary: https://www.bnportugal.gov.pt/index.php?option=com_content&view=article&id=484&Itemid=531&lang=en.
So please use $2 as the default setting; we can adjust it later after consulting the UNIMARC experts.
Dear @pkiraly, thank you for the answer :D
I had already checked those points but wasn't sure whether to take them into consideration, because of the following:
604 $1500 $2...., where $1 embeds 500 into 604.
SIPOR is used as a value of the subfield $2 only for the 6-- block, never for 5-- or 7--.
Should I nevertheless use the subfield $2 to parse the scheme?
Dear @gegic,
you are right that $2 is not what we are really looking for but for the time being please use it if it is available (usually it is not available).
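The interim rule suggested here (take the scheme from $2 when the field carries it, otherwise fall back to a placeholder) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `SchemeFallback` class, the `resolveScheme` method, the map-based representation of a field's subfields, and the `"unknown"` placeholder are all hypothetical and are not taken from the actual code base.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the actual analyzer.
final class SchemeFallback {
    // Assumed stand-in label for fields that carry no usable $2.
    static final String UNKNOWN = "unknown";

    /**
     * Returns the first non-blank value of subfield $2, or the placeholder.
     * The map goes from a subfield code (e.g. "2") to that subfield's values.
     */
    static String resolveScheme(Map<String, List<String>> subfields) {
        List<String> sources = subfields.getOrDefault("2", List.of());
        for (String source : sources) {
            if (source != null && !source.isBlank()) {
                return source.trim();
            }
        }
        return UNKNOWN;
    }
}
```

Under these assumptions, a 6-- field whose $2 holds SIPOR would be reported under that scheme, while a 7-- field without $2 would end up under the placeholder.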
Dear @pkiraly,
Thank you for the answers.
I have now added that part, akin to the one extracting the source in the MARC21 analyzer. In addition, I also refactored a few more things.
I suppose the PR is reviewable now :)
This pull request consists mainly of changes related to the integration of UNIMARC authority name analysis.
- ContextualAnalyzer, which incorporates common functionality for both Classification and Authority processes;
- BibliographicRecord and its subclasses. Also added some comments which better explain what certain methods do;
- UnimarcRecord class
One part that I would like to be validated more closely and about which I'm not completely sure is the following:
Due to the nature of the UNIMARC format, there is no equivalent of the Source of heading or term (subfield $2) that was used for the analysis of MARC21. I analyzed the available catalogues and records, as well as some additional records online, such as those available at the portals of the Bibliothèque nationale de France, and catalogues with modified formats such as COBISS+ from IZUM, Slovenia. I wasn't able to find anything resembling an indication of a schema source in any of the checked records or formats.
Therefore, all fields from the UNIMARC authority analysis are handled as:
which in turn renders the authorities-by-schema output quite useless. While some of the used fields do have the $2 Source subfields, they are mostly actually:
Any advice on this would be greatly appreciated.
The fields used for the UNIMARC authority analysis (groups suggested by @pkiraly) are:
authorityTagsMap.put(AuthorityCategory.PERSONAL, List.of("700", "701", "702"));
authorityTagsMap.put(AuthorityCategory.CORPORATE, List.of("710$ind1=0", "711$ind1=0", "712$ind1=0"));
authorityTagsMap.put(AuthorityCategory.MEETING, List.of("710$ind1=1", "711$ind1=1", "712$ind1=1"));
authorityTagsMap.put(AuthorityCategory.GEOGRAPHIC, List.of("620"));
authorityTagsMap.put(AuthorityCategory.TITLES, List.of("500", "501", "506", "507", "576", "577"));
authorityTagsMap.put(AuthorityCategory.OTHER, List.of("730"));
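The entries above mix plain tags such as "700" with tags qualified by the first indicator, such as "710$ind1=0". One way such a selector string could be interpreted is sketched below. The `TagSelector` class and its `parse` and `matches` methods are hypothetical names introduced for illustration only; the parsing actually used in the PR may well differ.

```java
// Hypothetical illustration; not the selector parser used in the PR.
final class TagSelector {
    final String tag;
    final String ind1; // null when the selector does not constrain indicator 1

    private TagSelector(String tag, String ind1) {
        this.tag = tag;
        this.ind1 = ind1;
    }

    /** Parses "700" or "710$ind1=0" into a tag plus an optional indicator-1 value. */
    static TagSelector parse(String selector) {
        String marker = "$ind1=";
        int pos = selector.indexOf(marker);
        if (pos < 0) {
            return new TagSelector(selector, null);
        }
        return new TagSelector(selector.substring(0, pos),
                               selector.substring(pos + marker.length()));
    }

    /** True when a field with the given tag and first indicator satisfies the selector. */
    boolean matches(String fieldTag, String fieldInd1) {
        return tag.equals(fieldTag) && (ind1 == null || ind1.equals(fieldInd1));
    }
}
```

Read this way, a 710 field with first indicator 0 would fall into the CORPORATE group and the same tag with first indicator 1 into the MEETING group, while unqualified selectors such as "700" match regardless of the indicator.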