Closed SvenLieber closed 1 year ago
According to the (translated) BnF policy the last character is a control character:
The ARK identifiers assigned by the BnF contain a check character which guarantees them against isolated character errors and transposition errors.
See also the general explanation here: https://www.bnf.fr/fr/lidentifiant-ark-archival-resource-key
A (translated) pdf contains the following explanation regarding the control character:
Calculation of the check character is the responsibility of each addressing authority for the ARKs it is able to resolve. It is strongly recommended that each of them implements the calculation of the control character as described below when an ARK of its perimeter is provided to it.
The calculation of the control character relates to the ARK name (unqualified ARK).
Base10 / base29 correspondence table:
xdigit: 0  1  2  3  4  5  6  7  8  9  b  c  d  f  g
value:  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
xdigit: h  j  k  m  n  p  q  r  s  t  v  w  x  z
value:  15 16 17 18 19 20 21 22 23 24 25 26 27 28
Algorithm:
1. Check that the string matches the pattern “[prefix][0-9bcdfghjkmnpqrstvwxz]*”.
2. For each character, multiply its value in base 10 by its position in the string, then compute the sum of these products.
3. Calculate the previously obtained sum modulo 29. The control character corresponds to this remainder expressed in base 29.
Case 1: the last character of the ARK name provided as input to the addressing authority corresponds to the result of the algorithm applied to the previous characters of the ARK name => move to the “ARK name processing” step.
Case 2: the last character of the ARK name does not correspond to the result of the algorithm => "erroneous request" type error with the explanatory text "erroneous ARK: the ARK you entered ([ARK provided]) does not match a valid ARK, please check its structure".
Todo: test this computation to ensure I understood it correctly
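One way to test the computation is a minimal Python sketch of the steps above. This is not the function from the commit or the library mentioned in this issue. The names ALPHABET, check_character and is_valid are hypothetical, and the sketch assumes that positions start at 1 and that the "cb" prefix is part of the string being summed. Under those assumptions it reproduces the check characters of the two identifiers quoted in this issue (cb11896963c and cb120750750).

```python
# Base-29 alphabet from the correspondence table: index in the string = value.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def check_character(name_without_check: str) -> str:
    """Return the check character for an ARK name that does not yet carry one."""
    total = 0
    # Assumption: the first character has position 1.
    for position, char in enumerate(name_without_check, start=1):
        value = ALPHABET.find(char)
        if value < 0:
            raise ValueError(f"invalid character {char!r} in ARK name")
        total += value * position
    return ALPHABET[total % 29]


def is_valid(ark_name: str) -> bool:
    """Case 1 / case 2: compare the last character with the computed one."""
    if len(ark_name) < 2:
        return False
    try:
        return check_character(ark_name[:-1]) == ark_name[-1]
    except ValueError:
        return False


print(check_character("cb11896963"))  # c
print(check_character("cb12075075"))  # 0
print(is_valid("cb11896963c"))        # True
```

For cb11896963 the weighted sum is 330, and 330 mod 29 = 11, which maps to "c" in the table.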
This issue can be closed. There are two possible solutions:
Compute the control character ourselves, using the function from the commit above or from the following library: https://github.com/kbrbe/enrich-authority-csv-via-isni/blob/e0d0a6ac38646697c161d25eec7352d20be8b87e/enrich_authority_csv_via_isni/lib.py#L10-L70
Use the public SRU API of BnF with the search key aut.recordid and the BnF identifier without control character as value: https://www.bnf.fr/fr/service-sru-catalogue-general-de-la-bnf
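For the second solution, a request URL could be built roughly as follows. This sketch only constructs the URL and does not call the service. The endpoint https://catalogue.bnf.fr/api/SRU, the SRU version 1.2 and the "any" relation are assumptions to verify against the BnF SRU documentation linked above; only the search key aut.recordid comes from this issue. The function name build_sru_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the BnF SRU documentation.
SRU_ENDPOINT = "https://catalogue.bnf.fr/api/SRU"


def build_sru_url(record_id: str) -> str:
    """Build a searchRetrieve URL for an identifier without control character."""
    params = {
        "version": "1.2",               # assumed SRU version
        "operation": "searchRetrieve",
        # Assumed relation "any"; the search key aut.recordid is from the issue.
        "query": f'aut.recordid any "{record_id}"',
    }
    return f"{SRU_ENDPOINT}?{urlencode(params)}"


print(build_sru_url("11896963"))
```

The response should contain the authority record, from which the full ARK including its control character can be read.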
URIs in the BnF data contain an ARK identifier. We extract that identifier and store it using dcterms:identifier, for example cb11896963c for the Belgian author Hugo Claus. However, other data sources, such as data obtained from the ISNI SRU API, refer to the "regular" BnF identifier, which is 11896963 for the previous example. The ARK-based identifier often has an added letter or number. Currently we only store the ARK-based identifier so that we can build the BnF URIs using the pattern http://data.bnf.fr/ark:/12148/ + identifier if needed. However, we need to store both identifiers to be able to take links to other data sources into account, such as ISNI SRU data. Therefore we likely need two different attributes, or need to know how to "convert" between the different identifier variants. We should check the documentation.

Two examples from Belgian writers:
Stefan Hertmans
https://data.bnf.fr/en/12075075/stefan_hertmans/ => 12075075
https://catalogue.bnf.fr/ark:/12148/cb120750750 => cb120750750
https://data.bnf.fr/ark:/12148/cb120750750 => cb120750750
Hugo Claus
https://data.bnf.fr/en/11896963/hugo_claus/ => 11896963
https://catalogue.bnf.fr/ark:/12148/cb11896963c => 11896963c
https://data.bnf.fr/ark:/12148/cb11896963c => 11896963c
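The conversion between the two variants could look like the following sketch. It assumes that an authority ARK name is always "cb" + regular identifier + one check character, which holds for the two examples above but is not confirmed here for all BnF records. All function names are hypothetical, and the check character is computed with the position-weighted sum modulo 29 over the base-29 alphabet 0-9 plus bcdfghjkmnpqrstvwxz.

```python
import re

ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"
# Assumption: authority ARK names look like "cb" + digits + one check character.
ARK_PATTERN = re.compile(r"ark:/12148/(cb[0-9]+[0-9bcdfghjkmnpqrstvwxz])")


def ark_from_uri(uri: str):
    """Extract the ARK name from a BnF URI, or return None if there is none."""
    match = ARK_PATTERN.search(uri)
    return match.group(1) if match else None


def ark_to_regular(ark_name: str) -> str:
    """Drop the "cb" prefix and the trailing check character."""
    if not ark_name.startswith("cb") or len(ark_name) < 4:
        raise ValueError(f"unexpected ARK name: {ark_name!r}")
    return ark_name[2:-1]


def regular_to_ark(regular_id: str) -> str:
    """Prepend "cb" and append the computed check character."""
    name = "cb" + regular_id
    # Position-weighted sum, positions starting at 1 (assumption).
    total = sum(ALPHABET.index(c) * i for i, c in enumerate(name, start=1))
    return name + ALPHABET[total % 29]


print(ark_from_uri("https://data.bnf.fr/ark:/12148/cb120750750"))  # cb120750750
print(ark_to_regular("cb11896963c"))                               # 11896963
print(regular_to_ark("11896963"))                                  # cb11896963c
```

With such helpers only one identifier would need to be stored, although keeping both as separate attributes avoids depending on the prefix assumption.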