hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

Alma data lists different `heldBy` institution (compared to Aleph status quo) #1721

Closed gregorbg closed 1 year ago

gregorbg commented 1 year ago

Expected behavior

The resource HT021117356 (equal to Alma MMS 99371014448006441) belongs to the decentral library for Japanese studies (ISIL DE-38-459). It should always be listed as such, especially in the .hasItem.heldBy representation.

Actual behavior

The resource HT021117356 is present in the library DE-38-459. The JSON based on Aleph data lists the following: (as per https://lobid.org/resources/HT021117356.json)

"hasItem" : [ {
  "id" : "http://lobid.org/items/HT021117356:DE-38-459:JAP%2FLs3-7Ric5%23a#!",
  "type" : [ "Item" ],
  "heldBy" : {
    "id" : "http://lobid.org/organisations/DE-38-459#!",
    "label" : "lobid Organisation"
  },
  "note" : [ "00009001" ],
  "callNumber" : "JAP/Ls3-7Ric5#a",
  "label" : "JAP/Ls3-7Ric5#a"
} ]

Most notably, the id of the heldBy field is http://lobid.org/organisations/DE-38-459#!. The same resource in the new Alma environment: (as per https://alma.lobid.org/resources/99371014448006441?format=json)

"hasItem" : [ {
  "label" : "lobid Bestandsressource",
  "type" : [ "Item", "PhysicalObject" ],
  "callNumber" : "JAP/Ls3-7Ric5#a",
  "serialNumber" : "JAP/0020015",
  "currentLocation" : "K1144 / 00009001",
  "heldBy" : {
    "id" : "http://lobid.org/organisations/DE-38#!",
    "isil" : "DE-38",
    "label" : "lobid Organisation"
  },
  "id" : "http://lobid.org/items/99371014448006441:DE-38:23182509600007147#!"
} ]

Suddenly, the id of the heldBy field is http://lobid.org/organisations/DE-38#! (and the ISIL is DE-38). This means that our entire "Bestand" as a decentral library is suddenly listed as belonging to the central USB Köln library. This is factually not correct, and even the Alma UI reports "Universitäts- und Staatsbibliothek Köln, Hauptabteilung".

Implications

This means that querying our entire "Bestand" does not work anymore.

Raw data musings

The Alma ITM lists the Bestand as 49HBZ_BRIDGE_UBK where UBK may point to DE-38. However, there are subfields u and w which contain additional information: (as per https://alma.lobid.org/marcxml/99371014448006441)

<datafield tag="ITM" ind1=" " ind2=" ">
  <subfield code="H">22182509640007147</subfield>
  <subfield code="x">00009001</subfield>
  <subfield code="f">BOOK</subfield>
  <subfield code="v">00009001</subfield>
  <subfield code="p">04</subfield>
  <subfield code="X">System</subfield>
  <subfield code="Y">2022-06-28 02:00:00 Europe/Berlin</subfield>
  <subfield code="n">JAP/Ls3-7Ric5#a</subfield>
  <subfield code="M">49HBZ_BRIDGE_UBK</subfield>
  <subfield code="s">1</subfield>
  <subfield code="d">0</subfield>
  <subfield code="V">System</subfield>
  <subfield code="b">JAP/0020015</subfield>
  <subfield code="a">23182509600007147</subfield>
  <subfield code="c">JAP/Ls3-7Ric5#a</subfield>
  <subfield code="W">2023-02-23 19:52:29 Europe/Berlin</subfield>
  <subfield code="u">K1144</subfield>
  <subfield code="w">K1144</subfield>
</datafield>

where K1144 points to our decentral library. I have used https://gist.github.com/acka47/f27425c7ff058dab40739f485d2b2ba7#file-sigelliste-tsv-L577 as a reference for figuring out the link between K1144 and DE-38-459.
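To illustrate the point, here is a minimal Python sketch of how the information in the ITM field could be used. It is not how lobid-resources works; it only shows that the field carries enough data for a lookup. The one-entry table `OWNER_TO_ISIL`, the function names, and the `DE-38` fallback are assumptions made for this example, and the XML is a shortened copy of the field above without any namespace.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ITM field quoted above (assumption: no XML namespace).
ITM_XML = """<datafield tag="ITM" ind1=" " ind2=" ">
  <subfield code="M">49HBZ_BRIDGE_UBK</subfield>
  <subfield code="u">K1144</subfield>
  <subfield code="w">K1144</subfield>
</datafield>"""

# Hypothetical one-entry excerpt; a real table would be built from the Sigelliste.
OWNER_TO_ISIL = {"K1144": "DE-38-459"}


def itm_subfields(xml_text):
    """Return the subfields of one ITM datafield as a dict of code -> value."""
    field = ET.fromstring(xml_text)
    return {sf.get("code"): (sf.text or "") for sf in field.findall("subfield")}


def held_by_isil(subfields, fallback="DE-38"):
    """Look up the ISIL via $u (permanent library), then $w (current library).

    Falls back to the main library ISIL when the code is not in the table.
    """
    code = subfields.get("u") or subfields.get("w")
    return OWNER_TO_ISIL.get(code, fallback)


print(held_by_isil(itm_subfields(ITM_XML)))
```

With a complete table, an unknown code would still fall back to the main library instead of failing.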

gregorbg commented 1 year ago

Note that this affects other decentral libraries in Cologne as well; for example, DE-38-315 exhibits the same behaviour. lobid.org yields 12576 results, whereas under alma.lobid.org the same query yields 0 results, with all individual books listed as belonging to DE-38. The raw Alma MARC for a random book present in the library shows the code K1089, which points to DE-38-315.

acka47 commented 1 year ago

We have implemented some sublibrary code mappings in https://github.com/hbz/lobid-resources/issues/1639 in communication with hbz libraries via the FeX meeting. Apparently some are missing, and @TobiasNx will know best how to add them.

TobiasNx commented 1 year ago

The info is in:

u | Permanent library subfield
w | Current library subfield 

We have a couple of problems coming together here:

1) Unfortunately, ALMA no longer has an owner, only sublibrary codes (https://github.com/hbz/lobid-resources/issues/1605#issuecomment-1385382849). All holdings link to the main library in ALMA. The sublibrary codes state the current location. Unlike the Sigel/ISIL, they are not unique, and they can only be mapped if the hosting main library provides a mapping for them. Köln would need to provide such a mapping so that we could add it to our transformation. We stated this in our API break list.

2) The UB Köln records/holdings are provided to ALMA via a bridge (<subfield code="M">49HBZ_BRIDGE_UBK</subfield>). That means your library has not been migrated to ALMA yet and the metadata is provided via a workaround called "bridge"; @blackwinter can better explain what that is all about.

3) It seems that the bridged libraries still use the owner instead of the sublibrary codes. I am not sure whether they will keep it after the migration of Köln. Probably not.

gregorbg commented 1 year ago

2) The UB Köln records/holdings are provided to ALMA via a bridge (<subfield code="M">49HBZ_BRIDGE_UBK</subfield>). That means your library has not been migrated to ALMA yet and the metadata is provided via a workaround called "bridge"; @blackwinter can better explain what that is all about.

I know what the bridges are and what they are all about :D

(currently busy, will reply in detail later, just wanted to save you from typing a lengthy explanation about the A2A adapters)

gregorbg commented 1 year ago
  1. The sublibrary codes state the current location. Unlike the Sigel/ISIL, they are not unique, and they can only be mapped if the hosting main library provides a mapping for them. Köln would need to provide such a mapping so that we could add it to our transformation.

As far as I can tell, there is a mapping list floating around as a GitHub Gist at https://gist.github.com/acka47/f27425c7ff058dab40739f485d2b2ba7 (which incidentally seems to be correct for the small handful of institution libraries that I know the ISIL of by heart). What is wrong with this list, and why is it not included in https://github.com/hbz/lookup-tables?

By no means do I want to sound pushy, but I simply want to understand the process. From my outside perspective, the data is there (Alma XML contains K1144 and K1144 can very clearly be mapped to DE-38-459) so I'm genuinely curious why this mapping isn't established in Lobid-Resources.

acka47 commented 1 year ago

For the record: We have a similar case with the catalog of Juristisches Seminar Bonn:

Thus, at least for ULB Bonn and USB Köln, we will have to add a temporary mapping of the owner IDs to the ISILs. With the switch to Alma, USB Köln and ULB Bonn will have to add sublibrary codes in Alma and a mapping of these codes to ISILs in https://github.com/hbz/lookup-tables/tree/master/data/almaSublibraryCode2Isil (see also the next comment by @TobiasNx).

BTW, a more up to date version of the Sigelliste (also as csv, see attachments) can be found at https://service-wiki.hbz-nrw.de/pages/viewpage.action?pageId=457703487.

TobiasNx commented 1 year ago

Let me explain the old process in ALEPH and the new one in ALMA, in the context of the decisions made for the GOAL migration:

http://lobid.org/hbz01/HT021117356

<datafield tag="088" ind1=" " ind2=" ">
<subfield code="a">38/459</subfield>
<subfield code="b">00009001</subfield>
<subfield code="c">JAP/Ls3-7Ric5#a</subfield>
<subfield code="e">keine ILL</subfield>
</datafield>

In $a we had the Sigel, which had been mapped beforehand from the unique OWNER code in ALEPH; an OWNER could also be a sublibrary. Lobid mapped the Sigel to the ISIL of your sublibrary. The mapping list you provided is OWNER to Sigel. It seems to be the basis for the internal mapping in ALEPH and was never used in lobid. We only used this mapping, Sigel to ISIL: https://github.com/hbz/lobid-resources/blob/0.5.0/src/main/resources/sigel2isilMap.csv

The decision in the context of the GOAL migration from ALEPH to ALMA was to get rid of the OWNER and only list the main library as owner. This was not a decision of the lobid team but of the ALMA consortium in waves 1 and 2. The sublibrary info no longer has an owner, only a non-unique sublibrary code in $u or $w. These codes are unique only in combination with the main library code in $M. Also, not all sublibraries have corresponding ISILs.

<datafield tag="ITM" ind1=" " ind2=" ">
  <subfield code="H">22182509640007147</subfield>
  <subfield code="x">00009001</subfield>
  <subfield code="f">BOOK</subfield>
  <subfield code="v">00009001</subfield>
  <subfield code="p">04</subfield>
  <subfield code="X">System</subfield>
  <subfield code="Y">2022-06-28 02:00:00 Europe/Berlin</subfield>
  <subfield code="n">JAP/Ls3-7Ric5#a</subfield>
  <subfield code="M">49HBZ_BRIDGE_UBK</subfield>
  <subfield code="s">1</subfield>
  <subfield code="d">0</subfield>
  <subfield code="V">System</subfield>
  <subfield code="b">JAP/0020015</subfield>
  <subfield code="a">23182509600007147</subfield>
  <subfield code="c">JAP/Ls3-7Ric5#a</subfield>
  <subfield code="W">2023-02-23 19:52:29 Europe/Berlin</subfield>
  <subfield code="u">K1144</subfield>
  <subfield code="w">K1144</subfield>
</datafield>

This is a huge loss of information for lobid. To be able to link to the specific sublibrary ISIL, we now depend on a mapping of the sublibrary codes, which needs to be provided by the libraries.

In the case of the bridges, the data in $u and $w seems to be the OWNER from the ALEPH data, probably because you have not migrated yet. Whether the values in $u and $w will be kept after the migration is unclear; it is unlikely, given that ALMA has no owners. They will probably be replaced by sublibrary codes.

As a workaround, we could use the OWNER to Sigel mapping and then the Sigel to ISIL mapping to map your sublibrary. But this will not be a long-term solution. We need a mapping of sublibrary codes to ISILs from your parent library to ensure that your sublibrary is still listed after the migration of Köln or Bonn to ALMA.
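The two-step workaround could look roughly like the following Python sketch. It is only an illustration of the chained lookup, not the actual transformation. All three tables are hypothetical one-entry excerpts: the pairing of K1144 with 38/459 is inferred from the two records quoted in this thread, and the entry for 49HBZ_BRIDGE_UBK assumes that UBK stands for DE-38.

```python
# Hypothetical one-entry excerpts of the real mapping tables.
OWNER_TO_SIGEL = {"K1144": "38/459"}               # OWNER code -> Sigel
SIGEL_TO_ISIL = {"38/459": "DE-38-459"}            # Sigel -> ISIL
MAIN_LIBRARY_TO_ISIL = {"49HBZ_BRIDGE_UBK": "DE-38"}  # $M -> main library ISIL


def resolve_isil(main_library, owner_code):
    """Resolve OWNER -> Sigel -> ISIL, falling back to the main library.

    Returns None when neither the chain nor the fallback yields an ISIL.
    """
    sigel = OWNER_TO_SIGEL.get(owner_code)
    isil = SIGEL_TO_ISIL.get(sigel) if sigel else None
    return isil or MAIN_LIBRARY_TO_ISIL.get(main_library)


print(resolve_isil("49HBZ_BRIDGE_UBK", "K1144"))
```

The fallback keeps today's behaviour (holdings listed under the main library) for every code that has no entry in the tables.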

gregorbg commented 1 year ago

Thank you so much for the detailed explanation! So this is a political decision by the Alma consortium after all. Would it be feasible/reasonable to use the existing Sigelliste at https://service-wiki.hbz-nrw.de/pages/viewpage.action?pageId=457703487 as a heuristic?

The reason why I'm so concerned about this is that I'd like to query our "Bestand" efficiently. The only way that I can see right now is to download the GZIPped JSONL for the entire DE-38 library and then manually filter on our side. This creates additional overhead as well as significant load on your servers because the entire Bestand for our Zentralbibliothek is huge.
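For reference, the manual filtering I mean would be something like the Python sketch below. It assumes that each line of the dump has the same shape as the single-record JSON above, and that currentLocation always starts with the owner code (as in "K1144 / 00009001"). The function names and the file name in the usage comment are made up for this example.

```python
import gzip
import json


def items_at_location(record, owner_code):
    """Return the items of one record whose currentLocation starts with owner_code."""
    return [
        item
        for item in record.get("hasItem", [])
        if (item.get("currentLocation") or "").split(" / ")[0] == owner_code
    ]


def filter_jsonl(path, owner_code):
    """Stream a gzipped JSONL dump and yield records with a matching item."""
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if items_at_location(record, owner_code):
                yield record


# Usage (hypothetical file name):
# for record in filter_jsonl("de-38.jsonl.gz", "K1144"):
#     print(record.get("id"))
```

This works, but it still means transferring and scanning the full DE-38 dump just to get our own holdings.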

TobiasNx commented 1 year ago

I will find out what I can do. However, https://service-wiki.hbz-nrw.de/pages/viewpage.action?pageId=457703487 is only the list of deleted Sigel. There seems to be an SQL database here: https://service-wiki.hbz-nrw.de/display/VDBE/Liste+der+Sigel+und+Owner+der+hbz-Verbundbibliotheken?src=contextnavpagetreemode

I am talking to the Verbund-Gruppe about whether I can reuse it as a workaround for the bridges.

TobiasNx commented 1 year ago

I have already implemented this.

TobiasNx commented 1 year ago

This issue seems to be solved. Closing.