hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Please expose all of Field MAB 078r in lobid data #430

Closed aquast closed 1 year ago

aquast commented 3 years ago

For FRL we need to work with the content of field MAB 078r. Currently the content of the field is not exposed, or at least the "collectionOne" part is not exposed. The latter is required to identify Leibniz organisations as contributors to Leibniz Open.

Best, Andres

acka47 commented 3 years ago

@aquast We will need some example records to work with. In one of your mails regarding this issue you pointed to https://jira.hbz-nrw.de/browse/FRL-325, where I find only one example:

HT018907381 (XML):

<datafield tag="078" ind1="r" ind2="1">
<subfield code="a">38 M: ellinet; GND: (DE-588)1049797671</subfield>
</datafield>

Do you have some more examples that refer to other collections? To add this to lobid-resources, I will need some more information about the use case, especially:

Is this about linking a resource to a collection or to an organisation?

In the first case, we would use inCollection, as we do e.g. for indicating that something is part of ZDB, NWBib, Edoweb etc.:

{
   "id":"http://lobid.org/hbz01/HT018907381",
   "inCollection":{
      "id":"http://example.org/38M-collection",
      "label":"Ellinet"
   }
}

In the second case, could it suffice to know which organization created the record in the hbz union catalogue? We have this information in describedBy.sourceOrganization.id (which is coming from 070), in the example from above it looks like this (snippet):

{
   "@context":"http://lobid.org/resources/context.jsonld",
   "id":"http://lobid.org/resources/HT018907381#!",
   "describedBy":{
      "id":"http://lobid.org/resources/HT018907381",
      "type":[
         "BibliographicDescription"
      ],
      "modifiedBy":{
         "id":"http://lobid.org/organisations/DE-38M#!",
         "label":"lobid Organisation"
      },
      "dateCreated":"20160307",
      "dateModified":"20180620",
      "inDataset":{
         "id":"http://lobid.org/resources/dataset#!",
         "label":"lobid-resources – Der hbz-Verbundkatalog als Linked Open Data"
      },
      "sourceOrganization":{
         "id":"http://lobid.org/organisations/DE-38M#!",
         "label":"lobid Organisation"
      }
   }
}
aquast commented 3 years ago

In HT018907266 there is additional information in MAB 078r in the common catalogue:

078r |a 38 M: ellinet ; GND: (DE-588)10025895-5 |a 38 M: ellinet ; collectionOne@id: "http://d-nb.info/gnd/10025895-5"

collectionOne@id: "http://d-nb.info/gnd/10025895-5" is the information required. My humble assumption is that we only get the first field of MAB 078r?

acka47 commented 3 years ago

In HT018907266 there is additional information in MAB 078r in the common catalogue:

078r |a 38 M: ellinet ; GND: (DE-588)10025895-5 |a 38 M: ellinet ; collectionOne@id: "http://d-nb.info/gnd/10025895-5"

collectionOne@id: "http://d-nb.info/gnd/10025895-5" is the information required. My humble assumption is that we only get the first field of MAB 078r?

If by that you mean that the needed data isn't exported from hbz01, then I assume you are right.

I don't see the information you are talking about in the Aleph OPAC:

[screenshot]

In the XML export it looks the same:

<datafield tag="078" ind1="r" ind2="1">
   <subfield code="a">38 M: ellinet; GND: (DE-588)1049797671</subfield>
</datafield>
<datafield tag="078" ind1="r" ind2="2">
   <subfield code="a">38 M: ellinet; GND: (DE-588)1049797671</subfield>
</datafield>

I don't have access to the Aleph client to check whether the data can be seen there. Anyway, without the input data we won't be able to add it to lobid-resources.

aquast commented 3 years ago

I'm not sure what happened with my example. Can you please look at HT020560724:

[screenshot, 2020-12-04]

acka47 commented 3 years ago

Now we are getting somewhere.

From the source data of HT020560724:

<datafield tag="078" ind1="r" ind2="1">
   <subfield code="a">38 M: ellinet ; GND: (DE-588)10025895-5</subfield>
   <subfield code="a">38 M: ellinet ; collectionOne@id: "http://d-nb.info/gnd/10025895-5"</subfield>
</datafield>
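
To make the extraction step concrete, here is a minimal Python sketch of how the flagged GND URI could be pulled out of such a 078r field. It is not how lobid-resources transforms the data. The `<record>` wrapper, the function name `collection_one_ids` and the regular expression are assumptions for illustration, and a real MAB XML export may use namespaces that this sketch ignores.

```python
import re
import xml.etree.ElementTree as ET

# Source snippet of HT020560724 as quoted in this thread; the <record>
# wrapper element is an assumption added to get a single XML root.
RECORD_XML = """
<record>
  <datafield tag="078" ind1="r" ind2="1">
    <subfield code="a">38 M: ellinet ; GND: (DE-588)10025895-5</subfield>
    <subfield code="a">38 M: ellinet ; collectionOne@id: "http://d-nb.info/gnd/10025895-5"</subfield>
  </datafield>
</record>
"""

# Hypothetical pattern: the URI is the quoted string after `collectionOne@id:`.
COLLECTION_ONE = re.compile(r'collectionOne@id:\s*"([^"]+)"')


def collection_one_ids(xml_text: str) -> list[str]:
    """Return all URIs flagged with collectionOne in 078r subfields."""
    root = ET.fromstring(xml_text)
    ids = []
    for field in root.iter("datafield"):
        # Only look at MAB 078 with indicator 1 set to "r".
        if field.get("tag") != "078" or field.get("ind1") != "r":
            continue
        for subfield in field.iter("subfield"):
            match = COLLECTION_ONE.search(subfield.text or "")
            if match:
                ids.append(match.group(1))
    return ids


print(collection_one_ids(RECORD_XML))
```

Subfields that only carry the plain `GND: (DE-588)...` form are skipped on purpose, since only the collectionOne marker acts as the flag discussed here.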

Although 078r is not transformed to lobid RDF, field 200 is. There, 10025895-5 is recorded as "Herausgebendes Organ" (issuing body), which is added to the contribution array in lobid-resources like this:

    {
      "type": [
        "Contribution"
      ],
      "agent": {
        "id": "https://d-nb.info/gnd/10025895-5",
        "type": [
          "CorporateBody"
        ],
        "label": "Leibniz-Institut für Gewässerökologie und Binnenfischerei",
        "altLabel": [
          "Forschungsverbund Berlin. Leibniz-Institut für Gewässerökologie und Binnenfischerei",
          "Wissenschaftsgemeinschaft Gottfried Wilhelm Leibniz. Leibniz-Institut für Gewässerökologie und Binnenfischerei",
          "IGB",
          "Leibniz-Institut für Gewässerökologie und Binnenfischerei (IGB) im Forschungsverbund Berlin e.V.",
          "Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB) Forschungsverbund Berlin e.V.",
          "Forschungsverbund Berlin. Leibniz Institute of Freshwater Ecology and Inland Fisheries",
          "Wissenschaftsgemeinschaft Gottfried Wilhelm Leibniz. Leibniz Institute of Freshwater Ecology and Inland Fisheries",
          "Leibniz Institute of Freshwater Ecology and Inland Fisheries"
        ],
        "gndIdentifier": "10025895-5"
      },
      "role": {
        "id": "http://id.loc.gov/vocabulary/relators/isb",
        "label": "Herausgeber/in"
      }
    }

If this is a general pattern (redundancy between 078r and 200) we might not need to transform 078r. Furthermore, similar information can be found in responsibilityStatement and publication.publishedBy, see the JSON-LD.

aquast commented 3 years ago

Hi, the very important difference is that ZB MED sets a flag with MAB 078r "CollectionOne" to acknowledge that this is a "Herausgebendes Organ" from the Leibniz Gemeinschaft.

We need the latter to assign records to Leibniz Open. See this example of an article in FRL: http://www.leibnizopen.de/suche/handle/document/129931

If there is another way to mark Leibniz Institutions I would be very happy.

Unfortunately, I don't know of any common flag within the GND :-(

acka47 commented 3 years ago

the very important difference is that ZB MED sets a flag with MAB 078r "CollectionOne" to acknowledge that this is a "Herausgebendes Organ" from the Leibniz Gemeinschaft.

So, would it be sufficient if we added something like the following to every resource that has 078r containing the string "collectionOne"?

{
   "id":"http://lobid.org/hbz01/HT020560724",
   "inCollection":{
      "id":"http://www.leibnizopen.de",
      "label":"LeibnizOpen"
   }
}
aquast commented 3 years ago

Jan created this for the FRL articles:

"collectionOne": [
   {
      "@id": "http://d-nb.info/gnd/10154023-1",
      "prefLabel": "Leibniz-Zentrum für Agrarlandschaftsforschung"
   },
   {
      "@id": "http://d-nb.info/gnd/10025895-5",
      "prefLabel": "Leibniz-Institut für Gewässerökologie und Binnenfischerei"
   }
],

In my opinion, if your example used the GND URI given in MAB 078r "collectionOne" as the "id" instead of "http://www.leibnizopen.de", that would solve the issue :-)

acka47 commented 3 years ago

In my opinion, if your example used the GND URI given in MAB 078r "collectionOne" as the "id" instead of "http://www.leibnizopen.de", that would solve the issue :-)

Ok, this would be great as no extension to our data model would be needed. Bear in mind, though, that inCollection would not be exclusive to LeibnizOpen resources (see also the API documentation at https://hyp.is/0HcjSG1GEeeXDV9R428hIw/lobid.org/resources/api) and that, thus, you would have to maintain a list of URIs in inCollection.id to filter out all the resources in LeibnizOpen. Please confirm that this is ok and I will move this to "ready".
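
As a rough illustration of what maintaining such a list would mean on the consumer side, here is a minimal Python sketch. The allow-list `LEIBNIZ_GND_URIS`, the helper names and the `http://example.org/...` collection are hypothetical. The two GND URIs are simply the ones mentioned in this thread, not a complete list of Leibniz institutions.

```python
# Hypothetical allow-list maintained by the consumer (e.g. in Publisso).
# Only the two GND URIs mentioned in this thread are included.
LEIBNIZ_GND_URIS = {
    "http://d-nb.info/gnd/10025895-5",
    "http://d-nb.info/gnd/10154023-1",
}


def as_list(value):
    """Normalise a JSON value that may be a single object or an array."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def leibniz_collections(resource: dict) -> list[str]:
    """Return the inCollection ids of a resource that are on the allow-list."""
    return [
        entry["id"]
        for entry in as_list(resource.get("inCollection"))
        if entry.get("id") in LEIBNIZ_GND_URIS
    ]


resource = {
    "id": "http://lobid.org/hbz01/HT020560724",
    "inCollection": [
        {
            "id": "http://d-nb.info/gnd/10025895-5",
            "label": "Leibniz-Institut für Gewässerökologie und Binnenfischerei",
        },
        # Placeholder for any non-Leibniz collection the resource is also in.
        {"id": "http://example.org/some-other-collection", "label": "Other"},
    ],
}

print(leibniz_collections(resource))
```

GND URIs appear in this thread with both http:// and https:// schemes, so a real filter would probably need to normalise the scheme before comparing.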

aquast commented 3 years ago

Yes, you're right that inCollection is not exclusive to Leibniz. For that reason we need to have

id: GND-URI label: "http://www.leibnizopen.de" at this point. We can then handle the inCollection.id ourselves. Is that in line with your proposal?

acka47 commented 3 years ago

I understand your proposal like this:

{
   "id":"http://lobid.org/hbz01/HT020560724",
   "inCollection":{
      "id":"http://d-nb.info/gnd/10025895-5",
      "label":"http://www.leibnizopen.de/"
   }
}

As the label of the JSON object is always that of the resource identified by id, we cannot do this. It would have to look very similar to what Jan implemented for FRL articles:

{
   "id":"http://lobid.org/hbz01/HT020560724",
   "inCollection":{
      "id":"http://d-nb.info/gnd/10025895-5",
      "label":"Leibniz-Institut für Gewässerökologie und Binnenfischerei"
   }
}

If this works for you, we are set; otherwise we'd have to think about another solution.

aquast commented 3 years ago

As I tried to say, we need a flag if any of the inCollection organisations is

  • part of the Leibniz Gemeinschaft
  • is mentioned in MAB 078r,

Currently, we use a rather fragile workaround for that. Is there any good reason to hide MAB 078r within the lobid data?

acka47 commented 3 years ago

Is there any good reason to hide MAB 078r within the lobid data?

No, there isn't; otherwise I wouldn't be discussing with you how to implement this. Generally, we do not add a new field to our data model for every use case, otherwise we would end up with something as unwieldy as the MAB source data. That is why we try to address each new use case within the possibilities of the existing data model. If this doesn't work out, we start thinking about another approach that extends the existing data model in a consistent way. As far as I can see, we have not yet exhausted the possibilities of the lobid data model.

You say that you need the following:

a flag if any of the inCollection organisations is

  • part of the Leibniz Gemeinschaft
  • is mentioned in MAB 078r,

I asked whether you would be willing to maintain within Publisso a list of organisations that are part of Leibniz Gemeinschaft (there cannot be that many of them, and the list probably doesn't change very often). We would then add data from 078r to inCollection like I exemplified in https://github.com/hbz/lobid/issues/430#issuecomment-738823826. As a result, you could easily add a flag "Leibniz Open" in Publisso and also indicate the concrete Leibniz organization the resource stems from. As I understand you, it is not feasible for you to keep such a list and you would rather get all the information you need from lobid-resources.

Thus, I propose another solution where we add

  1. an inCollection statement with the organization from GND as object
  2. a hierarchicalSuperiorOfTheCorporateBody statement (that's a GND Ontology property, see http://lobid.org/gnd/10025895-5.json) to indicate that the organization is part of Leibniz.
{
   "id":"http://lobid.org/hbz01/HT020560724",
   "inCollection":[
      {
         "id":"http://d-nb.info/gnd/10025895-5",
         "label":"Leibniz-Institut für Gewässerökologie und Binnenfischerei",
         "hierarchicalSuperiorOfTheCorporateBody":{
            "id":"https://d-nb.info/gnd/5271371-4",
            "label":"Wissenschaftsgemeinschaft Gottfried Wilhelm Leibniz"
         }
      }
   ]
}

I think this should be enough. The indicator in MAB to create this statement would be the one I already mentioned above:

every resource that has 078r containing the string "collectionOne"

Ok?
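
The proposed rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual lobid-resources transformation: the function name `in_collection_from_078r`, the regular expression and the `GND_STUB` dictionary (a stand-in for a real GND lookup) are hypothetical, and the stub values are copied from the example above.

```python
import re

# Hypothetical pattern: the URI is the quoted string after `collectionOne@id:`.
COLLECTION_ONE = re.compile(r'collectionOne@id:\s*"([^"]+)"')

# Hypothetical stand-in for a GND lookup; values copied from the example above.
GND_STUB = {
    "http://d-nb.info/gnd/10025895-5": {
        "label": "Leibniz-Institut für Gewässerökologie und Binnenfischerei",
        "hierarchicalSuperiorOfTheCorporateBody": {
            "id": "https://d-nb.info/gnd/5271371-4",
            "label": "Wissenschaftsgemeinschaft Gottfried Wilhelm Leibniz",
        },
    }
}


def in_collection_from_078r(subfields: list[str], gnd: dict) -> list[dict]:
    """Build inCollection entries for 078r subfields flagged with collectionOne."""
    entries = []
    for text in subfields:
        match = COLLECTION_ONE.search(text)
        if not match:
            continue  # 078r values without the flag are ignored
        uri = match.group(1)
        entry = {"id": uri}
        # Add label and superior body when the organisation is known.
        entry.update(gnd.get(uri, {}))
        entries.append(entry)
    return entries


subfields = [
    "38 M: ellinet ; GND: (DE-588)10025895-5",
    '38 M: ellinet ; collectionOne@id: "http://d-nb.info/gnd/10025895-5"',
]
print(in_collection_from_078r(subfields, GND_STUB))
```

A consumer could then treat a resource as Leibniz Open whenever one of its inCollection entries has the Leibniz Gemeinschaft as hierarchicalSuperiorOfTheCorporateBody, without keeping its own list of institutes.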

acka47 commented 1 year ago

Due to inactivity, I think this can be closed, @aquast ?

acka47 commented 1 year ago

Closing