hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Mapping for reports #242

Closed jschnasse closed 7 years ago

jschnasse commented 8 years ago

Suggested rule: If 'r' is the second letter in MAB 51 --> type of resource is http://purl.org/ontology/bibo/Report

Example: http://lobid.org/resource/HT018297384/about?format=source

acka47 commented 8 years ago

Here's the MAB documentation for 051:

051     VEROEFFENTLICHUNGSSPEZIFISCHE ANGABEN ZU BEGRENZTEN
        WERKEN

...

          1-3  Veroeffentlichungsart und Inhalt
               a = Abstract (Referat)
               b = Bibliographie
               c = Katalog
               d = Woerterbuch
               e = Enzyklopaedie
               f = Festschrift
               g = Datenbank
               h = Biographie
               i = Registerwerk
               j = Fortschrittsbericht
               k = Konferenzschrift
               l = Gesetz
               m = Musikalia
               n = Normschrift
               o = Loseblattausgabe
               p = Patentdokument
               q = Lieferungswerk
               r = Report
               s = Statistik
               t = Aufsatz
               u = Universitaetsschrift
               v = Sonderdruck
               x = Schulbuch
               z = sonstige Veroeffentlichungsart/-inhalt

Thus, we should also map an 'r' as the third or fourth letter to http://purl.org/ontology/bibo/Report.
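The 051 rule discussed so far can be sketched as follows. This is only an illustration in Python, not lobid's actual transformation (which is not shown in this thread). It assumes that the 051 value is available as a plain string and that positions 1-3 in the MAB documentation are zero-based, so that the "second letter" from the original suggestion is position 1. The function name is hypothetical.

```python
BIBO_REPORT = "http://purl.org/ontology/bibo/Report"


def is_report_by_mab051(value: str) -> bool:
    """Return True if 'r' occurs at zero-based positions 1-3 of a MAB 051 value.

    Assumption: position 0 carries other information and is ignored here.
    """
    # Slicing is safe for short or empty strings; it simply yields fewer characters.
    return "r" in value[1:4]


if __name__ == "__main__":
    for sample in ["ar", "a|r", "a||r", "r|||", ""]:
        resource_type = BIBO_REPORT if is_report_by_mab051(sample) else None
        print(repr(sample), resource_type)
```

An 'r' at position 0 or beyond position 3 is deliberately not treated as a match.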

acka47 commented 8 years ago

We should also map the "Formschlagwort" (form subject heading) "Bericht" to this type, as it is more fruitful than 051.

For examples, see http://lobid.org/resource?subject=Bericht. The first results have "Bericht" in SubjectLabel, which comes from 710, ind2="1", subfield a, e.g. http://lobid.org/resource/TT003280170 (snippet):

          <datafield ind2="1" ind1="-" tag="710">
            <subfield code="a">Bericht</subfield>
          </datafield>

or from 952 ind2="1", subfield x, e.g. http://lobid.org/resource/TT050409948 (snippet):

          <datafield ind2="1" ind1="-" tag="952">
            <subfield code="s">Operation</subfield>
            <subfield code="x">Bericht</subfield>
          </datafield>

aquast commented 8 years ago

Edit @dr0i: moved comment and created new issue, see #314.

dr0i commented 7 years ago

Deployed to production, see http://lobid.org/resource/HT018297384 and http://lobid.org/resources/HT018297384. Please close if pleased.

acka47 commented 7 years ago

Looks good for most resources. http://lobid.org/resource/HT003654516 and its superordinated resource aren't typed as Report, though. Both have "Berichte" in 952 and GND ID 4128022-2 ("Bericht") in 902. (This is evidently the "Formschlagwort" version of "Bericht" with a GND ID.):

<datafield tag="902" ind1="-" ind2="2">
  <subfield code="s">Bericht</subfield>
  <subfield code="9">(DE-588)4128022-2</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="2">
  <subfield code="s">Berichte</subfield>
</datafield>

Some have GND subject "Bericht" in 907, e.g. http://lobid.org/hbz01/HT013374914:

<datafield tag="907" ind1="-" ind2="1">
  <subfield code="s">Bericht</subfield>
  <subfield code="9">(DE-588)4128022-2</subfield>
</datafield>

Conclusion: We should also add 4128022-2 to the mapping.

dr0i commented 7 years ago

http://lobid.org/resource/HT002152208 is the superordinated resource of http://lobid.org/resource/HT003654516 . It's typed as Journal. You sure that it also should be typed as Report?

dr0i commented 7 years ago

Would be very helpful to know all the fields/subfields which determine the type Report. So far I gathered:

051. starts with .[r]|..[r]|...[r]

710-1.a starts with Bericht

9[0123456789][27]-[-1].[sx] starts with Bericht

Anything else in the house? Maybe even 9[0123456789][27]-[-1].. and 710-1.. (<=> "whatever subfield starting with Bericht")?

dr0i commented 7 years ago

<datafield tag="902" ind1="-" ind2="2">

Shall I really take this into account? It means: "If the superordinated resource is a Bericht, the subordinated one is also a Bericht"?

acka47 commented 7 years ago

http://lobid.org/resource/HT002152208 is the superordinated resource of http://lobid.org/resource/HT003654516 . It's typed as Journal. You sure that it also should be typed as Report?

In this case it definitely makes sense. "Journal" just means "Periodikum" and what we are dealing with is a periodical (annual) publication of a report (=Jahresbericht).

<datafield tag="902" ind1="-" ind2="2">

Shall I really take this into account? It means: "If the superordinated resource is a Bericht, the subordinated one is also a Bericht"?

I guess it makes sense to type all resources with GND ID 4128022-2 as "Report" – superordinated as well as subordinated resources.

Would be very helpful to know all the fields/subfields which determine the type Report. So far I gathered:

Yes.

051. starts with .[r]|..[r]|...[r]

+1

710-1.a starts with Bericht

We had better leave this one out, as it is a very loosely controlled field (see https://github.com/hbz/lobid/issues/109#issuecomment-240360765).

9[0123456789][27]-[-1].[sx] starts with Bericht

I'd rather use the GND ID "4128022-2" for finding the type. 952 holds the alternate names for the subject and can thus be omitted. This means using something like:

9[0][27]-[-1].[9] contains "4128022-2"
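A rough sketch of this GND-ID check against MAB-XML snippets like the ones quoted in this thread, written in Python for illustration only. It assumes the datafield elements carry no XML namespace and are wrapped in a single root element. The function name and the `<record>` wrapper in the test data are hypothetical. The tag and indicator conditions follow the pattern exactly as written above.

```python
import re
import xml.etree.ElementTree as ET

GND_BERICHT = "4128022-2"
# Tags covered by the pattern "9[0][27]": 902 and 907.
TAG_PATTERN = re.compile(r"9[0][27]")


def has_gnd_bericht(record_xml: str) -> bool:
    """Return True if a 902/907 field (ind1 '-', ind2 '-' or '1') has the GND ID in subfield 9."""
    root = ET.fromstring(record_xml)
    for field in root.iter("datafield"):
        if not TAG_PATTERN.fullmatch(field.get("tag", "")):
            continue
        if field.get("ind1") != "-" or field.get("ind2") not in ("-", "1"):
            continue
        for subfield in field.findall("subfield"):
            # Subfield 9 holds values such as "(DE-588)4128022-2".
            if subfield.get("code") == "9" and GND_BERICHT in (subfield.text or ""):
                return True
    return False


if __name__ == "__main__":
    sample = (
        '<record><datafield tag="907" ind1="-" ind2="1">'
        '<subfield code="s">Bericht</subfield>'
        '<subfield code="9">(DE-588)4128022-2</subfield>'
        "</datafield></record>"
    )
    print(has_gnd_bericht(sample))
```

With the pattern taken literally, the 907 snippet above (ind2="1") matches, whereas the 902 snippet with ind2="2" does not.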

dr0i commented 7 years ago

9[0][27]-[-1].[9]

Bringing this in line with what you said gives: 9[01234][27]-[-12].9

But you are aware that the example resources you gave in https://github.com/hbz/lobid/issues/242#issuecomment-154032054 (TT003280170 and TT050409948) will thus both not be typed as Report, aren't you?
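To make the effect of the widened pattern concrete, here is a small Python sketch. It is only an illustration and rests on assumptions not stated in this thread: that each record is flattened into keys of the form tag + ind1 + ind2 + "." + subfield code (e.g. "902-2.9"), and that the dot in the pattern is a literal separator rather than a regex wildcard. The function name and the sample records are hypothetical.

```python
import re

GND_BERICHT = "4128022-2"
# Assumed reading of "9[01234][27]-[-12].9": tag, ind1 "-", ind2 "-", "1" or "2",
# a literal dot as separator, then subfield code 9.
REPORT_KEY = re.compile(r"9[01234][27]-[-12]\.9")


def is_report(flat_record: dict) -> bool:
    """Return True if any matching key holds a value containing the GND ID for "Bericht"."""
    return any(
        REPORT_KEY.fullmatch(key) and any(GND_BERICHT in value for value in values)
        for key, values in flat_record.items()
    )


if __name__ == "__main__":
    with_gnd_id = {"902-2.s": ["Bericht"], "902-2.9": ["(DE-588)4128022-2"]}
    only_710 = {"710-1.a": ["Bericht"]}
    only_952 = {"952-1.s": ["Operation"], "952-1.x": ["Bericht"]}
    for record in (with_gnd_id, only_710, only_952):
        print(is_report(record))
```

Under these assumptions, a record that only has "Bericht" in 710 or 952, like the two TT examples mentioned here, is not typed as Report, while a 902 field with ind2="2" and the GND ID is.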

dr0i commented 7 years ago

Changes will be deployed to staging on Sunday and to production on Monday. Please review then.

acka47 commented 7 years ago

But you are aware that the example resources you gave in https://github.com/hbz/lobid/issues/242#issuecomment-154032054 (TT003280170 and TT050409948) will thus both not be typed as Report, aren't you?

I am. And I think it is the right way to go. Regarding TT003280170, I already said in https://github.com/hbz/lobid/issues/242#issuecomment-240364322 why we shouldn't take 710 into account. Regarding TT050409948, the subject heading is actually "Operationsbericht", which has "Krankenunterlagen" as its superordinated concept, not "Bericht". Thus, it makes sense to me not to include it.

acka47 commented 7 years ago

Looks good. Closing.