hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

Map field 064/Marc 655 (natureOfContent), `689??.$f` to existing types/medium #592

Open acka47 opened 6 years ago

acka47 commented 6 years ago

This is to enable a homogeneous way of filtering the data by type. (NWBib editors said they could provide us with a mapping.)

E.g. all resources with type "Miscellaneous" and natureOfContent.id "http://d-nb.info/gnd/4329497-2" (Hörbuch) can get the type "Book".
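
A minimal sketch of such a rule, assuming a record is a dict whose `type` is a list of strings and whose `natureOfContent` is a list of `{"id": ..., "label": ...}` entries. The function name, the table name, and the record shape are hypothetical; only the Hörbuch entry comes from this issue, and the real mapping would come from the NWBib editors.

```python
# Hypothetical lookup table: natureOfContent.id -> type.
# Only the Hörbuch entry is taken from the issue text.
NATURE_OF_CONTENT_TO_TYPE = {
    "http://d-nb.info/gnd/4329497-2": "Book",  # Hörbuch
}


def refine_types(resource):
    """Return the types of a record, replacing "Miscellaneous" if a mapping applies."""
    types = list(resource.get("type", []))
    if "Miscellaneous" not in types:
        return types  # a more specific type already exists, leave it alone
    for entry in resource.get("natureOfContent", []):
        mapped = NATURE_OF_CONTENT_TO_TYPE.get(entry.get("id"))
        if mapped:
            # The first matching natureOfContent entry decides the new type.
            return [mapped if t == "Miscellaneous" else t for t in types]
    return types
```

Records that already carry a specific type are returned unchanged, so the rule only fills gaps.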

ChristophEwertowski commented 6 years ago

I will only list cases where a resource has a natureOfContent statement but no type or medium, and only when at least 10 such results are found. I also won't go through every GND ID found in MAB 064: I will stop once five GND IDs in a row yield fewer than 10 results each.
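
The stopping rule described here can be sketched as follows. This assumes the per-ID result counts are already available as `(gnd_id, count)` pairs in the order they are checked; the function and parameter names are hypothetical.

```python
def select_gnd_ids(counts, min_results=10, max_misses_in_a_row=5):
    """Keep IDs with enough results; stop after too many consecutive misses."""
    selected = []
    misses = 0
    for gnd_id, count in counts:
        if count >= min_results:
            selected.append(gnd_id)
            misses = 0  # a hit resets the run of misses
        else:
            misses += 1
            if misses >= max_misses_in_a_row:
                break  # five IDs in a row were below the threshold
    return selected
```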

Other natureOfContent which have at least 10 results but can't be given a definitive (existing or not yet existing) type:

No fitting types but a few hundred resources:

The regular expression for natureOfContent.id should be adjusted so that GND IDs without the "(DE-588)" prefix are also covered. For example, for "Amtliche Publikation" there are 1201 MAB fields without the prefix but only 564 with it.
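
A sketch of what a more tolerant pattern could look like, written in Python for illustration only (the actual transformation in this repository is not shown in this thread). The digit/hyphen pattern for the GND number is a simplifying assumption, and an unprefixed value is simply assumed to be a GND ID, which would need checking against the real data.

```python
import re

# Optional "(DE-588)" prefix, then the GND number.
# The number pattern (digits, hyphens, optional X) is an assumption.
GND_ID = re.compile(r"^(?:\(DE-588\))?\s*(?P<id>[0-9][0-9X-]*)$")


def gnd_uri(value):
    """Return the d-nb.info URI for a field value, or None if it is not an ID."""
    match = GND_ID.match(value.strip())
    if match is None:
        return None
    return "http://d-nb.info/gnd/" + match.group("id")
```

Both `"(DE-588)4329497-2"` and `"4329497-2"` then resolve to the same URI, while a plain label such as `"Hörbuch"` is rejected.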


For medium (shouldn't be put into natureOfContent but could be used):

acka47 commented 6 years ago

Thanks for the overview. Re. bibliography, we actually have a class in lobid vocab: http://purl.org/lobid/lv#Bibliography

TobiasNx commented 2 years ago

@acka47 Should this be taken up in ALMA-Fix?

TobiasNx commented 1 year ago

@dr0i could we make the natureOfContent.id and natureOfContent.label subfields aggregatable? That would help to identify patterns and reuse them for facet matching.

dr0i commented 1 year ago

This could be possible without any manipulation of natureOfContent.label, directly with some kind of metric aggregation in Elasticsearch; I haven't worked that out, though. (In general I don't like enabling aggregations on fields, because I want to keep the index as simple as possible. If these aggregations were going to be used in the GUI I would enable them, but not for a one-time analysis.) However, as we already have aggregations enabled on natureOfContent.id, we can just use that data and do some lookups. This should be doable with Metafacture, but this shell script also works:

# Aggregate the 100 most frequent natureOfContent.id values.
# The Content-Type header is required by Elasticsearch 6 and later.
curl -s -XGET 'http://weywot3.hbz-nrw.de:9200/resources/_search?q=natureOfContent.id:*&pretty=true' \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "aggs1": {
      "terms": {
        "field": "natureOfContent.id",
        "size": 100
      }
    }
  }
}
' > aggsLabel.json

# Write one "id,count" line per bucket (no header row, so the loop only sees data).
jq -r '.aggregations[].buckets[] | [.key, .doc_count] | @csv' aggsLabel.json | tr -d '"' > idAndCount.tsv

# Start with an empty result file, because the loop appends to it.
rm -f gndAndLabels.csv

# Look up the preferred label of each GND ID and write "count,label,id".
while IFS=, read -r id count; do
  echo "id is: $id"
  idRdfUrl="$id/about/lds"
  label=$(curl -sL "$idRdfUrl" | grep gndo:preferredNameForTheSubjectHeading | cut -d '"' -f2)
  echo "$count,$label,$id" >> gndAndLabels.csv
done < idAndCount.tsv

# Sort numerically by count.
sort -g gndAndLabels.csv

Kind of complex. Anyway, the outcome of the script is at https://gist.github.com/dr0i/17f284e439e3750596e0101c2505ac87.

TobiasNx commented 1 year ago

I have the IDs as a list, but I need the labels too, since not all labels have IDs. Therefore I would need an aggregation on natureOfContent.label. This would be temporary, just to extract a list, and could be disabled again afterwards.

dr0i commented 1 year ago

I see. I enabled aggregations on natureOfContent.label and ran this:

curl ...._search?q=*&pretty=true" -d '
{
  "size": 0,
  "aggs": {
          "aggs1": {
              "terms": {
                "field": "natureOfContent.label",
                "size": 100
              }
          }
        }
}'

got this:

 ...
   "aggregations" : {
    "aggs1" : {
      "doc_count_error_upper_bound" : 905,
      "sum_other_doc_count" : 206890,
      "buckets" : [
        {
          "key" : "Zeitschrift",
          "doc_count" : 1060256
        },
        {
          "key" : "Aufsatzsammlung",
          "doc_count" : 462305
        },
        {
          "key" : "Konferenzschrift",
          "doc_count" : 300850
        },
        {
          "key" : "Monografische Reihe",
          "doc_count" : 194971
        },
        {
          "key" : "Hochschulschrift",
          "doc_count" : 178580
        },
        {
          "key" : "Bildband",
          "doc_count" : 104299
        },
        {
          "key" : "Ausstellungskatalog",
          "doc_count" : 99843
        },
        {
          "key" : "Quelle",
          "doc_count" : 79261
        },
        {
          "key" : "Biografie",
          "doc_count" : 74869
        },
        {
          "key" : "Statistik",
          "doc_count" : 69858
        },
        {
          "key" : "Periodicals.",
          "doc_count" : 66722
        },
        {
          "key" : "Fiktionale Darstellung",
          "doc_count" : 65922
        },
        {
          "key" : "Einführung",
          "doc_count" : 65682
        },
        {
          "key" : "Bibliografie",
          "doc_count" : 63838
        },
        {
          "key" : "Führer",
          "doc_count" : 62780
        },
        {
          "key" : "Ratgeber",
          "doc_count" : 55196
        },
        {
          "key" : "Lehrbuch",
          "doc_count" : 54820
        },
        {
          "key" : "Wörterbuch",
          "doc_count" : 52832
        },
        {
          "key" : "Karte",
          "doc_count" : 46644
        },
        {
          "key" : "Zeitung",
          "doc_count" : 45407
        },
        {
          "key" : "Anthologie",
          "doc_count" : 42001
        },
        {
          "key" : "Verzeichnis",
          "doc_count" : 41236
        },
        {
          "key" : "Online-Ressource",
          "doc_count" : 38223
        },
        {
          "key" : "Erlebnisbericht",
          "doc_count" : 35441
        },
        {
          "key" : "Lehrmittel",
          "doc_count" : 31836
        },
        {
          "key" : "CD-ROM",
          "doc_count" : 26371
        },
        {
          "key" : "Kommentar",
          "doc_count" : 26320
        },
        {
          "key" : "Aufgabensammlung",
          "doc_count" : 22977
        },
        {
          "key" : "Autobiografie",
          "doc_count" : 21387
        },
        {
          "key" : "CD",
          "doc_count" : 19934
        },
        {
          "key" : "Schulbuch",
          "doc_count" : 19655
        },
        {
          "key" : "Briefsammlung",
          "doc_count" : 17321
        },
        {
          "key" : "Amtliche Publikation",
          "doc_count" : 17083
        },
        {
          "key" : "Scores.",
          "doc_count" : 16265
        },
        {
          "key" : "Schulprogramm",
          "doc_count" : 16221
        },
        {
          "key" : "Beispielsammlung",
          "doc_count" : 15067
        },
        {
          "key" : "Bericht",
          "doc_count" : 14342
        },
        {
          "key" : "Festschrift",
          "doc_count" : 13167
        },
        {
          "key" : "Kinderbuch",
          "doc_count" : 13156
        },
        {
          "key" : "Katalog",
          "doc_count" : 12912
        },
        {
          "key" : "DVD-Video",
          "doc_count" : 12872
        },
        {
          "key" : "Anleitung",
          "doc_count" : 12192
        },
        {
          "key" : "Interview",
          "doc_count" : 11879
        },
        {
          "key" : "Bilderbuch",
          "doc_count" : 10582
        },
        {
          "key" : "Education films.",
          "doc_count" : 10032
        },
        {
          "key" : "Online-Publikation",
          "doc_count" : 9895
        },
        {
          "key" : "Adressbuch",
          "doc_count" : 9002
        },
        {
          "key" : "Reisebericht",
          "doc_count" : 8938
        },
        {
          "key" : "Jugendbuch",
          "doc_count" : 8488
        },
        {
          "key" : "Comic",
          "doc_count" : 8278
        },
        {
          "key" : "Regionalzeitung",
          "doc_count" : 8169
        },
        {
          "key" : "Kunstführer",
          "doc_count" : 8122
        },
        {
          "key" : "Kochbuch",
          "doc_count" : 8020
        },
        {
          "key" : "Datenbank",
          "doc_count" : 7398
        },
        {
          "key" : "History.",
          "doc_count" : 7327
        },
        {
          "key" : "Lokalpresse",
          "doc_count" : 7127
        },
        {
          "key" : "Atlas",
          "doc_count" : 7054
        },
        {
          "key" : "Fallstudiensammlung",
          "doc_count" : 6994
        },
        {
          "key" : "Werkverzeichnis",
          "doc_count" : 6874
        },
        {
          "key" : "Kindersachbuch",
          "doc_count" : 6867
        },
        {
          "key" : "Chamber music.",
          "doc_count" : 6686
        },
        {
          "key" : "Altkarte",
          "doc_count" : 6561
        },
        {
          "key" : "Documentary films.",
          "doc_count" : 6223
        },
        {
          "key" : "Sacred music.",
          "doc_count" : 6000
        },
        {
          "key" : "Tagebuch",
          "doc_count" : 5740
        },
        {
          "key" : "Übungssammlung",
          "doc_count" : 5590
        },
        {
          "key" : "Hörbuch",
          "doc_count" : 5470
        },
        {
          "key" : "Film",
          "doc_count" : 5454
        },
        {
          "key" : "Website",
          "doc_count" : 5381
        },
        {
          "key" : "Bibliographie",
          "doc_count" : 5258
        },
        {
          "key" : "Bestimmungsbuch",
          "doc_count" : 5025
        },
        {
          "key" : "Stadtplan",
          "doc_count" : 4826
        },
        {
          "key" : "Programmheft",
          "doc_count" : 4814
        },
        {
          "key" : "Forschungsbericht",
          "doc_count" : 4691
        },
        {
          "key" : "Criticism, interpretation, etc.",
          "doc_count" : 4613
        },
        {
          "key" : "Fallsammlung",
          "doc_count" : 4558
        },
        {
          "key" : "Unterrichtseinheit",
          "doc_count" : 4260
        },
        {
          "key" : "Mehrsprachiges Wörterbuch",
          "doc_count" : 4103
        },
        {
          "key" : "Datensammlung",
          "doc_count" : 4049
        },
        {
          "key" : "Richtlinie",
          "doc_count" : 3990
        },
        {
          "key" : "Mikroform",
          "doc_count" : 3761
        },
        {
          "key" : "Anzeigenblatt",
          "doc_count" : 3742
        },
        {
          "key" : "Theaterstück",
          "doc_count" : 3734
        },
        {
          "key" : "Formularsammlung",
          "doc_count" : 3673
        },
        {
          "key" : "Haushaltsplan",
          "doc_count" : 3603
        },
        {
          "key" : "Werkzeitschrift",
          "doc_count" : 3524
        },
        {
          "key" : "Kongress",
          "doc_count" : 3281
        },
        {
          "key" : "Loseblattsammlung",
          "doc_count" : 3184
        },
        {
          "key" : "Inventar",
          "doc_count" : 3147
        },
        {
          "key" : "Humoristische Darstellung",
          "doc_count" : 2907
        },
        {
          "key" : "Entscheidungssammlung",
          "doc_count" : 2829
        },
        {
          "key" : "Umfrage",
          "doc_count" : 2780
        },
        {
          "key" : "DVD-ROM",
          "doc_count" : 2738
        },
        {
          "key" : "Kalender",
          "doc_count" : 2733
        },
        {
          "key" : "Zeittafel",
          "doc_count" : 2730
        },
        {
          "key" : "Songs.",
          "doc_count" : 2683
        },
        {
          "key" : "Motets.",
          "doc_count" : 2677
        },
        {
          "key" : "Lehrerhandbuch",
          "doc_count" : 2647
        },
        {
          "key" : "Conference papers and proceedings.",
          "doc_count" : 2603
        },
        {
          "key" : "Feature films.",
          "doc_count" : 2554
        }
      ]

TobiasNx commented 1 year ago

Thanks. I checked how many records have natureOfContent at all: "only" 3693643.

dr0i commented 1 year ago

Changed the index settings to take the whole string as the key, so spaces are now part of it. Updated the result in the comment above.

TobiasNx commented 3 months ago

This ticket is a duplicate of #1549. The relevant comment is quoted here. `689??.$f` and 655 should be included in type and medium here.

Formangaben (form statements):

* [x]  Adressbuch

* [x]  Altkarte

* [ ]  Amtliche Publikation

* [ ]  Anleitung

* [ ]  Anthologie

* [ ]  Antiquariatskatalog

* [ ]  Anzeigenblatt

* [x]  Atlas

* [ ]  Aufgabensammlung

* [x]  Aufsatzsammlung

* [x]  Auktionskatalog

* [x]  Ausstellungskatalog

* [x]  Autobiografie (old spelling: Autobiographie) [note: may include time statements]

* [ ]  Autograf

* [x]  Backbuch

* [ ]  Beispielsammlung

* [x]  Bericht

* [x]  Bestimmungsbuch

* [x]  Bibliografie (old spelling: Bibliographie) [note: may include time statements]

* [x]  Bild

* [x]  Bildband

* [ ]  Bilderbogen

* [x]  Bilderbuch

* [ ]  Bildnis

* [ ]  Bildwörterbuch

* [ ]  Biografie (old spelling: Biographie) [note: may include time statements]

* [x]  Blindendruck

* [ ]  Briefsammlung [note: may include time statements]

* [ ]  Checkliste

* [ ]  Comic

* [ ]  Datenbank

* [ ]  Datensammlung

* [ ]  Diagramm

* [ ]  Diskografie

* [x]  Drehbuch

* [ ]  Einblattdruck

* [ ]  Einführung

* [ ]  Entscheidungssammlung

* [x]  Enzyklopädie

* [x]  Erlebnisbericht

* [ ]  Fachkunde

* [ ]  Fahrplan

* [ ]  Faksimile

* [ ]  Fallsammlung

* [ ]  Fallstudiensammlung

* [x]  Festschrift

* [ ]  Fiktionale Darstellung

* [ ]  Film

* [ ]  Filmografie

* [ ]  Flugblatt

* [ ]  Flugschrift

* [ ]  Formelsammlung

* [ ]  Formularsammlung

* [x]  Forschungsbericht

* [ ]  Forschungsdaten

* [ ]  Fotografie (old spelling: Photographie)

* [ ]  Führer

* [ ]  Fundstellenverzeichnis

* [ ]  Genealogische Tafel

* [ ]  Gespräch

* [ ]  Globus

* [ ]  Grafik (old spelling: Graphik)

* [ ]  Graphzine

* [ ]  Handschrift

* [ ]  Haushaltsplan

* [ ]  Hochschulschrift

* [x]  Hörbuch

* [ ]  Hörspiel

* [ ]  Humoristische Darstellung

* [ ]  Inkunabel

* [ ]  Interview

* [ ]  Inventar

* [ ]  Jugendbuch

* [ ]  Jugendsachbuch

* [ ]  Kalender

* [ ]  Karikatur

* [ ]  Karte

* [ ]  Katalog [note: may include time statements]

* [ ]  Kinderbuch

* [ ]  Kindersachbuch

* [ ]  Kochbuch

* [ ]  Kolumnensammlung

* [ ]  Kommentar

* [ ]  Konferenzschrift

* [ ]  Konkordanz

* [ ]  Kunstführer

* [ ]  Künstlerbuch

* [ ]  Laudatio

* [ ]  Lehrbuch

* [ ]  Lehrerhandbuch

* [ ]  Lehrmittel

* [ ]  Lehrplan

* [ ]  Lernsoftware

* [ ]  Lesebuch

* [ ]  Liederbuch

* [x]  Literaturbericht [note: may include time statements]

* [ ]  Loseblattsammlung

* [x]  Mehrsprachiges Wörterbuch (old version: Wörterbuch)

* [ ]  Mitgliederverzeichnis

* [ ]  Modell

* [ ]  Monografische Reihe

* [ ]  Musikhandschrift

* [ ]  Nachruf

* [x]  Norm

* [ ]  Ortsverzeichnis

* [ ]  Papyrus

* [ ]  Patentschrift

* [ ]  Plakat

* [ ]  Plan

* [ ]  Postkarte

* [ ]  Praktikum

* [ ]  Predigthilfe

* [ ]  Pressendruck

* [ ]  Pressestimme

* [ ]  Programmheft

* [ ]  Puzzle

* [ ]  Quelle

* [ ]  Ratgeber

* [ ]  Rede

* [ ]  Referateorgan

* [ ]  Regest

* [x]  Reisebericht [note: may include time statements]

* [ ]  Reportagensammlung

* [ ]  Rezension

* [x]  Richtlinie

* [ ]  Röntgenbild

* [x]  Rückläufiges Wörterbuch

* [ ]  Sachbilderbuch

* [ ]  Satzung

* [ ]  Schematismus

* [x]  Schulbuch

* [ ]  Schulprogramm

* [ ]  Software

* [ ]  Spiel

* [ ]  Sprachatlas

* [ ]  Sprachführer

* [x]  Stadtplan

* [ ]  Statistik [note: may include time statements]

* [ ]  Tabelle

* [ ]  Tafel

* [ ]  Tagebuch [note: may include time statements]

* [ ]  Technische Zeichnung

* [ ]  Telefonbuch

* [ ]  Testmaterial

* [ ]  Theaterstück

* [ ]  Thesaurus

* [ ]  Übungssammlung

* [ ]  Umfrage

* [ ]  Unterrichtseinheit

* [ ]  Urkunde

* [ ]  Verkaufskatalog

* [ ]  Verzeichnis

* [ ]  Vorlesungsverzeichnis

* [ ]  Weblog

* [x]  Website

* [ ]  Weltkarte

* [x]  Werkverzeichnis [note: may include time statements]

* [ ]  Werkzeitschrift

* [ ]  Wörterbuch

* [ ]  Zeichnung

* [x]  Zeitschrift

* [ ]  Zeittafel

* [x]  Zeitung

* [ ]  Zitatensammlung
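
The list above notes old spellings for several terms, and the label aggregation earlier in this thread does contain both "Bibliografie" and "Bibliographie" as separate buckets. A minimal sketch of folding such variants together before matching, assuming buckets shaped like the Elasticsearch output quoted above; the function names are hypothetical, the spelling pairs are the ones given in the list, and time statements are not handled here.

```python
# Old spelling -> current spelling, as noted in the list above.
OLD_TO_CURRENT = {
    "Autobiographie": "Autobiografie",
    "Bibliographie": "Bibliografie",
    "Biographie": "Biografie",
    "Photographie": "Fotografie",
    "Graphik": "Grafik",
}


def normalize_label(label):
    """Map an old spelling to the current one; leave other labels unchanged."""
    label = label.strip()
    return OLD_TO_CURRENT.get(label, label)


def merge_counts(buckets):
    """Sum doc_count per normalized label for a list of aggregation buckets."""
    merged = {}
    for bucket in buckets:
        key = normalize_label(bucket["key"])
        merged[key] = merged.get(key, 0) + bucket["doc_count"]
    return merged
```

Note that the summed counts are only an upper bound: a record carrying both spellings would be counted twice.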

Old RSWK/RAK Formschlagwörter (form subject headings) that have no matching values in the RDA Formangaben. These seem to be carrier types:

* [ ]  Audiovisuelles Material

* [ ]  Audiovisuelles Material <für Kinder>

* [ ]  Bildplatte

* [ ]  CD

* [ ]  CD <für Kinder>

* [ ]  CD-ROM

* [ ]  CD-ROM <für Kinder>

* [ ]  Dia

* [ ]  Diskette

* [ ]  Diskette <für Kinder>

* [ ]  DVD-Audio

* [ ]  DVD-Audio <für Kinder>

* [ ]  DVD-ROM

* [ ]  DVD-ROM <für Kinder>

* [ ]  DVD-Video

* [ ]  DVD-Video <für Kinder>

* [ ]  Elektronische Publikation

* [ ]  Elektronische Publikation <für Kinder>

* [ ]  Film <für Kinder>

* [ ]  Film 8mm (spelling variant: Film 8 mm)

* [ ]  Film Super-8

* [ ]  Film 16mm (variant: Film 16 mm)

* [ ]  Film 35mm (variant: Film 35 mm)

* [ ]  Film 65mm (variant: Film 65 mm)

* [ ]  Film 70mm (variant: Film 70 mm)

* [x]  Medienkombination

* [x]  Mikroform

* [x]  Musikdruck

* [ ]  Online-Publikation

* [ ]  Schallplatte

* [ ]  Schallplatte <für Kinder>

* [ ]  Text

* [ ]  Tonbildreihe

* [ ]  Tonkassette

* [ ]  Tonkassette <für Kinder>

* [ ]  Tonträger

* [ ]  Tonträger <für Kinder>

* [ ]  Videokassette

* [ ]  Videokassette <für Kinder>

Additional Formschlagwörter from: service-wiki.hbz-nrw.de/display/SEM/RSWK+Formschlagwoerter

* [ ]  Arbeitstransparent

* [ ]  Ausstellung

* [ ]  Belletristische Darstellung

* [ ]  Bildliche Darstellung

* [ ]  Flugblatt

* [x]  Kongress

* [ ]  Lernprogramm

* [ ]  Neuerwerbungsliste [note: may include time statements]

* [ ]  Programm

* [x]  Schriftenreihe

* [ ]  Telefaxverzeichnis

* [ ]  Telexverzeichnis

We should exclude carrier-types from natureOfContent (wiki.dnb.de/pages/viewpage.action?pageId=106039270).
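
One way this exclusion could be sketched, assuming the labels of a record are available as a plain list of strings. The set below is only a subset of the terms listed in this comment as probable carrier types, and the function name is hypothetical; which terms really count as carrier types would have to be decided against the DNB page.

```python
# Subset of the terms listed above as probable carrier types (an assumption).
CARRIER_TYPES = {
    "CD", "CD-ROM", "DVD-Audio", "DVD-ROM", "DVD-Video", "Diskette",
    "Mikroform", "Online-Publikation", "Schallplatte", "Tonkassette",
    "Videokassette",
}

CHILDREN_SUFFIX = " <für Kinder>"


def split_nature_of_content(labels):
    """Split labels into (content labels, carrier-type labels)."""
    content, carriers = [], []
    for label in labels:
        # Treat "CD <für Kinder>" like "CD" when deciding the category.
        base = label[:-len(CHILDREN_SUFFIX)] if label.endswith(CHILDREN_SUFFIX) else label
        if base in CARRIER_TYPES:
            carriers.append(label)
        else:
            content.append(label)
    return content, carriers
```

The carrier labels are returned rather than dropped, so they could still be used for medium, as suggested earlier in the thread.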