Open acka47 opened 6 years ago
I will only list examples where no type or medium exists but a `natureOfContent` statement is present. At least 10 results have to be found this way. I also won't go through every GND-Id found in MAB 064: I stop once five GND-Ids in a row each yield fewer than 10 results.
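The stopping rule can be sketched in shell as follows. This is only an illustration: the function name `scan_counts` and the sample counts are made up, and it assumes the result counts arrive one per line in the order the GND-Ids are checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read result counts from stdin (one per line) and
# print the ones that qualify (>= 10). Stop after five misses in a row.
scan_counts() {
  local misses=0 count
  while read -r count; do
    if [ "$count" -ge 10 ]; then
      printf '%s\n' "$count"
      misses=0                      # a hit resets the streak
    else
      misses=$((misses + 1))
      [ "$misses" -ge 5 ] && break  # five misses in a row: give up
    fi
  done
}

# Made-up sample: prints 61 and 3001, then stops before reaching 500.
printf '%s\n' 61 4 3001 7 2 1 0 3 500 | scan_counts
```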
* `Hochschulschrift` (35,727 with label, 35,675 with GND-Id, 52 fewer). GND-Id: (DE-588)4113937-9. Example: HT019387176. Query: 7 results. Update 27th March: 61 results.

`PublishedScore`
* `Aufführungsmaterial`. GND-Id: 1071243047. No hits! Could be up to 429. Update: 4 results
* `Klavierauszug`. GND-Id: 4164058-5. Update: 3001 results
* `Partitur` (18,708 with label, 18,694 with GND-Id, 14 fewer). GND-Id: (DE-588)4173447-6. Update: 25,000 results
* `Studienpartitur`. GND-Id: 1071332406. No hits! Could be up to 894. Update: ~1200 results
* `Aufsatzsammlung` (13,664 with label, 13,617 with GND-Id, 47 fewer). GND-Id: (DE-588)4143413-4. Example: HT019250171. Query: 36 results. -> `EditedVolume`. Update: ~21,000 results
* `Konferenzschrift` (8,911 with label, 8,885 with GND-Id, 26 fewer). GND-Id: (DE-588)1071861417. Example: HT019170450. Query: 25 results. -> `Proceedings`. Update: http://lobid.org/resources/search?q=natureOfContent.id%3A%22http%3A%2F%2Fd-nb.info%2Fgnd%2F1071861417%22
* `Stimme, Musikalische Ausgabeform` (4,606 with label, 3 fewer with GND-Id). GND-Id: (DE-588)1071380443. No resource with this `natureOfContent.id` in lobid-resources!

`Biography` -> http://purl.org/lobid/lv#Biography
* `Biografie` (3,422 with label, 3,418 with GND-Id). GND-Id: (DE-588)4006804-3. Example: http://lobid.org/resources/HT018945478. Query: 23 results. -> http://purl.org/lobid/lv#Biography
* `Autobiografie`. GND-Id: 4003939-0. Query: 20 results.

`Book`
* `Ausstellungskatalog` (6,745 with label, 6,679 with GND-Id, 66 fewer). GND-Id: (DE-588)4135467-9. Example: HT018902758. Query: 12 results.
* `Anthologie`. GND-Id: 4002214-6. Query: 65 results. -> `Book`
* `Bildband` (6,149 with label, 6,110 with GND-Id). GND-Id: (DE-588)4145395-5. Example: HT019076184. Query: 38 results.
* `Einführung`. GND-Id: 4151278-9. Query: 17 results. -> `Book`.
* `Führer`. GND-Id: 4155569-7. Query: 15 results. -> `Book` and `ReferenceSource`?
* `Hörbuch`. GND-Id: 4329497-2. Query: 1010 results.
* `Jugendbuch`. GND-Id: 4306252-0. Query: 17 results.
* `Kinderbuch`. GND-Id: 4303251-5. Query: 55 results. -> `Book`
* `Kommentar`. GND-Id: 4136710-8. Query: 68 results. -> `Book`
* `Lehrbuch` (3,300 with label, 3,294 with GND-Id). Example: http://lobid.org/resources/HT019501695. Query: 20 results. -> `Book`
* `Lehrerhandbuch`. GND-Id: 4167168-5. Query: 10 results. -> `Book`
* `Monografische Reihe` (15,033 with label, 15,032 with GND-Id). GND-Id: (DE-588)4179998-7. Example: HT018940006. Query: 46 results. Could be typed as `Book`.
* `Ratgeber`. GND-Id: 4048476-2. Query: 16 results. -> `Book`
* `Karte` (2,739 with label, 2,494 with GND-Id). GND-Id: 4029783-4. Query: 2,481 results because `Karte` is still a `medium` and not a `type`. No resource with this `natureOfContent` lacks a `medium`.
* `Altkarte`. GND-Id: 4611904-8. 18 results. The same as in `Karte`.
* `Schulbuch` (2,443 with label, 2,455 with GND-Id). GND-Id: 4053458-3. Example: http://lobid.org/resources/HT014546652. Query: 65 results. -> http://purl.org/lobid/lv#Schoolbook but not as `Book`, as there are many CDs in addition to a book.
* `Amtliche Publikation`. GND-Id: 4142300-8. Query: 301 results. -> http://purl.org/lobid/lv#OfficialPublication
* `Erlebnisbericht`. GND-Id: 4133254-4. Query: 15 results. -> `Report`
* `Verzeichnis`. GND-Id: 4188171-0. Query: 11 results. -> `ReferenceSource`.

Other `natureOfContent` values which have at least 10 results but can't be given a definitive (existing or not yet existing) type:
* `Lehrmittel`. GND-Id: 4074111-4. Query: 72 results, mainly Audio or Video resources
* `Bibliographie`. GND-Id: 4006432-3. Query: 17 results. Also used for works which contain a bibliography.

No fitting types but a few hundred resources:

* `Datenbank`. GND-Id: 4011119-2. Query: 482 results. No fitting type.
* `Film`. GND-Id: 4017102-4. Query: 1287 results.
* `Hörspiel`. GND-Id: 4025435-5. Query: 296 results. No fitting type.
* `Website`. GND-Id: 4596172-4. Query: 1070 results.

The regular expression for `natureOfContent.id` should be adjusted so that GND-Ids without "(DE-588)" are also covered. For example, for "Amtliche Publikation" there are 1201 MAB fields without the prefix but only 564 with it.
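A rough sketch of what such an adjusted pattern could look like, written as a small shell function. This is not the actual lobid transformation, which is not shown in this thread: the function name `gnd_uri` is hypothetical, and the ID pattern (digits, then an optional hyphen and check character) is an assumption based on the IDs listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a MAB 064 GND-Id, with or without the
# "(DE-588)" prefix, into a d-nb.info URI. Prints nothing for other input.
gnd_uri() {
  # Group 1 makes the prefix optional; group 2 is the bare ID.
  # The ID shape is an assumption derived from the IDs in this thread.
  printf '%s\n' "$1" |
    sed -nE 's#^(\(DE-588\))?([0-9]+(-[0-9Xx])?)$#http://d-nb.info/gnd/\2#p'
}

gnd_uri '(DE-588)4142300-8'   # -> http://d-nb.info/gnd/4142300-8
gnd_uri '4142300-8'           # -> http://d-nb.info/gnd/4142300-8
```

Both spellings end up as the same URI, so the 1201 unprefixed and 564 prefixed fields for "Amtliche Publikation" would be counted together.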
For `medium` (shouldn't be put into `natureOfContent` but could be used):

* `CD-ROM`. GND-Id: 4139307-7. Only two are in `natureOfContent` but there should be at least 1,183; the indicator statement in the regular expression has to be adjusted.
* `DVD-ROM` (354 with label, 353 with GND-Id). GND-Id: 4585131-1. No results without filtering for `Miscellaneous`!
* `Schallplatte` (189 results). GND-Id: 4052032-8. No results without filtering for `Miscellaneous`!
* `DVD-Video`. Not contained in `natureOfContent` but in MAB 064: HT018857620 (MAB)

Thanks for the overview. Re. bibliography, we actually have a class in lobid vocab: http://purl.org/lobid/lv#Bibliography
@acka47 Should this be taken up in ALMA-Fix?
@dr0i could we make the `natureOfContent.id` and label subfields aggregatable? It would help to identify patterns and reuse them for facet matching.
It could be possible without any manipulation of `natureOfContent.label`, directly with some kind of metric aggregations in Elasticsearch; I haven't worked that out, though. (In general I don't like to enable aggregations on fields, because I want to keep the index as simple as possible. If these aggregations were going to be used in the GUI I would do it, but not for a one-time analysis.)
However, as we have `natureOfContent.id` aggregations enabled, we can just use that data and do some lookups. This should be doable with Metafacture, but this shell script also works:
# Aggregate on natureOfContent.id (top 100 buckets); overwrite any old result.
curl -XGET 'http://weywot3.hbz-nrw.de:9200/resources/_search?q=natureOfContent.id:*&pretty=true' -d '
{
  "size": 0,
  "aggs": {
    "aggs1": {
      "terms": {
        "field": "natureOfContent.id",
        "size": 100
      }
    }
  }
}
' > aggsLabel.json
# Write one "id,count" line per bucket (no header row, so the loop below
# only ever sees real GND URIs).
rm -f idAndCount.tsv
jq -r '.aggregations[].buckets[] | [.key, .doc_count] | @csv' aggsLabel.json | tr -d '"' > idAndCount.tsv
rm -f gndAndLabels.csv
# Look up the preferred label of each GND-Id and collect "count,label,id".
while IFS=, read -r id count; do
  echo "id is: $id, count is: $count"
  idRdfUrl="$id/about/lds"
  label=$(curl -sL "$idRdfUrl" | grep gndo:preferredNameForTheSubjectHeading | cut -d '"' -f2)
  echo "$count,$label,$id" >> gndAndLabels.csv
done < idAndCount.tsv
sort -g gndAndLabels.csv
It's kind of complex, but you can find the outcome of the script at https://gist.github.com/dr0i/17f284e439e3750596e0101c2505ac87.
I have the ids as a list, but I need the labels too, since not all labels have ids. Therefore I would need an aggregation on `natureOfContent.label`. This would only be temporary, to extract a list, and could be disabled again afterwards.
I see. Enabled aggregations on `natureOfContent.label`, ran this:
curl ...._search?q=*&pretty=true" -d '
{
"size": 0,
"aggs": {
"aggs1": {
"terms": {
"field": "natureOfContent.label",
"size": 100
}
}
}
}'
got this:
...
"aggregations" : {
"aggs1" : {
"doc_count_error_upper_bound" : 905,
"sum_other_doc_count" : 206890,
"buckets" : [
{
"key" : "Zeitschrift",
"doc_count" : 1060256
},
{
"key" : "Aufsatzsammlung",
"doc_count" : 462305
},
{
"key" : "Konferenzschrift",
"doc_count" : 300850
},
{
"key" : "Monografische Reihe",
"doc_count" : 194971
},
{
"key" : "Hochschulschrift",
"doc_count" : 178580
},
{
"key" : "Bildband",
"doc_count" : 104299
},
{
"key" : "Ausstellungskatalog",
"doc_count" : 99843
},
{
"key" : "Quelle",
"doc_count" : 79261
},
{
"key" : "Biografie",
"doc_count" : 74869
},
{
"key" : "Statistik",
"doc_count" : 69858
},
{
"key" : "Periodicals.",
"doc_count" : 66722
},
{
"key" : "Fiktionale Darstellung",
"doc_count" : 65922
},
{
"key" : "Einführung",
"doc_count" : 65682
},
{
"key" : "Bibliografie",
"doc_count" : 63838
},
{
"key" : "Führer",
"doc_count" : 62780
},
{
"key" : "Ratgeber",
"doc_count" : 55196
},
{
"key" : "Lehrbuch",
"doc_count" : 54820
},
{
"key" : "Wörterbuch",
"doc_count" : 52832
},
{
"key" : "Karte",
"doc_count" : 46644
},
{
"key" : "Zeitung",
"doc_count" : 45407
},
{
"key" : "Anthologie",
"doc_count" : 42001
},
{
"key" : "Verzeichnis",
"doc_count" : 41236
},
{
"key" : "Online-Ressource",
"doc_count" : 38223
},
{
"key" : "Erlebnisbericht",
"doc_count" : 35441
},
{
"key" : "Lehrmittel",
"doc_count" : 31836
},
{
"key" : "CD-ROM",
"doc_count" : 26371
},
{
"key" : "Kommentar",
"doc_count" : 26320
},
{
"key" : "Aufgabensammlung",
"doc_count" : 22977
},
{
"key" : "Autobiografie",
"doc_count" : 21387
},
{
"key" : "CD",
"doc_count" : 19934
},
{
"key" : "Schulbuch",
"doc_count" : 19655
},
{
"key" : "Briefsammlung",
"doc_count" : 17321
},
{
"key" : "Amtliche Publikation",
"doc_count" : 17083
},
{
"key" : "Scores.",
"doc_count" : 16265
},
{
"key" : "Schulprogramm",
"doc_count" : 16221
},
{
"key" : "Beispielsammlung",
"doc_count" : 15067
},
{
"key" : "Bericht",
"doc_count" : 14342
},
{
"key" : "Festschrift",
"doc_count" : 13167
},
{
"key" : "Kinderbuch",
"doc_count" : 13156
},
{
"key" : "Katalog",
"doc_count" : 12912
},
{
"key" : "DVD-Video",
"doc_count" : 12872
},
{
"key" : "Anleitung",
"doc_count" : 12192
},
{
"key" : "Interview",
"doc_count" : 11879
},
{
"key" : "Bilderbuch",
"doc_count" : 10582
},
{
"key" : "Education films.",
"doc_count" : 10032
},
{
"key" : "Online-Publikation",
"doc_count" : 9895
},
{
"key" : "Adressbuch",
"doc_count" : 9002
},
{
"key" : "Reisebericht",
"doc_count" : 8938
},
{
"key" : "Jugendbuch",
"doc_count" : 8488
},
{
"key" : "Comic",
"doc_count" : 8278
},
{
"key" : "Regionalzeitung",
"doc_count" : 8169
},
{
"key" : "Kunstführer",
"doc_count" : 8122
},
{
"key" : "Kochbuch",
"doc_count" : 8020
},
{
"key" : "Datenbank",
"doc_count" : 7398
},
{
"key" : "History.",
"doc_count" : 7327
},
{
"key" : "Lokalpresse",
"doc_count" : 7127
},
{
"key" : "Atlas",
"doc_count" : 7054
},
{
"key" : "Fallstudiensammlung",
"doc_count" : 6994
},
{
"key" : "Werkverzeichnis",
"doc_count" : 6874
},
{
"key" : "Kindersachbuch",
"doc_count" : 6867
},
{
"key" : "Chamber music.",
"doc_count" : 6686
},
{
"key" : "Altkarte",
"doc_count" : 6561
},
{
"key" : "Documentary films.",
"doc_count" : 6223
},
{
"key" : "Sacred music.",
"doc_count" : 6000
},
{
"key" : "Tagebuch",
"doc_count" : 5740
},
{
"key" : "Übungssammlung",
"doc_count" : 5590
},
{
"key" : "Hörbuch",
"doc_count" : 5470
},
{
"key" : "Film",
"doc_count" : 5454
},
{
"key" : "Website",
"doc_count" : 5381
},
{
"key" : "Bibliographie",
"doc_count" : 5258
},
{
"key" : "Bestimmungsbuch",
"doc_count" : 5025
},
{
"key" : "Stadtplan",
"doc_count" : 4826
},
{
"key" : "Programmheft",
"doc_count" : 4814
},
{
"key" : "Forschungsbericht",
"doc_count" : 4691
},
{
"key" : "Criticism, interpretation, etc.",
"doc_count" : 4613
},
{
"key" : "Fallsammlung",
"doc_count" : 4558
},
{
"key" : "Unterrichtseinheit",
"doc_count" : 4260
},
{
"key" : "Mehrsprachiges Wörterbuch",
"doc_count" : 4103
},
{
"key" : "Datensammlung",
"doc_count" : 4049
},
{
"key" : "Richtlinie",
"doc_count" : 3990
},
{
"key" : "Mikroform",
"doc_count" : 3761
},
{
"key" : "Anzeigenblatt",
"doc_count" : 3742
},
{
"key" : "Theaterstück",
"doc_count" : 3734
},
{
"key" : "Formularsammlung",
"doc_count" : 3673
},
{
"key" : "Haushaltsplan",
"doc_count" : 3603
},
{
"key" : "Werkzeitschrift",
"doc_count" : 3524
},
{
"key" : "Kongress",
"doc_count" : 3281
},
{
"key" : "Loseblattsammlung",
"doc_count" : 3184
},
{
"key" : "Inventar",
"doc_count" : 3147
},
{
"key" : "Humoristische Darstellung",
"doc_count" : 2907
},
{
"key" : "Entscheidungssammlung",
"doc_count" : 2829
},
{
"key" : "Umfrage",
"doc_count" : 2780
},
{
"key" : "DVD-ROM",
"doc_count" : 2738
},
{
"key" : "Kalender",
"doc_count" : 2733
},
{
"key" : "Zeittafel",
"doc_count" : 2730
},
{
"key" : "Songs.",
"doc_count" : 2683
},
{
"key" : "Motets.",
"doc_count" : 2677
},
{
"key" : "Lehrerhandbuch",
"doc_count" : 2647
},
{
"key" : "Conference papers and proceedings.",
"doc_count" : 2603
},
{
"key" : "Feature films.",
"doc_count" : 2554
}
]
Thanks. I checked how many records have `natureOfContent` at all: "only" 3,693,643.
Changed the index settings to take the string as a whole as the key, so spaces are now part of it. Updated the result in the comment above.
This ticket is a duplicate of #1549. The relevant comment is quoted here. `689??.f` and 655 should be included type and medium here.
* [x] Adressbuch
* [x] Altkarte
* [ ] Amtliche Publikation
* [ ] Anleitung
* [ ] Anthologie
* [ ] Antiquariatskatalog
* [ ] Anzeigenblatt
* [x] Atlas
* [ ] Aufgabensammlung
* [x] Aufsatzsammlung
* [x] Auktionskatalog
* [x] Ausstellungskatalog
* [x] Autobiografie (old spelling: Autobiographie) [note: may carry date information]
* [ ] Autograf
* [x] Backbuch
* [ ] Beispielsammlung
* [x] Bericht
* [x] Bestimmungsbuch
* [x] Bibliografie (old spelling: Bibliographie) [note: may carry date information]
* [x] Bild
* [x] Bildband
* [ ] Bilderbogen
* [x] Bilderbuch
* [ ] Bildnis
* [ ] Bildwörterbuch
* [ ] Biografie (old spelling: Biographie) [note: may carry date information]
* [x] Blindendruck
* [ ] Briefsammlung [note: may carry date information]
* [ ] Checkliste
* [ ] Comic
* [ ] Datenbank
* [ ] Datensammlung
* [ ] Diagramm
* [ ] Diskografie
* [x] Drehbuch
* [ ] Einblattdruck
* [ ] Einführung
* [ ] Entscheidungssammlung
* [x] Enzyklopädie
* [x] Erlebnisbericht
* [ ] Fachkunde
* [ ] Fahrplan
* [ ] Faksimile
* [ ] Fallsammlung
* [ ] Fallstudiensammlung
* [x] Festschrift
* [ ] Fiktionale Darstellung
* [ ] Film
* [ ] Filmografie
* [ ] Flugblatt
* [ ] Flugschrift
* [ ] Formelsammlung
* [ ] Formularsammlung
* [x] Forschungsbericht
* [ ] Forschungsdaten
* [ ] Fotografie (old spelling: Photographie)
* [ ] Führer
* [ ] Fundstellenverzeichnis
* [ ] Genealogische Tafel
* [ ] Gespräch
* [ ] Globus
* [ ] Grafik (old spelling: Graphik)
* [ ] Graphzine
* [ ] Handschrift
* [ ] Haushaltsplan
* [ ] Hochschulschrift
* [x] Hörbuch
* [ ] Hörspiel
* [ ] Humoristische Darstellung
* [ ] Inkunabel
* [ ] Interview
* [ ] Inventar
* [ ] Jugendbuch
* [ ] Jugendsachbuch
* [ ] Kalender
* [ ] Karikatur
* [ ] Karte
* [ ] Katalog [note: may carry date information]
* [ ] Kinderbuch
* [ ] Kindersachbuch
* [ ] Kochbuch
* [ ] Kolumnensammlung
* [ ] Kommentar
* [ ] Konferenzschrift
* [ ] Konkordanz
* [ ] Kunstführer
* [ ] Künstlerbuch
* [ ] Laudatio
* [ ] Lehrbuch
* [ ] Lehrerhandbuch
* [ ] Lehrmittel
* [ ] Lehrplan
* [ ] Lernsoftware
* [ ] Lesebuch
* [ ] Liederbuch
* [x] Literaturbericht [note: may carry date information]
* [ ] Loseblattsammlung
* [x] Mehrsprachiges Wörterbuch (old version: Wörterbuch)
* [ ] Mitgliederverzeichnis
* [ ] Modell
* [ ] Monografische Reihe
* [ ] Musikhandschrift
* [ ] Nachruf
* [x] Norm
* [ ] Ortsverzeichnis
* [ ] Papyrus
* [ ] Patentschrift
* [ ] Plakat
* [ ] Plan
* [ ] Postkarte
* [ ] Praktikum
* [ ] Predigthilfe
* [ ] Pressendruck
* [ ] Pressestimme
* [ ] Programmheft
* [ ] Puzzle
* [ ] Quelle
* [ ] Ratgeber
* [ ] Rede
* [ ] Referateorgan
* [ ] Regest
* [x] Reisebericht [note: may carry date information]
* [ ] Reportagensammlung
* [ ] Rezension
* [x] Richtlinie
* [ ] Röntgenbild
* [x] Rückläufiges Wörterbuch
* [ ] Sachbilderbuch
* [ ] Satzung
* [ ] Schematismus
* [x] Schulbuch
* [ ] Schulprogramm
* [ ] Software
* [ ] Spiel
* [ ] Sprachatlas
* [ ] Sprachführer
* [x] Stadtplan
* [ ] Statistik [note: may carry date information]
* [ ] Tabelle
* [ ] Tafel
* [ ] Tagebuch [note: may carry date information]
* [ ] Technische Zeichnung
* [ ] Telefonbuch
* [ ] Testmaterial
* [ ] Theaterstück
* [ ] Thesaurus
* [ ] Übungssammlung
* [ ] Umfrage
* [ ] Unterrichtseinheit
* [ ] Urkunde
* [ ] Verkaufskatalog
* [ ] Verzeichnis
* [ ] Vorlesungsverzeichnis
* [ ] Weblog
* [x] Website
* [ ] Weltkarte
* [x] Werkverzeichnis [note: may carry date information]
* [ ] Werkzeitschrift
* [ ] Wörterbuch
* [ ] Zeichnung
* [x] Zeitschrift
* [ ] Zeittafel
* [x] Zeitung
* [ ] Zitatensammlung
Old RSWK/RAK Formschlagwörter (form subject headings) that have no matching values in the RDA Formangaben. These seem to be carrier types.
* [ ] Audiovisuelles Material
* [ ] Audiovisuelles Material <für Kinder>
* [ ] Bildplatte
* [ ] CD
* [ ] CD <für Kinder>
* [ ] CD-ROM
* [ ] CD-ROM <für Kinder>
* [ ] Dia
* [ ] Diskette
* [ ] Diskette <für Kinder>
* [ ] DVD-Audio
* [ ] DVD-Audio <für Kinder>
* [ ] DVD-ROM
* [ ] DVD-ROM <für Kinder>
* [ ] DVD-Video
* [ ] DVD-Video <für Kinder>
* [ ] Elektronische Publikation
* [ ] Elektronische Publikation <für Kinder>
* [ ] Film <für Kinder>
* [ ] Film 8mm (spelling variant: Film 8 mm)
* [ ] Film Super-8
* [ ] Film 16mm (variant: Film 16 mm)
* [ ] Film 35mm (variant: Film 35 mm)
* [ ] Film 65mm (variant: Film 65 mm)
* [ ] Film 70mm (variant: Film 70 mm)
* [x] Medienkombination
* [x] Mikroform
* [x] Musikdruck
* [ ] Online-Publikation
* [ ] Schallplatte
* [ ] Schallplatte <für Kinder>
* [ ] Text
* [ ] Tonbildreihe
* [ ] Tonkassette
* [ ] Tonkassette <für Kinder>
* [ ] Tonträger
* [ ] Tonträger <für Kinder>
* [ ] Videokassette
* [ ] Videokassette <für Kinder>
Additional Formschlagwörter from: service-wiki.hbz-nrw.de/display/SEM/RSWK+Formschlagwoerter
* [ ] Arbeitstransparent
* [ ] Ausstellung
* [ ] Belletristische Darstellung
* [ ] Bildliche Darstellung
* [ ] Flugblatt
* [x] Kongress
* [ ] Lernprogramm
* [ ] Neuerwerbungsliste [note: may carry date information]
* [ ] Programm
* [x] Schriftenreihe
* [ ] Telefaxverzeichnis
* [ ] Telexverzeichnis
We should exclude carrier-types from natureOfContent (wiki.dnb.de/pages/viewpage.action?pageId=106039270).
This is to enable a homogeneous way of filtering the data by type. (NWBib editors said they could provide us with a mapping.)
E.g. all resources with type "Miscellaneous" and natureOfContent.id "http://d-nb.info/gnd/4329497-2" (Hörbuch) can get the type "Book".
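As a minimal sketch, such a rule could be written as a lookup like the one below. The function name `refine_type` is hypothetical, only the Hörbuch -> `Book` pair is taken from this thread, and the real mapping would have to come from the NWBib editors.

```shell
#!/usr/bin/env bash
# Hypothetical rule: suggest a more specific type for a resource that is
# currently typed "Miscellaneous", based on its natureOfContent.id.
refine_type() {
  local current_type="$1" nature_id="$2"
  if [ "$current_type" != "Miscellaneous" ]; then
    printf '%s\n' "$current_type"   # already typed: leave untouched
    return
  fi
  case "$nature_id" in
    http://d-nb.info/gnd/4329497-2) printf 'Book\n' ;;          # Hörbuch
    *)                              printf 'Miscellaneous\n' ;; # no mapping yet
  esac
}

refine_type Miscellaneous http://d-nb.info/gnd/4329497-2   # -> Book
```

Resources that already have a specific type are passed through unchanged, so the rule only fills gaps and never overrides an existing type.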