acka47 closed this issue 7 years ago.
Here is a list of RDA records in lobid, courtesy of @donboern: rda-ids.txt
Most of the resources listed in rda-ids.txt seem to be periodicals. Here is a print book: http://lobid.org/resource/HT018779822
What seems most important to me for a start is field 419, which holds the publisher and the place and date of publication.
Snippet from http://lobid.org/resource/HT018779822:
```xml
<datafield ind2="1" ind1="-" tag="419">
  <subfield code="a">New York</subfield>
  <subfield code="b">Routledge</subfield>
  <subfield code="c">2014</subfield>
</datafield>
```
Snippet from http://lobid.org/resource/HT018772912:
```xml
<datafield ind2="1" ind1="-" tag="419">
  <subfield code="a">Sundern</subfield>
  <subfield code="b">Baulmann Leuchten GmbH</subfield>
  <subfield code="c">2011-</subfield>
  <subfield code="A">3</subfield>
</datafield>
```
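To make the mapping concrete, here is a minimal sketch (not part of the lobid morph) that reads one pretty-printed record, as written by `xmllint --format`, from stdin and prints subfields a, b and c of field 419. It assumes one `<subfield>` per line, as in the snippets above, and uses plain line-oriented text matching rather than an XML parser; the function name `mab_419` and the output labels are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper, not part of lobid: print subfields a, b and c of
# MAB2 field 419 (place, publisher, date of publication) from one
# pretty-printed MAB XML record on stdin.
# Assumption: one <subfield> element per line, as in the snippets above.
# This is line-oriented text matching, not real XML parsing.
mab_419() {
  awk '
    # Entering field 419 (attribute order does not matter here).
    /<datafield[^>]*tag="419"/ { in419 = 1; next }
    # Leaving the field.
    in419 && /<\/datafield>/ { in419 = 0; next }
    # Only lowercase a, b, c; e.g. the uppercase subfield A is skipped.
    in419 && /<subfield code="[abc]">/ {
      code = $0; sub(/.*<subfield code="/, "", code); sub(/".*/, "", code)
      val = $0; sub(/.*<subfield code="[abc]">/, "", val); sub(/<\/subfield>.*/, "", val)
      label = (code == "a") ? "place" : (code == "b") ? "publisher" : "date"
      print label ": " val
    }
  '
}
# Usage (record.xml: one pretty-printed source record saved locally):
#   mab_419 < record.xml
```

For the second snippet this would print the place "Sundern", the publisher and the open date "2011-", while ignoring subfield A.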
> Thus, we will have to add the RDA transformation rules only for these records.
Does this mean that fields are ambiguous, i.e. that e.g. 419-1c is the publication date in an RDA-catalogued record but something different in an old MAB2 record? (In this particular case I can see that it isn't.)
If there is no such overlap, the transformation rules are much simpler to configure; that's why I ask.
@dr0i Could you please get the Aleph XML source of all records listed in rda-ids.txt and put them in one file so that I can search for specific fields?
```sh
for i in $(cat rda-ids.txt); do xmllint --format "$i?format=source" >> rda-ids.alephMabXmlPretty.xml; done
```
You can find it at http://lobid.org/download/rda-ids.alephMabXmlPretty.xml.
I spoke to publisso stakeholders: they want to work with the roles of persons and corporate bodies from RDA. We will have to consider these in the transformation. Note to self: take a look at this and open a separate issue.
Updated rda-ids.alephMabXmlPretty.xml, taking DE-605-aleph-baseline-marcxchange-2016011515.tar.gz as the base, in which 16k resources are identified as RDA records. Hope this suffices.
@dr0i Could you please update rda-ids.alephMabXmlPretty.xml once more?
Around 180k documents, concatenated into one big bzip2-compressed XML file: http://lobid.org/download/rda-ids.alephMabXmlPretty.xml.bz2
Thanks. As unwieldy as the file has become, I won't ask you to create it again. Now thinking about how to work with a 1.5 GB XML file...
Depending on what you want, you can always use the friendly stream tools like `less`, `grep`, `sed` etc.
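For example, a rough overview of which MAB2 fields occur in the dump can be had without decompressing it to disk. The sketch below assumes `bzcat` and a `grep` that supports `-o` and `-A` (GNU and BSD grep do); `tag_counts` is a made-up name. Because the pattern does not depend on attribute order, it matches `tag=` whether it comes before or after `ind1`/`ind2`.

```sh
#!/bin/sh
# Hypothetical helper: count how often each datafield tag occurs in a
# MAB XML stream on stdin, most frequent first. Everything is streamed,
# so memory use stays small even for a 1.5 GB file.
tag_counts() {
  grep -o '<datafield[^>]*tag="[^"]*"' |   # keep only the opening tags
    sed 's/.*tag="\([^"]*\)"/\1/' |         # reduce each to its tag value
    sort | uniq -c | sort -rn               # count and rank
}
# Usage:
#   bzcat rda-ids.alephMabXmlPretty.xml.bz2 | tag_counts | head -n 20
#   bzcat rda-ids.alephMabXmlPretty.xml.bz2 | grep -A 4 'tag="064"'
```

The second usage line searches the text directly, so it works even though the file as a whole is not a single well-formed XML document.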
There seems to be a problem with rda-ids.alephMabXmlPretty.xml. When I run, for example,
```sh
cat rda-ids.alephMabXmlPretty.xml | xmllint --format - | grep --color -A 4 "<datafield tag=\"064\" ind1=\".\" ind2=\".\">"
```
I get:
```
-:103: parser error : XML declaration allowed only at the start of the document
<?xml version="1.0" encoding="UTF-8"?>
^
-:104: parser error : Extra content at the end of the document
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.o
```
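The error comes from the download loop: `>>` appends one complete XML document per record, so every record after the first brings its own XML declaration and `OAI-PMH` root element, and the concatenation as a whole is not well-formed XML. As a minimal sketch, assuming each declaration sits on a line of its own (as `xmllint --format` writes it), the parts can be wrapped into a single document. The root element name `records` and the function name are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: turn several concatenated XML documents on stdin
# into one well-formed document on stdout.
# Assumption: every XML declaration is on a line of its own.
merge_xml_docs() {
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '<records>'
  sed '/^<?xml [^>]*?>$/d'     # drop the per-record XML declarations
  printf '%s\n' '</records>'
}
# Usage:
#   bzcat rda-ids.alephMabXmlPretty.xml.bz2 | merge_xml_docs > rda-ids.merged.xml
#   xmllint --stream --noout rda-ids.merged.xml
```

Using `--stream` for the check avoids building the whole 1.5 GB tree in memory.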
As a finger exercise, I looked at morph-hbz01-to-lobid.xml to check whether and how the fields that are omitted under RDA are currently transformed, and documented the results here:
MAB2 field | MAB2 field name | Whether and how transformed to RDF |
---|---|---|
300 | Sammlungsvermerk (collection note) | - |
304 | Einheitssachtitel (uniform title) | a → dc/terms/alternative |
310 | Hauptsachtitel in Ansetzungsform (title proper, normalized form) | → title |
333 | zu ergänzende Urheber zum Hauptsachtitel (corporate body to be added to the title proper) | If no title exists, set as title. Also taken as CorporateBodyTitle. |
334 | Allgemeine Materialbenennung (general material designation) | Matched with bibo/AudioDocument, bibo/AudioVisualDocument, bibo/Image, RDACarrierType/1020 (microform carriers). Also used to check whether the full text is online. |
340, 344, 348, 352 | Parallelsachtitel in Ansetzungsform (parallel title, normalized form) | - |
342, 346, 350, 354 | zu ergänzende Urheber zum Parallelsachtitel (corporate body to be added to the parallel title) | - |
361 | Beigefügte Werke (accompanying works) | - |
410, 411, 412, 415, 416, 417, 418 | Alter Erscheinungsvermerk (old-style publication statement) | - |
454, 464, 474, 484, 494 | Gesamttitel in Ansetzungsform (series title, normalized form); decided at union catalog level! | - |
502 | Einheitssachtitel eines beigefügten oder kommentierten Werkes (uniform title of an accompanying or commented work) | - |
504 | Angabe von Paralleltiteln (statement of parallel titles) | → dc/terms/alternative |
517 | Angaben zum Inhalt (contents note) | - |
519 | Alter Hochschulschriftenvermerk (old-style dissertation note) | If present, multiple values are combined into RDA Elements/u/P60489 |
532 | Hinweise auf frühere und spätere sowie zeitweise gültige Titel (notes on earlier, later and temporarily valid titles) | - |
610–645 | Segment Sekundärformen (secondary forms segment) | 619a (Erscheinungsjahr(e) in Vorlageform, i.e. year(s) of publication as given in the source) matched with 021 (Identifikationsnummer der Primärform, i.e. identification number of the primary form) |
652 | Spezifische Materialbenennung und Dateityp (specific material designation and file type) | a (stands for RAK-NBM) → online resource |
653 | Physische Beschreibung der Computerdatei auf Datenträger (physical description of a computer file on a carrier) | - |
8XX | Segment Nichtstandardmäßige Nebeneintragungen (non-standard added entries segment) | Matched with a GND ID? |
9XX | Bei RSWK-Schlagwörtern erstes Unterfeld $f (for RSWK subject headings: first subfield $f) | Matched with a GND ID? |
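To check whether the RDA records in the dump still contain any of these fields, a sketch along the following lines could be used. The tags are transcribed from the table; 9XX is deliberately left out because only the first subfield $f of RSWK headings is affected, not the whole field. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: print the distinct tags from the table above that
# occur in a MAB XML stream on stdin (9XX deliberately not included).
omitted_fields_present() {
  grep -o '<datafield[^>]*tag="[0-9]*"' |
    sed 's/.*tag="\([0-9]*\)"/\1/' |
    sort -u |
    while read -r tag; do
      case "$tag" in
        300|304|310|333|334|340|342|344|346|348|350|352|354|361|\
        410|411|412|415|416|417|418|454|464|474|484|494|\
        502|504|517|519|532|652|653|\
        6[1-3][0-9]|64[0-5]|8[0-9][0-9]) echo "$tag" ;;  # incl. 610-645, 8XX
      esac
    done
}
# Usage:
#   bzcat rda-ids.alephMabXmlPretty.xml.bz2 | omitted_fields_present
```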
Closing this super-issue as the two remaining sub-issues are sufficient for future orientation (and don't need to be implemented for the launch).
From 1 October 2015, cataloging in the hbz union catalog will follow the RDA rules as documented here. We will have to adjust the transformation, i.e. the hbz01-to-lobid morph file, accordingly.
After a first cursory look at the documents, I suggest the following approach:
1. Identifying RDA records: RDA is only applied to newly catalogued resources, which get the RDA marker `r` in field 030 (indicator blank, position 4) of the Aleph sequentials (aseq); see the documentation. Thus, we will have to add the RDA transformation rules only for these records.
2. Checking fields that will be omitted: Several fields won't be used anymore with RDA cataloging. You can see the list here. We will check whether and how we currently transform these to RDF.
3. Finding out how to transform the new data to lobid's current RDF data model: After identifying the data fields where RDA brings changes, we will have to find out how to integrate the new RDA data into the current lobid RDF.
4. Discussing how to handle breaks in the cataloging practice: While we will be able to make a seamless transformation for some of the RDA cataloging, so that lobid customers won't even notice that things have changed, this may not be possible for all of the changes. E.g., regarding IMD (Inhaltstyp, Medientyp, Datenträgertyp)/CMC (content type, media type, carrier type), we will get better and more coherent information (see here for details).

In cases where cataloging practice breaks significantly, we will have to consider whether to map the data both to the old/current data model and according to RDA.
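The marker check described under "Identifying RDA records" could look roughly like the following minimal sketch. It assumes the value of field 030 has already been extracted into a string. Since this thread does not say whether "position 4" is counted from 0 or from 1, the character column is passed explicitly (counted from 1, as `cut -c` does) and has to be checked against the aseq documentation. The function name and the sample values are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the value of MAB2 field 030 (passed as
# $1) carries the RDA marker "r" at character column $2. Columns are
# counted from 1, as by cut -c; whether the documented "position 4" is
# column 4 or column 5 is an open assumption.
has_rda_marker() {
  [ "$(printf '%s\n' "$1" | cut -c "$2")" = "r" ]
}
# Usage with a made-up 030 value, checking column 5:
#   has_rda_marker "abcdr" 5 && echo "RDA record"
```

Keeping the column as a parameter means the helper does not need to change once the counting convention is confirmed.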