uwlib-cams / MARC2RDA

mapping between MARC21 and RDA-RDF
Creative Commons Zero v1.0 Universal

533 reproduction note #207

[Open] CECSpecialistI opened this issue 2 years ago

CECSpecialistI commented 2 years ago

https://github.com/uwlib-cams/MARC2RDA/blob/main/Working%20Documents/5XX.csv

lake44me commented 11 months ago

Here's a link to the 2014 PCC policy statement, describing the practice of "cloning" a record for the original, with specific changes to fixed fields and other areas but leaving most of the description for the original and using a 533 to describe the reproduction. This started out being for microforms, but is being used for electronic reproductions as well. It doesn't describe a "single record approach" for original and reproduction, but may be used with a single record approach for multiple electronic manifestations. https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fwww.loc.gov%2Faba%2Fpcc%2Fdocuments%2F1-11-LC-PCC-PS.docx&wdOrigin=BROWSELINK

Here's the MARC 533 field description:

Descriptive data for a reproduction of an original item when the main portion of the bibliographic record describes the original item and the data differ.

The original item is described in the main portion of the bibliographic record and data relevant to the reproduction are given as a note in field 533 when they differ from the information describing the original. It is used whenever an institution chooses to have the description reflect the original and the notes reflect information about the copy.

For mixed materials, this field contains information describing a copy of a record unit when the agency describing the materials possesses only a copy and, in accordance with conventions, the original is described in the main portion of the control record. This field is normally used in conjunction with field 535 (Location of Originals/Duplicates Note) which indicates the repository holding the original.

Other relevant documents:
- PCC RDA Provider-Neutral Guidelines for Serials, Integrating Resources, and Monographs
- OCLC Special Cataloging Guidelines, sections 3.1 and 3.2 (Provider-Neutral cataloging for Electronic Resources, and Photocopy/POD (Print On Demand) reproductions)
- DLF Registry of Digital Masters Record Creation Guidelines, Version 2

lake44me commented 11 months ago

Challenge: A 533 used for "single record" cataloging of photocopy/POD reproductions, following OCLC instructions, would contain only 533 $a (type of reproduction). If a record follows the guidelines, the marker that 533 is being used in this context is 040 $e containing "pd". This subfield $a may be mapped to the manifestation being described (as a category of manifestation?), but should the presence of such a field be used to modify the mapping of other fields in the record (to an IRI for the "original" manifestation), or should that be changed? OCLC 3.2 specifies fields to describe the original; the key differing data here would be the place, producer, and date of production of the manifestation being described (e.g. the POD reproduction), which are never specified (unless there is a local practice).
@CECSpecialistI
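A minimal sketch of how a transform might recognize this POD/photocopy case, assuming a hypothetical record shape (a list of `(tag, [(code, value), ...])` pairs, not any particular MARC library) and the two markers described above: 040 $e containing "pd", and a 533 carrying only $a. The function names and example values are illustrative.

```python
def subfield_values(record, tag, code):
    """Collect every value of subfield `code` from fields tagged `tag`."""
    return [
        value
        for field_tag, subfields in record
        if field_tag == tag
        for sub_code, value in subfields
        if sub_code == code
    ]


def is_pod_single_record(record):
    """True when 040 $e includes "pd" and every 533 holds only $a."""
    has_pd_marker = "pd" in subfield_values(record, "040", "e")
    fields_533 = [subfields for tag, subfields in record if tag == "533"]
    only_type_of_reproduction = bool(fields_533) and all(
        {code for code, _ in subfields} == {"a"} for subfields in fields_533
    )
    return has_pd_marker and only_type_of_reproduction


# Hypothetical example record; field values are illustrative only.
record = [
    ("040", [("a", "XXX"), ("b", "eng"), ("e", "rda"), ("e", "pd")]),
    ("245", [("a", "Example title")]),
    ("533", [("a", "Photocopy.")]),
]
print(is_pod_single_record(record))
```

A True result only identifies the case; whether other fields should then move to an "original" IRI is the open question above.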

lake44me commented 11 months ago

Challenge: As far as I can tell, tag 533 doesn't provide data to distinguish reproductions which are published from those that are unpublished. RDA does not provide a "superelement" which could be used to map data when this is uncertain. There is Manifestation Production Statement for unpublished material and Manifestation Publication Statement for published material. There's a nonspecific Place of Manifestation, but when you get to the agent, you must choose Name of Producer for unpublished, or Name of publisher for published... there are separate properties for dates too. This question may have come up when mapping 260/264 as well - any advice @SitaKB ? @CECSpecialistI

lake44me commented 11 months ago

@CECSpecialistI This is going to take more than a week to complete, but here's where I've gotten so far:

First pass / provisional mapping of 533 subfields EXCEPT for subfields $y (Data provenance) and $7.

I have sketched out some of the fields that need conditional instructions for mapping, when a 533 is present:

008:
If 533 is present, mint a URI for an "original manifestation" that manifests the same expression of the same work as the reproduction represented by the MARC record. Follow the mapping instructions for 008, except: (detail of which bytes should be mapped in relation to the original manifestation URI) (detail of which bytes should be mapped in relation to the current (reproduction) manifestation URI) (detail of which bytes should be mapped in relation to both the reproduction and the original manifestation, if any)

245: If 533 is present, in addition to mapping the contents of the 245 subfields to manifestation elements for the item cataloged (the reproduction), also map them to the manifestation for the original (minted in the 008 mapping).

260 and/or 264: If 533 is present, map manifestation properties only to the manifestation URI "for the original", as generated in the 008 mapping.

300: If 533 is present, map manifestation properties only to the manifestation URI "for the original", as generated in the 008 mapping.

800, 810, 811, 830: If 533 is present, map manifestation properties only to the manifestation URI "for the original", as generated in the 008 mapping.
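The field-by-field conditions above amount to a routing step. A minimal sketch, assuming a hypothetical record shape (a list of `(tag, subfields)` pairs) and a placeholder IRI base (`http://example.org/...`), of how a transform could decide which manifestation each tag describes once a 533 is present. 008 is left out because it splits byte by byte, and relating the minted original to the shared expression is not shown.

```python
import uuid

# Tags taken from the proposal above; 008 is omitted (handled per byte).
ORIGINAL_ONLY = {"260", "264", "300", "800", "810", "811", "830"}
ORIGINAL_AND_REPRODUCTION = {"245"}


def mint_original_iri(base="http://example.org/manifestation/"):
    """Mint a placeholder IRI for the original manifestation (base is hypothetical)."""
    return base + uuid.uuid4().hex


def route_fields(record, described_iri):
    """Pair each field tag with the manifestation IRI(s) it should describe."""
    has_533 = any(tag == "533" for tag, _ in record)
    original_iri = mint_original_iri() if has_533 else None
    routes = []
    for tag, _ in record:
        if has_533 and tag in ORIGINAL_ONLY:
            routes.append((original_iri, tag))
        elif has_533 and tag in ORIGINAL_AND_REPRODUCTION:
            routes.append((original_iri, tag))
            routes.append((described_iri, tag))
        else:
            # No 533, or a tag without reproduction conditions (including 533 itself).
            routes.append((described_iri, tag))
    return original_iri, routes


original, routes = route_fields(
    [("245", []), ("264", []), ("300", []), ("533", []), ("830", [])],
    "http://example.org/manifestation/repro",
)
for iri, tag in routes:
    print(tag, "->", "original" if iri == original else "described")
```

Records without a 533 pass through unchanged, which keeps the existing mapping tables usable as-is for them.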

I also need to figure out how to identify "digital master" records that follow the DLF practice (how to determine whether a record is for a digital master, and how the mappings are affected), to see whether this presents a variant of what I'm proposing and how it relates to the location / 535 mapping.

lake44me commented 11 months ago

Challenge: 533 $m Dates and/or sequential designation of issues reproduced.

This implies a relationship to an original that is an aggregating serial resource. The reproduction may contain all or only a portion of that resource, and which one cannot be determined from the data.

The best I could come up with was Note on manifestation. If anyone involved with serials knows of any guidelines for describing reproductions that would give a more specific property/relation between the reproduction and the original, let me know! Crystal, I don't know how to "@" everybody - can you see that these comments are shared to relevant people in the group? @CECSpecialistI

CECSpecialistI commented 11 months ago

Thank you @lake44me ! I added a review of these comments to the agenda for tomorrow.

CECSpecialistI commented 10 months ago

From: Akerman, Laura [liblna@emory.edu](mailto:liblna@emory.edu) Sent: Thursday, December 7, 2023 10:45 AM To: Crystal Yragui [cec23@uw.edu](mailto:cec23@uw.edu) Subject: 533 mapping

Hi Crystal,

For some weird reason I woke up at 2:30 this morning with energy and started working on the 533 mapping. I think I’ve covered every subfield but $7. I’d appreciate it if you could review the fields I’ve mapped.

$7 is a selective “fixed field” corresponding to the 008 general values for bytes 06-17 (Date type, Date 1 and Date 2, and Country of publication), plus, for continuing resources, byte 18 (Frequency). I thought I could just borrow the mappings from the 008 field and plug them in here.

But… I need to be sure I understand the logic for at least the position 06 (Dtst) and 7-10 and 11-14 mappings, particularly for the notes that are getting generated and the transform notes. Would you be able to explain this to me? I think I wasn’t at those early 008 mapping parties… If I could possibly get a little of your time (say ½ an hour) for a Zoom call on this, next week or week after, or even early next year, that’d be great. Or suggest alternatives (should I talk to Sita?)

Later on, I would make a special version of 008 for records that contain a 533 tag, with instructions to mint an IRI for original manifestation and relate it to the expression and manifestation we are cataloging, and map designated position values to the description of that original manifestation based on the documentation we have about the previous PCC 533 practice. I’d ask Theo if he’d prefer getting this version with just the positions that describe the original (and put a condition /= 533 or something on those positions in the full 008 mapping), or duplicate the whole thing… Anyway, I need to understand the 008 mapping better than I thought I’d have to – or at least the dates part.

That may not be necessary for other tags – I could add conditional lines for what to do when 533 is present and make reference to the manifestation IRI for the original minted for 008 mapping. That’s the plan, anyway.

Laura

P.S. I see that we recorded a Decision in March to just map 533 as a Note on Manifestation, but after the discussion at the November 8 meeting we are revisiting it. Maybe we should wait to change the Decision until after the mapping is done?

CECSpecialistI commented 10 months ago

@CECSpecialistI plans to work on this next week.

CECSpecialistI commented 9 months ago

For $7, look for some 539's in OCLC/some examples in Alma

lake44me commented 9 months ago

533 reproduction note examples

Use of $5

Millions like this in OCLC, and thousands in our Alma catalog (Hathi Trust):

oclc 615373298 533 Electronic reproduction. ‡b [Place of publication not identified] : ‡c HathiTrust Digital Library, ‡d 2010. ‡5 MiAaHDL

Use of $7

Probably hundreds of thousands like this one from the Eighteenth Century Collections Online electronic set, which contains $7:

Emory Alma MMSID 990004332780302486 533 __ |a Electronic reproduction. |b Farmington Hills, Mich. : |c Thomson Gale, |d 2003. |n Available via the World Wide Web. |n Access limited by licensing agreements |7 s2003 miunns

Also in our Alma: oclc 52967803 533 __ |a Electronic reproduction. |b Atlanta, Georgia : |c Pitts Theology Library, Emory University, |d 2003. |f (Thanksgiving Day Sermons, ATLA Cooperative Digital Resources Initiative, CDRI). |n Joint CDRI project by: Andover-Harvard Library (Harvard Divinity School), Pitts Theology Library (Emory University), and Princeton Theological Seminary Libraries. |7 s2003 gaun s

An example (of perhaps many) with two 533 fields, HathiTrust and Harvard Library identified in their $5s:

OCLC 674405981

506 |3 Use copy |f Restrictions unspecified |2 star |5 MiAaHDL
533 |a Electronic reproduction. |b [Place of publication not identified] : |c HathiTrust Digital Library, |d 2010. |5 MiAaHDL
538 |a Master and use copy. Digital master created according to Benchmark for Faithful Digital Reproductions of Monographs and Serials, Version 1. Digital Library Federation, December 2002. |u http://purl.oclc.org/DLF/benchrepro0212 |5 MiAaHDL
583 1 |a digitized |c 2010 |h HathiTrust Digital Library |l committed to preserve |2 pda |5 MiAaHDL
588 0 |a Print version record.
506 |a No restrictions on access copy. |5 MH
533 |a Electronic reproduction. |b Cambridge, Mass. : |c Harvard College Library Digital Imaging Group, |d 2008. |f (Harvard College Library preservation digitization program). |5 MH
538 |a Digital master created according to Benchmark for Faithful Digital Reproductions of Monographs and Serials, Version 1. Digital Library Federation, December 2002. |u http://purl.oclc.org/DLF/benchrepro0212 |5 MH
583 1_ |j Harvard University Library |l committed to preserve |5 MH

I can look for more, but I think from what I've seen so far, the use of $5 indicates the institution that created the reproduction, not necessarily the one that owns a copy that was reproduced. That's at least the case for the Hathi Trust 533's.
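If $5 does mark the institution that made each reproduction, a record like OCLC 674405981 can be split into one group per reproduction before mapping. A sketch under that assumption, using a hypothetical record shape (a list of `(tag, [(code, value), ...])` pairs); which tags to group (506, 533, 538, 583 here) is an illustrative choice, and the sample fields are abbreviated from the example above.

```python
from collections import defaultdict


def group_by_institution(record, tags=("506", "533", "538", "583")):
    """Group reproduction-related fields by the first $5 code they carry."""
    groups = defaultdict(list)
    for tag, subfields in record:
        if tag not in tags:
            continue
        codes = [value for code, value in subfields if code == "5"]
        groups[codes[0] if codes else None].append(tag)
    return dict(groups)


# Abbreviated from OCLC 674405981 above (only some subfields kept).
record = [
    ("506", [("3", "Use copy"), ("5", "MiAaHDL")]),
    ("533", [("a", "Electronic reproduction."), ("d", "2010."), ("5", "MiAaHDL")]),
    ("588", [("a", "Print version record.")]),
    ("506", [("a", "No restrictions on access copy."), ("5", "MH")]),
    ("533", [("a", "Electronic reproduction."), ("d", "2008."), ("5", "MH")]),
    ("583", [("j", "Harvard University Library"), ("5", "MH")]),
]
print(group_by_institution(record))
```

Each group could then be mapped to its own reproduction manifestation, while 588 (not grouped) stays with the record as a whole.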

Do we need to see more $7's? Or should we just attempt to map them based on the 008 bytes they represent?

lake44me commented 9 months ago

for $5, would this do?

Creator corporate body of manifestation http://rdaregistry.info/Elements/m/P30421

either Identifier (the symbol), or a nomen that has nomen string of the symbol along with scheme of nomen https://www.loc.gov/marc/organizations/orgshome.html

or IRI (the id.loc.gov Cultural Heritage Institutions IRI corresponding to the symbol, if we take this to be a "real world object" IRI, but note some symbols aren't represented in this list, e.g. MiAaHDL for Hathi Trust Digital Library)
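The two options above (a real-world IRI where one exists, otherwise a nomen carrying the symbol and its scheme) can be combined into a fallback. A sketch that emits plain `(subject, predicate, object)` tuples rather than using an RDF library; the lookup table, blank-node labels, and the `hasNomen` / `nomenString` / `schemeOfNomen` predicate names are hypothetical placeholders, not RDA Registry IRIs.

```python
CREATOR_CORPORATE_BODY = "http://rdaregistry.info/Elements/m/P30421"
MARC_ORG_CODES = "https://www.loc.gov/marc/organizations/orgshome.html"


def map_subfield_5(manifestation_iri, symbol, known_iris=None):
    """Return (subject, predicate, object) tuples for one $5 symbol.

    Prefers a real-world IRI when `known_iris` has one for the symbol;
    otherwise falls back to a blank-node agent whose nomen records the
    symbol and the MARC Code List for Organizations as its scheme.
    """
    known_iris = known_iris or {}
    if symbol in known_iris:
        return [(manifestation_iri, CREATOR_CORPORATE_BODY, known_iris[symbol])]
    agent = f"_:agent_{symbol}"
    nomen = f"_:nomen_{symbol}"
    # Placeholder predicate names below; not RDA Registry IRIs.
    return [
        (manifestation_iri, CREATOR_CORPORATE_BODY, agent),
        (agent, "hasNomen", nomen),
        (nomen, "nomenString", symbol),
        (nomen, "schemeOfNomen", MARC_ORG_CODES),
    ]


for triple in map_subfield_5("http://example.org/manifestation/repro", "MiAaHDL"):
    print(triple)
```

MiAaHDL takes the fallback path here because, as noted above, it has no corresponding id.loc.gov Cultural Heritage Institutions IRI.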

CECSpecialistI commented 9 months ago

@cspayne @gerontakos @CECSpecialistI this needs your attention RE: how to process 008. You're getting an email from @lake44me

lake44me commented 8 months ago

@cspayne @gerontakos @CECSpecialistI Apologies for being slow to send that email - I realized there was a duck I didn't have in a row before I present choices to you: Provider-neutral cataloging. I'm examining OCLC's specifications here: https://www.oclc.org/bibformats/en/specialcataloging.html#providerneutralcataloging which hopefully follow PCC specifications and I think I'm finally getting some clarity:

PN cataloging could be for digitized reproductions of physical materials or for born-digital content. PN records are expected to have an 040 $e with code "pn".

The PCC instructions say: "All online resources are considered published. The provider-neutral model (in contrast to RDA) specifies that if the e-resource being cataloged is an online reproduction of a tangible resource, usually the Production, Publication, Distribution, Manufacture and Copyright notice information will come from the original tangible source record." So elements like the 008 dates and 264 publication information describe an original.

But the kicker is this PCC instruction: "533: Use only for records for DLF Registry of Digital Masters, HathiTrust Digital Library, and other digital preservation projects. Use with $5."

This opens the possibility that there are PN records lacking a 533 which are still reproductions of an original, just not related to a preservation master, and which mix descriptive elements of both original and reproduction in the record! I don't know what to do about those if they exist at this point - that's going to be an area for future analysis, perhaps in Phase 2. GRRRR!

But it appears that at least we can say that PN electronic manifestation records with a 533, like other records with a 533, use the mixed original/reproduction description approach, so I think we have one "sort" for processing them.

The question I'll be asking, if we have a meeting, is: for ease of transformation, should we have separate mapping tables for records with a 533 and for records without one, rather than applying a 533 condition to every field/subfield/byte that needs one?

We could label the non-533 tables as "no 533" for tags that are affected.

Or, would you prefer getting the conditions for 533 in each line of the existing tables?

Or, some combination (it would be really rough to do this line by line for 008, but perhaps easier for 264 or other fields).

Tags potentially affected (I think, but will double-check): 008, 245, 260, 264, 300, 8xx.

@AdamSchiff please weigh in if I've missed anything important about provider-neutral cataloging.

CECSpecialistI commented 8 months ago

@lake44me, would you be up for putting together a clear proposal on this for discussion during a meeting? I confess that I have not been keeping up with the complexities of reproductions and the 533, and have cataloged very few of them. I am not sure how feasible separate mapping tables will be for reproductions vs. originals but have an open mind, although it might need to wait for Phase II of the project (read: 2025 or beyond) if I'm being practical, unless I'm misunderstanding the complexity of what you're proposing.

lake44me commented 8 months ago

@CECSpecialistI I was not up for doing it this week.

I think if I'm going to try to write up the complexity, I might as well take a crack at just adding the conditional lines to the 008 and other tags. The 008 is the most complex. Just to be safe, I will duplicate these mappings and if the conditions are approved, will just substitute them in - and I'll be as careful as I can be. Will aim for end of month.

lake44me commented 8 months ago

Discovered a new wrinkle for use of 533 tags. Was looking at the OCLC page and it had an example in the definition for $a:

Record for the print version that notes the existence of a digital preservation version:

533 ǂ3 Volumes 78-81 (1996-1999): ǂa Also available as electronic reproduction. ǂb [Chicago] : ǂc University of Chicago Library, ǂd [2006] ǂ5 ICU
533 Also available as electronic reproduction. ǂb Washington, D.C. : ǂc Library of Congress, ǂd [2002-2003] ǂ5 DLC

I went looking in the Library of Congress catalog (advanced search on the phrase "also available as" in Notes). Most of the records I sampled toward the front used that phrase in a 530 Additional Physical Form Available note (usually "Also available as an ebook"), but looking at the end of the 5,000+ results, I found a record, LCCN 2009374281 https://lccn.loc.gov/2009374281/marcxml . As best I can determine, the record describes a "tangible" resource, but with the addition of these tags:

533 $a Also available as electronic reproduction. $b Chicago : $c Center for Research Libraries, $d [2007]
538 $a Master and use digital copies. Technical details on the digital scanning are available at http://dds.crl.edu/technical_information.asp?tid=4464
583 $a Digitized $c 2007 $h Center for Research Libraries $e pda $5 CRL

There are undoubtedly more of these. I am thinking this is not impossible to deal with, but introduces a conditional wrinkle - if 533 $a begins with "Also available as", treat the record as describing an original, and deal with the 533 and 538 as describing a different, reproduction manifestation. Will catch up with details of that later.

As Charlie Brown would say, "Aaaaaaaugh!"
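With the "Also available as" wrinkle, the cases that have surfaced so far can be sorted before any field is mapped. A minimal sketch, assuming a hypothetical record shape (a list of `(tag, [(code, value), ...])` pairs), a case-insensitive prefix match on "Also available as" in the first 533 $a, and 040 $e "pn" as the provider-neutral marker. The returned labels are illustrative, and a record mixing both kinds of 533 falls into the default reproduction case.

```python
def reproduction_approach(record):
    """Sort a record into one of the cases discussed in this thread (sketch)."""
    a_values = [
        [value for code, value in subfields if code == "a"]
        for tag, subfields in record
        if tag == "533"
    ]
    if a_values:
        first_a = [values[0].strip().lower() for values in a_values if values]
        if first_a and all(a.startswith("also available as") for a in first_a):
            # The record describes the original; 533/538 describe another manifestation.
            return "original, with separate reproduction"
        return "reproduction, with original described in main fields"
    e_codes = [
        value
        for tag, subfields in record
        if tag == "040"
        for code, value in subfields
        if code == "e"
    ]
    if "pn" in e_codes:
        # Provider-neutral but no 533: could be a reproduction or born digital.
        return "provider-neutral, reproduction status unknown"
    return "no reproduction conditions"


print(reproduction_approach([
    ("533", [("a", "Also available as electronic reproduction."),
             ("b", "Chicago :"), ("c", "Center for Research Libraries,")]),
]))
```

Sorting first matches the idea of applying separate transform templates per case rather than testing for 533 in every line.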

lake44me commented 7 months ago

@CECSpecialistI @cspayne I haven't made it all the way through 008 yet (stopping after line 34) but I wanted you to get an idea of the pattern before I invest a lot of time in it. https://docs.google.com/spreadsheets/d/14b6_gyuwoVf9_d-9j-yu85dEQwvzdblD87SLoibELXE/edit#gid=1517787708

Some questions:

  1. Would it be better for the coding work to have a mapping document for "Reproduction Conditions 533 and other" and just throw all the conditional mappings with subject "Original Manifestation" IRI (or in some cases, both Original Manifestation and Reproduction Manifestation) for 008, 245, 260 etc. into it? And then change the label for 008, 245, 260 etc. to say, 008 No Repro Cond., etc. ?

I'm thinking it would make it easier to "sort" the record for Reproduction Conditions at the outset and then apply the different transform templates; anything not meeting those conditions would be handled as normal based on those spreadsheets.

Or, should I continue inserting these conditional fields in each mapping table wherever they might apply?

  2. As I examine the mappings of 008 general positions and specific format positions, I can see that some are likely to apply to both original and reproduction. Certainly if these are expression properties (such as language, 008 35-37), that will be automatic. But if they are manifestation properties, it would be a judgement call whether they should be the same for reproduction and original (I would think 008 18-21 Illustrations for the BOOKS format should be the same).
    Mapping these to both original and reproduction would result in a fuller description of the original, but since this is not what we're primarily creating a record for, I'm thinking we could just apply them to the reproduction, unless we think there would be value down the line in having a fuller set of data about the "original" for reconciliation with data created to describe the original somewhere (maybe in the same collection, maybe not). Your thoughts?

  3. I put a note in there about provider-neutral records (assuming we can identify them all with 040 $e pn). The PCC document seems to encourage this approach for records for online resources in general. There are a great number of PN records from digital preservation projects out there with 533s, but there are probably also quite a number that leave out the 533 as suggested in the PCC guidance. The important part of those instructions is that, for reproductions, catalogers continue to depart from RDA and still code the 008 dates, 264, etc. for the original rather than the reproduction. Without a 533, I don't know if there is a clear indicator that something is an online reproduction rather than "born digital" material. I need to study this more.
    Cypress, would you rather I wait until the conditions for this aspect are provided (if we can get them)? Or can I go ahead and give you the 533 conditions, and then update them with PN conditions, and maybe others, later?

cspayne commented 7 months ago

@lake44me I'm not completely caught up on the mapping for 008 or 533 yet, but I can answer the coding questions.

I think having a separate mapping document would work great, and it should be fine to go ahead with the 533 conditions and update them as needed. If updates are made after coding has started, we'll need a way to indicate what's changed or been added so I don't miss anything.

lake44me commented 7 months ago

Thanks @cspayne! That answers my big question (#1) and lets me know #3 (provider neutral) is not a holdup.

CECSpecialistI commented 7 months ago

Hi @lake44me

I am not clear on the 533/008 issues either, and would appreciate an opportunity to be caught up. Do you think it would make sense for us to meet and talk about this before you go full steam ahead and invest more time in it, as you say? There are many comments spanning a good amount of time, and I'm not sure which ones still apply and which ones have been moved on from. Are you able to put together a succinct summary of what the issues are and what you're planning to do with the 008 and 533? Or, would you like me to try and put something on the calendar for you, me, and @cspayne in the next couple of weeks?

lake44me commented 5 months ago

Proposal for mapping 533 $7 (fixed field information for the reproduction) - with wrinkles:

Tag 533 $7. $7 is a "mini-fixed-field" for the reproduction manifestation being described by the MARC record. My approach is to identify 008 mappings to apply to the positions in $7 and just say something like:

Mapping to Reproduction manifestation:

- For byte 0, apply the 008 mappings for byte 06 (Type of date/Publication status).
- For bytes 1-4, apply the 008 mappings for bytes 07-10 (Date1).
- For bytes 5-8, apply the 008 mappings for bytes 11-14 (Date2).
- For bytes 9-11, apply the 008 mappings for bytes 15-17 (Place of publication, production, or execution).
- For byte 12, for Continuing Resources format only, apply the 008 mapping for byte 18 (Frequency).
- For byte 13, for Continuing Resources format only, apply the 008 mapping for byte 19 (Regularity).
- For byte 14, if the format is Maps or Visual Materials, apply the 008 mapping for byte 29 (Form of item). For all other formats, apply the 008 mapping for byte 23 (Form of item).

*conditions for determining format found elsewhere...?

Wrinkle:
Looking at the definitions in https://www.loc.gov/marc/bibliographic/bd533.html, we may need to break out the Date1 and Date2 mappings for Continuing Resources since the meaning is different. For "non-serials" the meaning seems straightforward and applying the 008 mappings relatively uncomplicated - the dates are of publication, etc. of the reproduction.

For serially-issued items, 1-4 contains the original beginning date of publication of the issues that have been reproduced, as indicated in subfield $m of field 533. For serially-issued items, 5-8 contains the original ending date of publication of the issues that have been reproduced, as indicated in subfield $m of field 533.

This reproduction-specific information will probably need to be a note field, or, if there are dates in subfield $m of the 533, these bytes can simply be ignored. The reproduction manifestation may or may not be a continuing resource...?

@SitaKB @CECSpecialistI @AdamSchiff @cspayne @GordonDunsire @JianPLee
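The two options above (a note, or ignoring the bytes when $m carries the dates) can be sketched as a small decision function. This assumes the format has already been determined elsewhere, takes a 533 as a hypothetical list of `(code, value)` subfield pairs, and uses invented note wording purely for illustration; the sample $7 strings in the usage are synthetic.

```python
def serial_date_handling(subfields, is_continuing_resource):
    """Decide what to do with 533 $7/1-4 and /5-8 (sketch of the options above)."""
    sevens = [value for code, value in subfields if code == "7"]
    if not sevens:
        return ("no $7", None)
    date1, date2 = sevens[0][1:5].strip(), sevens[0][5:9].strip()
    if not is_continuing_resource:
        # Non-serials: the dates describe the reproduction itself.
        return ("reproduction dates", (date1, date2))
    if any(code == "m" for code, _ in subfields):
        # $m already records the issues reproduced, so skip these bytes.
        return ("ignore", None)
    span = f"{date1}-{date2}" if date2 else date1
    # Hypothetical note wording.
    return ("note on manifestation", f"Issues reproduced: {span}")


print(serial_date_handling([("a", "Microfilm."), ("7", "d19961999xxumrb")], True))
```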

lake44me commented 5 months ago

Wrinkle #2 - Form of item (byte 14)

LC-PCC PS for 1.11 RDA (original RDA Toolkit), Microform reproductions of print resources:

- 008/23: Record the value associated with the microform being cataloged (a, b, or c).
- This is in contrast to the other enumerated 008 positions, which should be coded for the original (print resource).

Print on Demand (POD) reproductions and photocopies:

- 008/23 (Form of item) or 008/29 (Form of item): Record the value "r", indicating that the form of item is a print reproduction.
- Again, the value in the 008 pertains to the reproduction, not the original.

PCC Provider-Neutral e-resources:

- 008/23 or 008/29 (Form of item): Use code "o" (online); all other bytes of the 008 should reflect the original manifestation.

If the form of item in the 008 pertains to the reproduction, how and why could the byte 14 (Form of item) value in 533 $7 be different?

lake44me commented 4 months ago

@CECSpecialistI @AdamSchiff

Added this note to the mapping for 533:

Problem to be addressed when we tackle serial conversion (Phase 2)?
For continuing resources, Date1 and Date2 in subfield $7 (fixed field information for the reproduction) are defined as the original beginning and ending dates of issue of the reproduced serial volumes, not the date(s) of issue of the reproduction.

I need to understand more about how this fits in with the cataloging of reproductions of serials/CRs.

Do the reproductions get the same record format (Continuing Resources) as the original?

Other than this question, I wrote out the $7 mapping and think I am done with the 533 mapping, but we can wait on approving it until the other fields' mappings are done.

lake44me commented 1 month ago

@CECSpecialistI @cspayne @dchen077 Crystal, can you review the 533 $7 mapping (the fixed field coding for the reproduction) - otherwise 533 is ready for code. Does it need to wait until the 008 coding is stabilized?

All the logic in that field is in the transformation notes:

- /0 - Type of date/Publication status: Use 008 06 mappings. Note: code r (original and reproduction dates) should not be used here!
- /1-4 - Date 1: Use 008 07-10 mappings.
- /5-8 - Date 2: Use 008 11-14 mappings.
- /9-11 - Place of publication, production, or execution: Use 008 15-17 mappings.
- /12 - Frequency: Format must be CONTINUING RESOURCES. Use 008 18 mappings.
- /13 - Regularity: Format must be CONTINUING RESOURCES. Use 008 19 mapping.
- /14 - Form of item: If Format is BOOKS, COMPUTER FILES, MUSIC, CONTINUING RESOURCES, or MIXED MATERIALS, use the 008 23 mapping for the respective format. If Format is MAPS or VISUAL MATERIALS, use the 008 29 mapping.
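The transformation notes above are regular enough to drive from a table. A minimal sketch, assuming the format label (BOOKS, MAPS, and so on) has already been determined elsewhere, and assuming that the Emory ECCO example value `s2003 miunns` originally had four blank Date 2 positions that were collapsed to one space when copied (giving the 15-character `s2003    miunns`). It only reports which 008 mapping to reuse for each slice; the 008 mappings themselves are not reproduced, and raising an error on code r is one possible way to enforce the note on /0.

```python
# (start, end, label, 008 position) following the transformation notes above.
POSITIONS = [
    (0, 1, "Type of date/Publication status", "06"),
    (1, 5, "Date 1", "07-10"),
    (5, 9, "Date 2", "11-14"),
    (9, 12, "Place of publication, production, or execution", "15-17"),
    (12, 13, "Frequency", "18"),
    (13, 14, "Regularity", "19"),
    (14, 15, "Form of item", None),  # 23 or 29 depending on format
]
CONTINUING_ONLY = {"Frequency", "Regularity"}
FORM_AT_23 = {"BOOKS", "COMPUTER FILES", "MUSIC", "CONTINUING RESOURCES", "MIXED MATERIALS"}
FORM_AT_29 = {"MAPS", "VISUAL MATERIALS"}


def parse_533_7(value, fmt):
    """Split a 533 $7 string and name the 008 position whose mapping applies."""
    value = value.ljust(15)  # assumption: pad trailing blanks that were trimmed
    if value[0] == "r":
        raise ValueError("code r should not be used in 533 $7/0")
    result = []
    for start, end, label, pos_008 in POSITIONS:
        if label in CONTINUING_ONLY and fmt != "CONTINUING RESOURCES":
            continue
        if label == "Form of item":
            if fmt in FORM_AT_29:
                pos_008 = "29"
            elif fmt in FORM_AT_23:
                pos_008 = "23"
            else:
                continue  # no Form of item mapping listed for this format
        result.append((label, pos_008, value[start:end]))
    return result


for label, pos, chars in parse_533_7("s2003    miunns", "BOOKS"):
    print(f"008/{pos} {label}: {chars!r}")
```

Keeping the position table separate from the format conditions should make it easy to update if the 008 date mappings change.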

CECSpecialistI commented 1 month ago

I don't have the capacity to be the reviewer on this in the next couple of weeks. If someone else can pick up the mapping review, I would be grateful. Since it relies on 008 values it might make sense to hold off on coding until the 008 mapping is finished?

cspayne commented 1 month ago

@lake44me I just marked $7 as reviewed, everything looks good to me. I added the link you provided me in a previous email to OCLC (https://www.oclc.org/bibformats/en/fixedfield/type.html) to help determine the format, which will be necessary for the transform.

lake44me commented 1 month ago

Thanks @cspayne ! I moved 533 to Ready for Transform

cspayne commented 2 weeks ago

code on hold for $7/01-04 and $7/05-08 until 008 - #50 is updated

cspayne commented 4 days ago

Hi @lake44me, I just updated 245 based on reproduction conditions and ran 5 records with 533 or 588 reproduction conditions through the transform to show what the output currently looks like. Not all fields that have reproduction conditions have been mapped at this point, but 006, 008, 533, and 264 should be set. The 5 records are here, and output is in this folder if you would like to take a look!