lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

Coupling 490s and 830s in series conversion #71

Closed: osma closed this issue 10 months ago

osma commented 6 years ago

I'm wondering about the way the converter seems to couple information from the 490 (series statement) and 830 (series added entry, uniform title) fields. It appears to walk through both kinds of fields, pairing the first 490 with the first 830, the second 490 with the second 830, and so on. Is this a safe thing to do?

Consider this LC record, which was mentioned in #27 as an example: http://id.loc.gov/tools/bibframe/compare-lccn/full-rdf?find=79640364

It has two 490s and two 830s. The series appears to have changed its title from "DHEW publication" to "DHHS publication" in 1977. The 490s are, in order, DHEW and DHHS; the 830s are in the opposite order, DHHS first, then DHEW.

The result in the BIBFRAME output looks like this:

  <bf:hasSeries>
    <bf:Instance>
      <rdfs:label>DHEW publication</rdfs:label>
      <bf:seriesStatement>DHEW publication</bf:seriesStatement>
      <bf:seriesEnumeration>no. (OHDS)</bf:seriesEnumeration>
      <bflc:appliesTo>
        <bflc:AppliesTo>
          <rdfs:label>1976</rdfs:label>
        </bflc:AppliesTo>
      </bflc:appliesTo>
      <bf:instanceOf>
        <bf:Work rdf:about="http://bibframe.example.org/11225612#Work830-49">
          <rdfs:label>DHHS publication ;</rdfs:label>
          <bf:title>
            <bf:Title>
              <bflc:title30MatchKey>DHHS publication ;</bflc:title30MatchKey>
              <bflc:title30MarcKey>830 0$aDHHS publication ;$vno. (OHDS)</bflc:title30MarcKey>
              <rdfs:label>DHHS publication ;</rdfs:label>
              <bflc:titleSortKey>DHHS publication ;</bflc:titleSortKey>
              <bf:mainTitle>DHHS publication</bf:mainTitle>
            </bf:Title>
          </bf:title>
        </bf:Work>
      </bf:instanceOf>
    </bf:Instance>
  </bf:hasSeries>
  <bf:hasSeries>
    <bf:Instance>
      <rdfs:label>DHHS publication</rdfs:label>
      <bf:seriesStatement>DHHS publication</bf:seriesStatement>
      <bf:seriesEnumeration>no. (OHDS)</bf:seriesEnumeration>
      <bflc:appliesTo>
        <bflc:AppliesTo>
          <rdfs:label>1977-</rdfs:label>
        </bflc:AppliesTo>
      </bflc:appliesTo>
      <bf:instanceOf>
        <bf:Work rdf:about="http://bibframe.example.org/11225612#Work830-50">
          <rdfs:label>DHEW publication ;</rdfs:label>
          <bf:title>
            <bf:Title>
              <bflc:title30MatchKey>DHEW publication ;</bflc:title30MatchKey>
              <bflc:title30MarcKey>830 0$aDHEW publication ;$vno. (OHDS)</bflc:title30MarcKey>
              <rdfs:label>DHEW publication ;</rdfs:label>
              <bflc:titleSortKey>DHEW publication ;</bflc:titleSortKey>
              <bf:mainTitle>DHEW publication</bf:mainTitle>
            </bf:Title>
          </bf:title>
        </bf:Work>
      </bf:instanceOf>
    </bf:Instance>
  </bf:hasSeries>

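So the first bf:hasSeries has an Instance "DHEW publication" whose Work (bf:instanceOf) is "DHHS publication", and the second has the reverse: Instance "DHHS publication" with Work "DHEW publication". In both cases the Instance and the Work describe different series.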
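The crossed result is what plain positional pairing would produce. A minimal Python sketch, modelling each field only by its $a string: this mimics the observed behaviour and is not the converter's actual XSLT. `pair_by_position` is a hypothetical name, and the strings are copied from the output above (bf:seriesStatement for the 490s, the $a in bflc:title30MarcKey for the 830s).

```python
# Hypothetical model of positional pairing: the i-th 490 is coupled with
# the i-th 830, whatever either field actually says. This only mimics the
# observed behaviour; it is not the converter's XSLT.
def pair_by_position(f490s, f830s):
    """Return (490 $a, 830 $a) tuples paired by field order.

    zip() stops at the shorter list, so in this sketch any extra
    490s or 830s are silently dropped.
    """
    return list(zip(f490s, f830s))


# $a values from the example record, in record order, as they appear
# in the output above.
f490s = ["DHEW publication", "DHHS publication"]
f830s = ["DHHS publication ;", "DHEW publication ;"]

for statement, series_title in pair_by_position(f490s, f830s):
    print(f"490 {statement!r} -> 830 {series_title!r}")
```

Running it pairs "DHEW publication" with "DHHS publication ;" and vice versa, reproducing the mismatch in the RDF.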

This particular example doesn't have any ISSNs, but I've noticed that when ISSNs are involved the result is even more mixed up. I can prepare a more detailed example if necessary, but first I'd like to understand whether the above problem is a real one and if it's possible to do anything about it. In the given example the $a subfields in 490 and 830 happen to match exactly, so it would be possible for the converter to match every 490 with the corresponding 830 based on $a values, but that may not always be the case.
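Matching on $a could look roughly like the sketch below. It is an illustration under stated assumptions, not a proposal for the converter's code: `match_key` and `pair_by_title` are hypothetical names, and the normalization (NFC, stripping trailing ISBD punctuation, casefolding) is an assumed rule chosen so that "DHHS publication" and "DHHS publication ;" compare equal.

```python
import re
import unicodedata


def match_key(title):
    """Hypothetical match key: NFC-normalize, drop trailing ISBD
    punctuation and whitespace (. , ; : / =), then casefold."""
    title = unicodedata.normalize("NFC", title)
    title = re.sub(r"[\s.,;:/=]+$", "", title)
    return title.casefold()


def pair_by_title(f490s, f830s):
    """Pair each 490 $a with the 830 $a that has the same match key.

    A 490 with no matching 830 is paired with None rather than with
    whichever 830 happens to sit in the same position.
    """
    by_key = {}
    for f830 in f830s:
        by_key.setdefault(match_key(f830), f830)  # keep the first 830 per key
    return [(f490, by_key.get(match_key(f490))) for f490 in f490s]


# $a values from the example record, in record order.
f490s = ["DHEW publication", "DHHS publication"]
f830s = ["DHHS publication ;", "DHEW publication ;"]
for statement, series_title in pair_by_title(f490s, f830s):
    print(f"490 {statement!r} -> 830 {series_title!r}")
```

As the paragraph above notes, this only helps when the transcribed 490 title and the controlled 830 title agree after normalization. A 490 transcribed in a different form from its 830 comes back paired with None, which at least avoids attaching the wrong Work.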

jodiw01 commented 10 months ago

The conversion of series fields was changed to model them as Hubs. The 490 and 8XX fields are no longer paired; instead, the Hub created from a 490 field has bf:status "transcribed" to differentiate it from the Hubs created from 8XX fields.

Please see https://id.loc.gov/tools/bibframe/compare-id/full-rdf?find=11225612, the updated conversion of the example above.