lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

bf:cartographicAttributes and bf:Cartographic issues #233

Open kirkhess opened 1 year ago

kirkhess commented 1 year ago

Because MARC duplicates cartographic information, the duplication shows up clearly in the BIBFRAME examples, and it causes problems when converting back to MARC: the parts are all split up and indistinguishable, so it is hard to cram them back into the right 034 or 255 subfield without a lot of regexes, hand-waving, or guessing.

Example Work

This work has 3 cartographic attributes and 2 scales because of the conversion from the 034 and the 255.

        <bf:cartographicAttributes>
            <bf:Cartographic>
                <bflc:relief>
                    <bflc:Relief rdf:about="http://id.loc.gov/vocabulary/mrelief/spot">
                        <rdfs:label>spot heights</rdfs:label>
                    </bflc:Relief>
                </bflc:relief>
            </bf:Cartographic>
        </bf:cartographicAttributes>
        <bf:cartographicAttributes>
            <bf:Cartographic>
                <bf:coordinates>E0152500 E0285000 N0562000 N0474000</bf:coordinates>
            </bf:Cartographic>
        </bf:cartographicAttributes>
        <bf:scale>
            <bf:Scale>
                <rdf:value>600000</rdf:value>
                <rdfs:label>linear horizontal</rdfs:label>
            </bf:Scale>
        </bf:scale>

        <bf:scale>
            <bf:Scale>
                <rdfs:label>Scale 1:600,000. 1 cm = 6 km</rdfs:label>
            </bf:Scale>
        </bf:scale>
        <bf:cartographicAttributes>
            <bf:Cartographic>
                <bf:coordinates>E 15°25ʹ--E 28°50ʹ/N 56°20ʹ--N 47°40ʹ</bf:coordinates>
            </bf:Cartographic>
        </bf:cartographicAttributes>

This converts to the following 034 and 255 fields; note the four 255 fields:

034 1  $a a $b 600000
255    $a linear horizontal
255    $a Scale 1:600,000. 1 cm = 6 km
255    $c E0152500 E0285000 N0562000 N0474000
255    $c E 15°25ʹ--E 28°50ʹ/N 56°20ʹ--N 47°40ʹ

It is supposed to look like this MARC record:

034 1  $a a $b 600000 $d E0152500 $e E0285000 $f N0562000 $g N0474000
255    $a Scale 1:600,000. 1 cm = 6 km $c (E 15°25ʹ--E 28°50ʹ/N 56°20ʹ--N 47°40ʹ).
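
To make the duplication concrete, here is a minimal Python sketch that rebuilds the 255 $c statement from the four coded 034 values. It assumes the coded values are in the hdddmmss form (hemisphere, three-digit degrees, minutes, seconds) and that zero seconds are dropped for display; both function names are made up for illustration and are not part of marc2bibframe2 or bibframe2marc.

```python
def coded_to_display(coded):
    """Turn an 034-style hdddmmss value such as 'E0152500' into 'E 15°25ʹ'."""
    hemisphere = coded[0]
    degrees = int(coded[1:4])   # drop leading zeros
    minutes = coded[4:6]
    seconds = coded[6:8]
    text = f"{hemisphere} {degrees}°{minutes}ʹ"
    if seconds != "00":         # assumed display rule: omit zero seconds
        text += f"{seconds}ʺ"
    return text


def statement_from_034(west, east, north, south):
    """Build a 255 $c style statement from 034 $d, $e, $f, $g (in that order)."""
    parts = [coded_to_display(v) for v in (west, east, north, south)]
    return f"{parts[0]}--{parts[1]}/{parts[2]}--{parts[3]}"


print(statement_from_034("E0152500", "E0285000", "N0562000", "N0474000"))
```

For the example record this yields the same text as the 255 $c statement, minus the enclosing parentheses and final period, which is why the two bf:coordinates values carry the same information in different notations.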

You can't easily distinguish between the coded coordinates from the 034$defg and the coordinate statement from the 255$c. I think this could be remediated with two subtypes of bf:Cartographic, _:CodedCartographic and _:MathematicalCartographic, plus a new property _:coordinatesStatement. For a more precise conversion, you need something like westCoordinate, eastCoordinate, northCoordinate and southCoordinate. The 034 form is a specific type of coordinate, so maybe you could use a datatype? I think for now you could tweak bibframe2marc to regex it into the right subfields.
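
As a rough illustration of the regex stopgap, the Python sketch below routes a bf:coordinates literal either to 034 $d$e$f$g or to 255 $c. It is not how bibframe2marc works (that is XSLT); the function name is hypothetical, the pattern only covers the hdddmmss coded form, and the west/east/north/south order is taken from the example record.

```python
import re

# Assumed shape of one coded 034 value: hemisphere letter plus seven digits.
CODED = re.compile(r"[NSEW]\d{7}")


def coordinates_to_marc(value):
    """Return (tag, subfields) for one bf:coordinates literal."""
    parts = value.split()
    if len(parts) == 4 and all(CODED.fullmatch(p) for p in parts):
        # Four coded values: map them positionally onto $d, $e, $f, $g.
        return ("034", dict(zip("defg", parts)))
    # Anything else is treated as a free-text coordinate statement.
    # ISBD punctuation (parentheses, final period) is left to the caller.
    return ("255", {"c": value})


print(coordinates_to_marc("E0152500 E0285000 N0562000 N0474000"))
print(coordinates_to_marc("E 15°25ʹ--E 28°50ʹ/N 56°20ʹ--N 47°40ʹ"))
```

This is exactly the kind of guessing a distinct property or datatype would make unnecessary: the routing depends on the literal happening to match a pattern rather than on anything stated in the data.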

The scales should probably be in cartographicAttributes instead of linked off the Work, so this needs some analysis for the 034$a and 255$a. The label "linear horizontal" generates a 255$a when you convert to MARC, so it needs to be fixed somehow. It is more of a subproperty: it indicates which MARC subfield the value should go in. So maybe it makes more sense either to make subtypes of bf:Scale for the MARC subfields, like bf:Note, or to make subproperties of bf:scale and change them to datatype properties instead of object properties. Similar to coordinates, the 255$a scale is a statement, so I suggest adding _:scaleStatement.
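
A small Python sketch of the routing this implies, treating the label as a type indicator rather than as text to emit. The function and the mapping are hypothetical; "linear horizontal" comes from the example above, while "linear vertical" is only a guess at the matching label for 034 $c.

```python
# Assumed mapping from the converter's type label to an 034 subfield.
LABEL_TO_034 = {"linear horizontal": "b", "linear vertical": "c"}


def scale_to_marc(value=None, label=None):
    """Return (tag, subfield, text) for one bf:Scale resource."""
    if value is not None:
        # A numeric rdf:value is a coded scale; the label only picks the subfield.
        return ("034", LABEL_TO_034.get(label, "b"), str(value))
    # A label with no value is a scale statement for 255 $a.
    return ("255", "a", label)


print(scale_to_marc(value="600000", label="linear horizontal"))
print(scale_to_marc(label="Scale 1:600,000. 1 cm = 6 km"))
```

With this reading, "linear horizontal" never becomes a 255 $a of its own, which is the misconversion described above.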

Relief is a 008/18-21 code. I suggest using a bf:code, and it seems like something that belongs in CodedCartographic, which would eliminate one of the cartographicAttributes. Another possibility is a third type of bf:Cartographic, since relief can repeat.

So the end result is something like this:

<bf:cartographicAttributes>
    <_:CodedCartographic>
        <bflc:relief>
            <bflc:Relief rdf:about="http://id.loc.gov/vocabulary/mrelief/spot">
                <bf:code>g</bf:code>
                <rdfs:label>spot heights</rdfs:label>
            </bflc:Relief>
        </bflc:relief>
        <bf:coordinates>E0152500 E0285000 N0562000 N0474000</bf:coordinates>
        <bf:scale>
            <bf:Scale>
                <rdf:type rdf:resource="_:LinearHorizontalScale"/>
                <rdf:value>600000</rdf:value>
            </bf:Scale>
        </bf:scale>
    </_:CodedCartographic>
</bf:cartographicAttributes>
<bf:cartographicAttributes>
    <_:MathematicalCartographic>
        <_:scaleStatement>Scale 1:600,000. 1 cm = 6 km</_:scaleStatement>
        <_:coordinatesStatement>E 15°25ʹ--E 28°50ʹ/N 56°20ʹ--N 47°40ʹ</_:coordinatesStatement>
    </_:MathematicalCartographic>
</bf:cartographicAttributes>

For blank node reduction and simplicity, I think we could describe this without bf:cartographicAttributes and Cartographic resources so those would be deprecated. I don't remember why we chose that pattern - it seems like based on current modeling we don't need them anymore?

<bf:Work>
    <rdf:type rdf:resource="http://id.loc.gov/ontologies/bibframe/Cartography" />
    <bflc:relief>
        <bflc:Relief rdf:about="http://id.loc.gov/vocabulary/mrelief/spot">
            <bf:code>g</bf:code>
            <rdfs:label>spot heights</rdfs:label>
        </bflc:Relief>
    </bflc:relief>
    <bf:coordinates>E0152500 E0285000 N0562000 N0474000</bf:coordinates>
    <bf:scale>
        <bf:Scale rdf:about="_:LinearHorizontalScale">
            <rdf:value>600000</rdf:value>
        </bf:Scale>
    </bf:scale>
    <_:scaleStatement>Scale 1:600,000. 1 cm = 6 km</_:scaleStatement>
    <_:coordinatesStatement>E 15°25ʹ--E 28°50ʹ/N 56°20ʹ--N 47°40ʹ</_:coordinatesStatement>
</bf:Work>

kefo commented 8 months ago

This seems, at least for now, more a conversion issue than an ontology issue, hence the move.

Your "blank node reduction" idea is attractive but 034 and 255, from which this info comes, are repeatable. Q: How do you resolve that then?

I see three options, or so:

1) Continue to segregate the info as seen in your penultimate example.
2) Confirm that, despite the fields' repeatability, they are never used that way and any loss is OK.
3) Determine that 034/255 pairs represent different Instances, which has a bunch of implications.
4) Maybe there is a fourth option. Go with your 'blank node reduction' idea but if there are multiple 034/255 pairs, then segregate them.
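
Option 4 could be sketched roughly like this in Python. The function name and the dict layout are invented for illustration, and the sketch assumes the 034/255 fields have already been paired up, which is itself a guess based on field order.

```python
def model_cartographic(pairs):
    """Decide where the properties from each 034/255 pair should live.

    pairs: list of dicts, one per 034/255 pair, holding the properties
    derived from that pair (coordinates, scale, statements, ...).
    """
    if len(pairs) == 1:
        # Single pair: flatten everything onto the Work, no blank nodes.
        return {"work": dict(pairs[0]), "cartographicAttributes": []}
    # Several pairs: keep each one segregated so the grouping survives.
    return {"work": {}, "cartographicAttributes": [dict(p) for p in pairs]}


single = [{"coordinates": "E0152500 E0285000 N0562000 N0474000", "scale": "600000"}]
print(model_cartographic(single))
```

The cost of this option is that consumers have to handle two shapes of data, depending on how many pairs the source record had.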

Thoughts?

kirkhess commented 8 months ago

There's a whole bunch of ontological and conversion ideas in this section - it's hard to pin down where it goes exactly.

This whole thing with paired fields in MARC was a bad idea. Display-wise they might keep the entry order, but indexing isn't going to know about it: 255$abcde and 034$bcdefgz3 are both indexed in the keyword index in WC. I think I had forgotten 007, which would probably also follow the 034/255 pairs.

Option 3 is the one that kind of blows all this up, but it probably makes more sense along with option 1. I was hoping that, like OLAC, there were some good examples gathered by MAGIRT in a document. This isn't quite what I was looking for: Best Practices (2020). I'll keep looking.