metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0
69 stars 34 forks source link

Marc-XML-encoder: record-type written as controlfield not as attribut of record-field #402

Closed TobiasNx closed 2 years ago

TobiasNx commented 2 years ago

While reviewing #336 I saw that the optional attribute type of the record field is also wrongly outputted as controlledfield:

in:

<?xml version="1.0" encoding="UTF-8"?>
  <record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
    <leader>00000pam a2200000 c 4500</leader>
    <controlfield tag="001">1196308691</controlfield>
    <controlfield tag="003">DE-101</controlfield>
    <controlfield tag="005">20210312081326.0</controlfield>
    <controlfield tag="007">tu</controlfield>
    <controlfield tag="008">191003s2019    gw ||||| |||| 00||||ger  </controlfield>
    ...

out:

<marc:record>
        <marc:controlfield tag="type">Bibliographic</marc:controlfield>
        <marc:leader>00000pam a2200000 c 4500</marc:leader>
        <marc:controlfield tag="001">1196308691</marc:controlfield>
        <marc:controlfield tag="003">DE-101</marc:controlfield>
        <marc:controlfield tag="005">20210312081326.0</marc:controlfield>
        <marc:controlfield tag="007">tu</marc:controlfield>
        <marc:controlfield tag="008">191003s2019    gw ||||| |||| 00||||ger</marc:controlfield>

Never mind the namespaces in this ticket.

blackwinter commented 2 years ago

This was addressed by #394: xmlHandler.setAttributeMarker("~") or handle-*-xml(attributeMarker="~") should do the trick (see also #379).

blackwinter commented 2 years ago

Oh, I see, the MARC XML encoder doesn't know how to handle (marked) attributes. Is that it?

dr0i commented 2 years ago

Yep, good catch, that's it!

TobiasNx commented 2 years ago

Oh, I see, the MARC XML encoder doesn't know how to handle (marked) attributes. Is that it?

type= is an optional attribute of record(See: https://www.loc.gov/standards/marcxml/xml/spy/spy.html#element_record_Link044CBAB8). The encoder passes it as controlled field:

<record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">

=>

<marc:record>
<marc:controlfield tag="type">Bibliographic</marc:controlfield>

type is set as tag in controlfield tag and the attribute value is set in the bracket.

Using the attribute marker the output is:

    <marc:record>
        <marc:controlfield tag="~type">Bibliographic</marc:controlfield>

Should be: ` I think.

Neither marked nor not marked works.

TobiasNx commented 2 years ago

I have a testcase here:

https://github.com/TobiasNx/notWorkingFlux/tree/main/leader It was originally intended for the leader-problem. But since this is fixed it shows the other problem too:

Testflux: https://github.com/TobiasNx/notWorkingFlux/blob/main/leader/misplacedLeader.flux

Testfile: https://github.com/TobiasNx/notWorkingFlux/blob/main/leader/1196308691_marcxml.mrcx

blackwinter commented 2 years ago

Should be fixed by #404. Could you check?

TobiasNx commented 2 years ago

if the attributeMarker is explicitly set like this

handle-marcxml(attributeMarker="~")|
morph(FLUX_DIR + "allNested.xml")|
encode-marcxml(attributeMarker="~")|

result is: <marc:record type="Bibliographic">

it works, if not the option is not set explicitly it doesn't.

Is there anyway that we do not need this explicit marking. Since all other attributes of marcXML e.g. code= or ind1=:

        <marc:datafield tag="016" ind1="7" ind2=" ">
            <marc:subfield code="2">DE-101</marc:subfield>
            <marc:subfield code="a">1196308691</marc:subfield>
        </marc:datafield>

are recognized directly and ecoded properly. In this it differs from a generic XML decoder/encoder since the possible attributes are limited.

Is this what you mean by dirty vs clean solution @blackwinter ?

blackwinter commented 2 years ago

Is this what you mean by dirty vs clean solution @blackwinter ?

No, this is just for backward compatibility. Although, if we consider it a bugfix, we can justify changing the default behaviour. Thoughts?

TobiasNx commented 2 years ago

But then the new leader behaviour is also not backwards compatible. Isn't it.

blackwinter commented 2 years ago

But then the new leader behaviour is also not backwards compatible. Isn't it.

Exactly. This was a bugfix.

blackwinter commented 2 years ago

And to be clear: I'm sure we can make a case for this to be a bug, too. I was probably still too much in the mindset of the previous attribute work ;)

blackwinter commented 2 years ago

Proposing #405 as bugfix. WDYT?

TobiasNx commented 2 years ago

Nice. Bugfix seems fine. +1.


input:

  <record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
    <leader>00000pam a2200000 c 4500</leader>
    <controlfield tag="001">1196308691</controlfield>

was:

    <marc:record>
        <marc:controlfield tag="type">Bibliographic</marc:controlfield>
        <marc:leader>00000pam a2200000 c 4500</marc:leader>

now:

    <marc:record type="Bibliographic">
        <marc:leader>00000pam a2200000 c 4500</marc:leader>

Also don't be confused about the namespace refrences in record at input. encode-marcxml always outputs collections and the namespace refrence is in the collection-tag e.g.<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">. If this should be an feature to choose output at record or at collection level, we should open another ticket.

Also the namespace-prefixes in the output are okay since encode-marcxml always does it like this. see #403 .

dr0i commented 2 years ago

So I think we agree on having this fixed :) Closing.

blackwinter commented 2 years ago

So I think we agree on having this fixed :)

But the pull request isn't merged, yet?

dr0i commented 2 years ago

Sorry, thought I had, but forgot to push. Done now.