metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Counted leader elements in marc when encoding to marc #524

Closed. TobiasNx closed this issue 2 weeks ago

TobiasNx commented 2 months ago

@maipet hinted that encode-marc21 or encode-marcxml cannot create the leader correctly since the elements are not counted. Could you elaborate on the problem?

blackwinter commented 2 months ago

Is this related to #454?

TobiasNx commented 2 months ago

I am not sure. We are transforming the OERSI JSON data to MARC, but @maipet told me that the transformation produces invalid results because the leader elements that state, e.g., the length of a record are missing.

But @maipet could clarify.

maipet commented 2 months ago

You can set the leader field, but shouldn't leader "Character Positions 00-04 - Record length" and "Pos. 12-16 - Base address of data" actually be generated automatically? It was discussed with @dr0i that we should first check whether the MARC records from OERSI are 'valid' even without the correct information in the leader (the positions are currently filled with zeros).

TobiasNx commented 2 months ago

While inspecting some workaround for #454, I saw that the marc21-encoder seems to have a mechanism for that:

https://metafacture.org/playground/?flux=%22https%3A//d-nb.info/1106253078/about/marcxml%22%0A%7C+open-http%28accept%3D%22application/xml%22%29%0A%7C+decode-xml%0A%7C+handle-marcxml%0A%7C+fix%28transformationFile%29%0A%7C+encode-marc21%0A%7C+decode-marc21%28emitLeaderAsWhole%3D%22true%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=copy_field%28%22leader%22%2C%22@leader.status%22%29%0Acopy_field%28%22leader%22%2C%22@leader.type%22%29%0Acopy_field%28%22leader%22%2C%22@leader.bibliographicLevel%22%29%0Acopy_field%28%22leader%22%2C%22@leader.typeOfControl%22%29%0Acopy_field%28%22leader%22%2C%22@leader.characterCodingScheme%22%29%0Acopy_field%28%22leader%22%2C%22@leader.encodingLevel%22%29%0Acopy_field%28%22leader%22%2C%22@leader.catalogingForm%22%29%0Acopy_field%28%22leader%22%2C%22@leader.multipartLevel%22%29%0A%0Asubstring%28%22@leader.status%22%2C%225%22%2C%221%22%29%0Asubstring%28%22@leader.type%22%2C%226%22%2C%221%22%29%0Asubstring%28%22@leader.bibliographicLevel%22%2C%227%22%2C%221%22%29%0Asubstring%28%22@leader.typeOfControl%22%2C%228%22%2C%221%22%29%0Asubstring%28%22@leader.characterCodingScheme%22%2C%229%22%2C%221%22%29%0Asubstring%28%22@leader.encodingLevel%22%2C%2217%22%2C%221%22%29%0Asubstring%28%22@leader.catalogingForm%22%2C%2218%22%2C%221%22%29%0Asubstring%28%22@leader.multipartLevel%22%2C%2219%22%2C%221%22%29%0A%0Amove_field%28%22@leader%22%2C%22leader%22%29

Someone more experienced should have a look to confirm. We could probably reuse parts of encode-marc21 for encode-marcxml.

dr0i commented 2 months ago

The construction of the leader (counting bytes, including indicators and similar magic) is done by invoking Marc21Decoder.java (which calls Record.java). Code can be reused for encode-marc21, although it's ugly from a performance point of view (the whole record has to be converted to type Record at the end of parsing a record). This will be done in my PR treating https://github.com/metafacture/metafacture-core/issues/454.
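To make the "counting bytes" step concrete, here is a minimal sketch of the ISO 2709 arithmetic behind leader positions 00-04 and 12-16. It is not Metafacture's code: the function names and the sample fields are hypothetical, and it assumes field data is already encoded as bytes.

```python
# Sketch of the ISO 2709 arithmetic behind leader positions 00-04 and 12-16.
# All names are hypothetical; this is not Metafacture's implementation.

LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # 3-char tag + 4-digit field length + 5-digit start offset
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def counted_leader_values(fields):
    """Return (record_length, base_address) for a list of (tag, data) pairs.

    `data` is the encoded field content as bytes without the field terminator;
    for data fields it includes the indicators and subfield delimiters.
    """
    # The directory holds one fixed-size entry per field and ends with a field terminator.
    base_address = (
        LEADER_LENGTH + DIRECTORY_ENTRY_LENGTH * len(fields) + len(FIELD_TERMINATOR)
    )
    # Each field is followed by a field terminator; the record ends with a record terminator.
    data_length = sum(len(data) + len(FIELD_TERMINATOR) for _tag, data in fields)
    record_length = base_address + data_length + len(RECORD_TERMINATOR)
    return record_length, base_address


def fill_leader(leader, fields):
    """Write the counted values into positions 00-04 and 12-16 of a 24-char leader."""
    record_length, base_address = counted_leader_values(fields)
    if record_length > 99999:
        raise ValueError("record too long for a five-digit record length")
    return f"{record_length:05d}{leader[5:12]}{base_address:05d}{leader[17:]}"


# Made-up sample record: one control field and one data field.
fields = [
    ("001", b"12345"),
    ("245", b"00\x1faA title"),
]
print(fill_leader("00000naa a2200000uc 4500", fields))  # 00068naa a2200049uc 4500
```

Only the two counted ranges are overwritten; every other leader position is copied from the input unchanged.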

TobiasNx commented 2 months ago

Code can be reused for encode-marc21

@dr0i: Isn't encode-marc21 already doing this? See: https://github.com/metafacture/metafacture-core/issues/524#issuecomment-2056344931

dr0i commented 2 months ago

Functional review: @TobiasNx and @maipet. Deployed to test-Playground: metafacture-framework feature-454-allowMarc21EncoderToGetLeaderAsOneString-SNAPSHOT.

Note that the generated leader is 02934naa a2200649uc 4500 while the original input was <leader>00000naa a2200000uc 4500</leader>. So the leader seems to be correct: the record size and the other counted parts are filled in, while the type etc. are preserved.
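The comparison of the two leaders can be checked mechanically. The following is a small sketch, with hypothetical helper names, that reads the counted positions from both leader strings quoted above and verifies that nothing outside positions 00-04 and 12-16 changed. It assumes the strings are exactly 24 characters long, as quoted.

```python
# Hypothetical helpers for comparing two MARC 21 leaders; not part of Metafacture.

COUNTED_POSITIONS = set(range(0, 5)) | set(range(12, 17))


def leader_counts(leader):
    """Return (record_length, base_address) read from a 24-character leader."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader has exactly 24 characters")
    return int(leader[0:5]), int(leader[12:17])


def preserved_outside_counts(before, after):
    """True if both leaders agree everywhere except the counted positions."""
    return all(
        a == b
        for i, (a, b) in enumerate(zip(before, after))
        if i not in COUNTED_POSITIONS
    )


original = "00000naa a2200000uc 4500"
generated = "02934naa a2200649uc 4500"

print(leader_counts(original))   # (0, 0)
print(leader_counts(generated))  # (2934, 649)
print(preserved_outside_counts(original, generated))  # True
```

A base address of 649 is also consistent with the fixed layout of 24 leader bytes, 12 bytes per directory entry, and one terminator byte: (649 - 25) / 12 gives exactly 52 directory entries.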

TobiasNx commented 2 months ago

Added my review here: https://github.com/metafacture/metafacture-core/pull/526#issuecomment-2068705357

One scenario is still not working; otherwise this seems to work for me. But @maipet has more knowledge about the leader.

TobiasNx commented 2 months ago

It seems that this is not solved for encode-marcxml. The leader positions at the beginning and in the middle are still 00000.

TobiasNx commented 2 months ago

Ahhh, I now see what the problem is: encode-marcxml still lacks the ability to generate the counted leader info. I did not review this properly, sorry.

TobiasNx commented 2 months ago

We decided with @maipet and @dr0i that MARCXML does not need to count. Instead, it either uses the provided leader info if the leader is provided as a whole (even if the record itself changed), or sets leader positions 00-04 and 12-16 to zero if the leader is only provided as separate elements, as decode-marc21 emits it.

For further info see: https://github.com/metafacture/metafacture-core/issues/527#issuecomment-2076585889
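The rule agreed on above can be sketched as follows. This is an illustration only, not the encode-marcxml implementation: the function names are hypothetical, the element names are taken from the playground example earlier in this thread, and the fixed values "22" (positions 10-11) and "4500" (positions 20-23) are assumed to be the usual MARC 21 constants.

```python
# Sketch of the agreed rule; hypothetical names, not the encode-marcxml code.

ZERO_COUNT = "00000"  # used for positions 00-04 and 12-16 when nothing is counted


def leader_from_parts(parts):
    """Assemble a 24-character leader from single-character leader elements."""

    def one(name):
        value = parts.get(name, " ")  # assumption: a missing element becomes a blank
        if len(value) != 1:
            raise ValueError(f"{name} must be exactly one character")
        return value

    return (
        ZERO_COUNT                                                  # 00-04 record length
        + one("status") + one("type") + one("bibliographicLevel")   # 05-07
        + one("typeOfControl") + one("characterCodingScheme")       # 08-09
        + "22"                                                      # 10-11 assumed constants
        + ZERO_COUNT                                                # 12-16 base address
        + one("encodingLevel") + one("catalogingForm") + one("multipartLevel")  # 17-19
        + "4500"                                                    # 20-23 assumed constant
    )


def marcxml_leader(whole=None, parts=None):
    """Pass a whole leader through unchanged; otherwise build one with zeroed counts."""
    if whole is not None:
        return whole  # kept as provided, even if the record itself changed
    return leader_from_parts(parts or {})


parts = {
    "status": "n", "type": "a", "bibliographicLevel": "a",
    "typeOfControl": " ", "characterCodingScheme": "a",
    "encodingLevel": "u", "catalogingForm": "c", "multipartLevel": " ",
}
print(marcxml_leader(parts=parts))                             # 00000naa a2200000uc 4500
print(marcxml_leader(whole="02934naa a2200649uc 4500"))        # 02934naa a2200649uc 4500
```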