hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

ALMA test xmls are not valid MARCXML #1348

Open TobiasNx opened 2 years ago

TobiasNx commented 2 years ago

See here

The problem is that the namespace is not declared in the MARC XML files: https://github.com/hbz/lobid-resources/blob/b1b8a24d6958026a2adebf1ff607a6ec9dd664aa/src/test/resources/alma

It is: <record>

Should be: <record xmlns="http://www.loc.gov/MARC21/slim">

If there is a <collection> element, as in https://github.com/hbz/lobid-resources/blob/4fbc7cfa76792bc5730f19a17f2292d24ab515ac/src/test/resources/alma/almaMarcXmlTestFiles.xml.tar.bz2, then the <collection> element should carry the namespace declaration, not the <record> element. It is: <collection><record>

Should be: <collection xmlns="http://www.loc.gov/MARC21/slim"><record>

Because of this, the playground and other Flux scripts do not recognize the files without the namespace as MARCXML.
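To make the difference concrete, here is a minimal sketch that checks whether the root element of a file is in the MARC21 slim namespace and, if not, puts all elements into it. It uses only Python's standard library and is not part of lobid-resources or Metafacture; the helper names (`root_namespace`, `add_marc_namespace`) and the sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # ElementTree writes namespaced tags as "{uri}localname".
        return root.tag[1:].split("}", 1)[0]
    return None


def add_marc_namespace(xml_text):
    """Put every un-namespaced element into the MARC21 slim namespace."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if isinstance(element.tag, str) and not element.tag.startswith("{"):
            element.tag = "{%s}%s" % (MARC_NS, element.tag)
    # Serialize MARC_NS as the default namespace (xmlns="...") rather than ns0:.
    ET.register_namespace("", MARC_NS)
    return ET.tostring(root, encoding="unicode")


# Illustrative input only: a collection without any namespace declaration.
bare = (
    "<collection><record>"
    "<leader>00000nam a2200000 c 4500</leader>"
    "</record></collection>"
)
fixed = add_marc_namespace(bare)

print(root_namespace(bare))
print(root_namespace(fixed))
print(fixed)
```

Because the declaration ends up on the outermost element, the same helper covers both the single <record> case and the <collection> case.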

dr0i commented 2 years ago

But our ETL works with it. What does your Flux look like? In Java it's: FileOpener->XmlDecoder->MarcXmlHandler ...

dr0i commented 2 years ago

Ah, and use marcXmlHandler.setNamespace(null); this prevents the namespace check. In Flux, use | handle-marcxml(namespace=null). Try that. It may not work, though, because the null must be an actual null and not the string "null", in which case this workaround doesn't work in Flux. Then you could also try | handle-marcxml(namespace="").
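Why a real null, the string "null", and the empty string could behave differently can be shown with a simplified model. This is a sketch under assumptions, not Metafacture's code: `NamespaceCheck` is a hypothetical stand-in for the handler's namespace option, and it assumes that the parser reports an empty string as the namespace URI of elements that have no namespace (as namespace-aware SAX parsers do).

```python
MARC_NS = "http://www.loc.gov/MARC21/slim"


class NamespaceCheck:
    """Hypothetical model of a handler's namespace option (not Metafacture code)."""

    def __init__(self, namespace=MARC_NS):
        self.namespace = namespace

    def accepts(self, element_uri):
        # None disables the check; otherwise the URIs must match exactly.
        return self.namespace is None or self.namespace == element_uri


# Assumption: elements without a namespace are reported with the URI "".
NO_NAMESPACE = ""

print(NamespaceCheck().accepts(NO_NAMESPACE))        # default: expects MARC_NS
print(NamespaceCheck(None).accepts(NO_NAMESPACE))    # check disabled
print(NamespaceCheck("null").accepts(NO_NAMESPACE))  # "null" is just another URI
print(NamespaceCheck("").accepts(NO_NAMESPACE))      # matches un-namespaced input
```

In this model, the string "null" is treated as a namespace that no element has, while the empty string happens to match exactly the un-namespaced test files and nothing else.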

TobiasNx commented 2 years ago

Thanks. But the option behaviour seems odd and is not documented: https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md#handle-marcxml

This should be documented somehow, since normally this would be "true"/"false".

blackwinter commented 2 years ago

But the option behaviour [...] is not documented

Of course it's documented:

options: namespace (String), attributemarker (String)

You can either set it to null as @dr0i suggested (if that works in Flux) in order to disable the namespace check or set it to the required namespace value ("http://www.loc.gov/MARC21/slim") in order to make the check pass (see metafacture/metafacture-core#331).

normally this would be "true"/"false"

This only applies to boolean options.

blackwinter commented 2 years ago

But the option behaviour [...] is not documented

Oh, did you mean the part about disabling the namespace check? Then you're right, that's currently not documented. Sorry if I misunderstood.

dr0i commented 1 year ago

@TobiasNx can you document it?

TobiasNx commented 1 year ago

https://github.com/metafacture/metafacture-core/pull/504