metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Make namespace check for decode-xml optional #545

Open TobiasNx opened 3 months ago

TobiasNx commented 3 months ago

https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7Cdecode-xml%0A%7Chandle-generic-xml%0A%7Cencode-yaml%0A%7Cprint%0A%3B&data=%3C%3Fxml+version%3D%221.0%22%3F%3E%0A%3Crecord%3E%0A++++%3Cmets%3Afield%3Ea%3C/mets%3Afield%3E%0A++++%3Cfield%3Eb%3C/field%3E%0A%3C/record%3E

When reading XML that uses a namespace prefix without a corresponding namespace declaration, MF breaks. The namespace check should be optional.
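The failure is not specific to MF: any namespace-aware XML parser rejects an undeclared prefix. The following is a minimal sketch using Python's standard library parser as a stand-in for `decode-xml` (an assumption for illustration, not MF's actual code path). The sample record is the one from the playground link; `try_parse` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample record from the playground link: the "mets:" prefix is used
# but never declared with an xmlns:mets attribute.
XML = """<?xml version="1.0"?>
<record>
    <mets:field>a</mets:field>
    <field>b</field>
</record>"""


def try_parse(xml_text):
    """Return (True, root element) on success or (False, error message) on failure."""
    try:
        return True, ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)


ok, result = try_parse(XML)
print(ok, result)  # ok is False; the message reports an unbound prefix
```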

dr0i commented 3 months ago

Why do you think this is a bug? The problem lies in using a namespace prefix without defining it, i.e. the error lies in the input data.

TobiasNx commented 3 months ago

> Why do you think this is a bug? The problem lies in using a namespace prefix without defining it, i.e. the error lies in the input data.

You are right, but as it stands you cannot handle this data with MF at all. I would not call this a bug, but making the check optional would be a feature. It would be similar to the option ignoreId from handle-marcxml.

dr0i commented 3 months ago

There is always a way - until it isn't. Could you not filter the whole input first (treat it as a string) and use a regex that would remove the erroneous namespace?
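A rough sketch of that idea, again using Python's standard library rather than MF itself: treat the input as a plain string, remove the undeclared prefix with a regex, then parse. `strip_prefix` is a hypothetical helper, and the pattern is deliberately naive, so it would also remove `mets:` from text content or attribute values if it occurred there.

```python
import re
import xml.etree.ElementTree as ET

# Sample record with the undeclared "mets:" prefix.
RAW = """<?xml version="1.0"?>
<record>
    <mets:field>a</mets:field>
    <field>b</field>
</record>"""


def strip_prefix(xml_text, prefix="mets"):
    """Remove every occurrence of '<prefix>:' from the raw string."""
    return re.sub(re.escape(prefix) + ":", "", xml_text)


cleaned = strip_prefix(RAW)
root = ET.fromstring(cleaned)  # parses now that no unbound prefix remains
fields = [el.text for el in root.findall("field")]
print(fields)  # ['a', 'b']
```

The trade-off is that both elements end up with the same name, so the information that one of them carried a `mets:` prefix is lost.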

dr0i commented 1 month ago

This is a way: https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file%0A%7Cas-records%0A%7Cmatch%28pattern%3D%22mets%3A%22%2C+replacement%3D%22%22%29%0A%7Cread-string%0A%7Cdecode-xml%0A%7Chandle-generic-xml%0A%7Cencode-yaml%0A%7Cprint%0A%3B&data=%3C%3Fxml+version%3D%221.0%22%3F%3E%0A%3Crecord%3E%0A++++%3Cmets%3Afield%3Ea%3C/mets%3Afield%3E%0A++++%3Cfield%3Eb%3C/field%3E%0A%3C/record%3E
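The Flux workflow is URL-encoded in the `flux` query parameter of the playground link. To read it without opening the playground, a small sketch like the following decodes it (the `data` parameter is left out of `URL` here for brevity):

```python
from urllib.parse import parse_qs, urlsplit

# Playground link from the comment above, truncated to the "flux" parameter.
URL = (
    "https://metafacture.org/playground/?flux=inputFile%0A%7Copen-file"
    "%0A%7Cas-records%0A%7Cmatch%28pattern%3D%22mets%3A%22%2C+replacement"
    "%3D%22%22%29%0A%7Cread-string%0A%7Cdecode-xml%0A%7Chandle-generic-xml"
    "%0A%7Cencode-yaml%0A%7Cprint%0A%3B"
)

# parse_qs percent-decodes the value and turns '+' into a space.
flux = parse_qs(urlsplit(URL).query)["flux"][0]
steps = flux.split("\n")
print(flux)
```

The decoded workflow reads the whole input as one string with `as-records`, removes `mets:` via `match(pattern="mets:", replacement="")`, and only then passes the string through `read-string` to `decode-xml`.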