metno / discovery-metadata-catalog-ingestor


DMCI not catching XML parsing error in /validate. #185

Closed magnarem closed 1 year ago

magnarem commented 1 year ago

During the last catalog rebuild, I found one XML file with invalid XML syntax: an end tag was missing its /.

XML files that are not well-formed make DMCI throw lxml.etree.XMLSyntaxError, which is not handled properly.

Log output:

[2023-06-01 07:19:33,033]                 dmci.api.app:1414 ERROR    Exception on /v1/validate [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/usr/local/lib/python3.10/dist-packages/dmci/api/app.py", line 98, in post_validate
    msg, code = self._validate_method_post(request)
  File "/usr/local/lib/python3.10/dist-packages/dmci/api/app.py", line 181, in _validate_method_post
    valid, msg, data = worker.validate(data)
  File "/usr/local/lib/python3.10/dist-packages/dmci/api/worker.py", line 91, in validate
    valid = self._xsd_obj.validate(etree.fromstring(data))
  File "src/lxml/etree.pyx", line 3252, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1913, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1800, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 62
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: east line 60 and rectangle, line 62, column 21
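
The traceback shows the exception escaping from the `etree.fromstring(data)` call inside `worker.validate`. One possible way to handle it is sketched below; this is not the actual DMCI fix. The standalone `validate` function, its two-value return, and the tiny stand-in schema are assumptions made for illustration (the real worker returns three values and validates against the MMD XSD).

```python
from lxml import etree

# Hypothetical stand-in schema for illustration only; DMCI uses the MMD XSD.
_XSD = etree.XMLSchema(etree.fromstring(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="east" type="xs:decimal"/>
</xs:schema>
"""))


def validate(data, xsd_obj=_XSD):
    """Return (valid, msg) instead of letting a parse error propagate."""
    try:
        tree = etree.fromstring(data)
    except etree.XMLSyntaxError as err:
        # Malformed XML is a problem with the submitted file, so report it
        # as a validation failure rather than an unhandled server error.
        return False, f"XML syntax error: {err}"

    # Well-formed XML: run the schema validation as before.
    valid = xsd_obj.validate(tree)
    msg = "" if valid else str(xsd_obj.error_log)
    return valid, msg
```

With something like this in place, the API layer could presumably map a `False` result to a client-error response instead of the 500 that the unhandled exception produces.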

Offending file:

mmd-xml-dev/arch_8/arch_e/arch_3/e636c3e8-1714-4cd3-9f51-caae0125ab1c.xml. Fixed in the fix_dataset_e636c3e8-1714-4cd3-9f51-caae0125ab1c PR in mmd-xml-dev.

magnarem commented 1 year ago

<mmd:east>90.21793210403746<mmd:east> changed to <mmd:east>90.21793210403746</mmd:east>
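
The effect of the missing / can be reproduced in isolation. The sketch below uses a made-up fragment modelled on the reported error (no `mmd:` namespace prefix, and a hypothetical `rectangle` wrapper inferred from the error message), not the real MMD file.

```python
from lxml import etree

# Hypothetical fragments: the first repeats the opening tag instead of closing it.
broken = b"<rectangle><east>90.21793210403746<east></rectangle>"
fixed = b"<rectangle><east>90.21793210403746</east></rectangle>"


def parses(data):
    """Return True if lxml can parse the data, False on a syntax error."""
    try:
        etree.fromstring(data)
        return True
    except etree.XMLSyntaxError as err:
        # Print the parser's message so it can be compared with the log above.
        print(err)
        return False
```

Calling `parses(broken)` should fail with a tag-mismatch message similar to the one in the log, while `parses(fixed)` succeeds.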