wmo-im / wcmp

WMO Core Metadata Profile
https://github.com/wmo-im/wcmp

test/assess/provide feedback on pywcmp #94

Open tomkralidis opened 3 years ago

tomkralidis commented 3 years ago

Summary and Purpose: Gather feedback from GISCs when using pywcmp to quality-assess WIS metadata.

Proposal: Guide DWD colleagues to download, install, and test the currently available pywcmp.

Reason: Feedback from GISCs will help harden and improve pywcmp over time.

Initially assigned to @jsieland; we may want to expand to other GISCs as well.

tomkralidis commented 3 years ago

Update (TT-WISMD 2021-04-09): expanding for all of TT-WISMD to test and provide feedback.

jsieland commented 3 years ago

I got some feedback from my GISC-DWD colleagues: installation worked fine, and at first impression the KPI testing output seems very helpful.

Unfortunately, their test with a single XML file ran into an error for KPI-7:

FILE

/home/wis/myScripts/venv/pywcmp/bin/pywcmp kpi validate --file /home/wis/repository/bulgaria_wisof_importer/urn%3Ax-wmo%3Amd%3Aint.wmo.wis%3A%3AUSBU01LZSO.xml -v INFO

-------- Output: ---------------
[2021-04-21T09:30:05Z] INFO - Validating file /home/wis/repository/bulgaria_wisof_importer/urn%3Ax-wmo%3Amd%3Aint.wmo.wis%3A%3AUSBU01LZSO.xml
Validating file /home/wis/repository/bulgaria_wisof_importer/urn%3Ax-wmo%3Amd%3Aint.wmo.wis%3A%3AUSBU01LZSO.xml
[2021-04-21T09:30:05Z] INFO - Evaluating KPIs: ['kpi_001', 'kpi_002', 'kpi_003', 'kpi_004', 'kpi_005', 'kpi_006', 'kpi_007', 'kpi_008', 'kpi_009', 'kpi_010', 'kpi_011', 'kpi_012']
[2021-04-21T09:30:05Z] INFO - Running KPI-1: WCMP 1.3, Part 2 Compliance
[2021-04-21T09:30:05Z] INFO - Running KPI-2: Good quality title
[2021-04-21T09:30:05Z] INFO - Running KPI-3: Good quality abstract
[2021-04-21T09:30:05Z] INFO - Running KPI-4: Temporal information
[2021-04-21T09:30:05Z] INFO - Running KPI-5: WMOEssential data links
[2021-04-21T09:30:05Z] INFO - Running KPI-6: Keywords
[2021-04-21T09:30:05Z] INFO - Running KPI-7: Graphic overview for non bulletins metadata records
Traceback (most recent call last):
  File "/home/wis/myScripts/venv/pywcmp/bin/pywcmp", line 11, in <module>
    load_entry_point('pywcmp==0.3.dev0', 'console_scripts', 'pywcmp')()
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/pywcmp-0.3.dev0-py3.6.egg/pywcmp/kpi.py", line 1029, in validate
    kpis_results = kpis.evaluate(kpi)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/pywcmp-0.3.dev0-py3.6.egg/pywcmp/kpi.py", line 954, in evaluate
    result = getattr(self, kpi)()
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/pywcmp-0.3.dev0-py3.6.egg/pywcmp/kpi.py", line 532, in kpi_007
    graphic_overviews = self.exml.xpath(xpath, namespaces=self.namespaces)
  File "src/lxml/etree.pyx", line 2296, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 357, in lxml.etree.XPathDocumentEvaluator.__call__
  File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Undefined namespace prefix

and also

URL

/home/wis/myScripts/venv/pywcmp/bin/pywcmp kpi validate --url https://oai.dwd.de/oai/provider?verb=GetRecord&metadataPrefix=iso19139&identifier=urn:x-wmo:md:int.wmo.wis::USBU01LZSO

----------- Output: -------------------------
[1] 15075
[2] 15076
[2]+  Done                    metadataPrefix=iso19139
bash: wis@mst1:/home/wis/myScripts/venv/pywcmp/pywcmp/pywcmp> Traceback (most recent call last):
  File "/home/wis/myScripts/venv/pywcmp/bin/pywcmp", line 11, in <module>
    load_entry_point('pywcmp==0.3.dev0', 'console_scripts', 'pywcmp')()
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/pywcmp-0.3.dev0-py3.6.egg/pywcmp/kpi.py", line 1029, in validate
    kpis_results = kpis.evaluate(kpi)
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/pywcmp-0.3.dev0-py3.6.egg/pywcmp/kpi.py", line 954, in evaluate
    result = getattr(self, kpi)()
  File "/home/wis/myScripts/venv/pywcmp/lib64/python3.6/site-packages/pywcmp-0.3.dev0-py3.6.egg/pywcmp/kpi.py", line 195, in kpi_002
    titles = self.exml.xpath(xpath, namespaces=self.namespaces)
  File "src/lxml/etree.pyx", line 2293, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 325, in lxml.etree.XPathDocumentEvaluator.__init__
  File "src/lxml/xpath.pxi", line 259, in lxml.etree.XPathElementEvaluator.__init__
  File "src/lxml/xpath.pxi", line 131, in lxml.etree._XPathEvaluatorBase.__init__
  File "src/lxml/xpath.pxi", line 55, in lxml.etree._XPathContext.__init__
  File "src/lxml/extensions.pxi", line 81, in lxml.etree._BaseContext.__init__
TypeError: empty namespace prefix is not supported in XPath
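The TypeError above is what lxml raises when the namespaces mapping passed to xpath() contains an empty prefix (lxml's nsmap reports a default namespace under the key None). An OAI-PMH response declares exactly such a default namespace on its root element. The sketch below is illustrative only: it assumes, without confirmation from this thread, that the mapping was harvested from the document's own declarations, and it uses the standard library rather than lxml (which reports the default prefix as ''). declared_namespaces and xpath_safe_namespaces are hypothetical helpers, not pywcmp functions.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical OAI-PMH-style document: the root declares a *default*
# namespace (no prefix) alongside the prefixed gmd namespace.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <GetRecord><record><metadata><gmd:MD_Metadata/></metadata></record></GetRecord>
</OAI-PMH>"""


def declared_namespaces(xml_text):
    """Collect every prefix -> URI declaration found in the document."""
    ns = {}
    for _event, (prefix, uri) in ET.iterparse(io.StringIO(xml_text),
                                              events=("start-ns",)):
        ns.setdefault(prefix, uri)  # the default namespace arrives as ''
    return ns


def xpath_safe_namespaces(ns):
    """Drop the default (empty) prefix, which an XPath 1.0 prefix map
    cannot express; keep every named prefix."""
    return {prefix: uri for prefix, uri in ns.items() if prefix}


ns = declared_namespaces(SAMPLE)
print(ns)                         # includes '' for the OAI-PMH default namespace
print(xpath_safe_namespaces(ns))  # only the named 'gmd' prefix remains
```

Filtering the mapping only avoids the crash; the OAI-PMH elements themselves would then still need an explicit, named prefix to be addressable in XPath.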

The gmd:graphicOverview element is missing from the XML file (https://oai.dwd.de/oai/provider?verb=GetRecord&metadataPrefix=iso19139&identifier=urn:x-wmo:md:int.wmo.wis::USBU01LZSO), so there seems to be either a problem with the KPI validation or an uncaught error in the metadata.
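If a record simply lacks gmd:graphicOverview, a KPI-7-style check should arguably report a failed KPI rather than crash. A minimal sketch, using the standard library's ElementTree instead of lxml and assuming the standard ISO 19139 gmd namespace URI, resolves prefixes against a fixed map so that only the result, not the query, depends on the record. (ElementTree's counterpart to lxml's "Undefined namespace prefix" is a SyntaxError reporting that the prefix was not found in the prefix map.) has_graphic_overview is hypothetical, not pywcmp's kpi_007.

```python
import xml.etree.ElementTree as ET

# Fixed prefix map (assumption: the standard ISO 19139 gmd namespace URI).
# Resolving prefixes against this map, rather than against whatever a record
# happens to declare, keeps the query valid; only the result varies.
NAMESPACES = {"gmd": "http://www.isotc211.org/2005/gmd"}


def has_graphic_overview(root):
    """Hypothetical KPI-7-style check: True if any gmd:graphicOverview exists."""
    return root.find(".//gmd:graphicOverview", NAMESPACES) is not None


WITH_OVERVIEW = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:graphicOverview/>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

WITHOUT_OVERVIEW = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:identificationInfo><gmd:MD_DataIdentification/></gmd:identificationInfo>
</gmd:MD_Metadata>"""

print(has_graphic_overview(ET.fromstring(WITH_OVERVIEW)))     # True
print(has_graphic_overview(ET.fromstring(WITHOUT_OVERVIEW)))  # False
```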

One additional, non-technical point: my colleague noted that it is not clear whether KPI testing is recommended, intended only for testing purposes, or something else. Perhaps adding a sentence of clarification to the pywcmp repository would help.

tomkralidis commented 3 years ago

Thanks for the valuable feedback @jsieland. Comments:

Command-line arguments that contain shell special characters (such as & or ?) should be enclosed in single or double quotes:

/home/wis/myScripts/venv/pywcmp/bin/pywcmp kpi validate --file "/home/wis/repository/bulgaria_wisof_importer/urn%3Ax-wmo%3Amd%3Aint.wmo.wis%3A%3AUSBU01LZSO.xml" -v INFO

/home/wis/myScripts/venv/pywcmp/bin/pywcmp kpi validate --url "https://oai.dwd.de/oai/provider?verb=GetRecord&metadataPrefix=iso19139&identifier=urn:x-wmo:md:int.wmo.wis::USBU01LZSO"
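The unquoted & is why the earlier URL run printed job numbers ([1] 15075, [2] 15076): the shell treats & as a control operator, runs the part before it in the background, and treats metadataPrefix=... and identifier=... as separate commands, so pywcmp only received ...?verb=GetRecord. The sketch below uses Python's shlex module to approximate (not reproduce exactly) POSIX shell tokenizing; shell_tokens is a hypothetical helper and needs Python 3.8+ for punctuation_chars with whitespace_split.

```python
import shlex

URL = ("https://oai.dwd.de/oai/provider?verb=GetRecord"
       "&metadataPrefix=iso19139&identifier=urn:x-wmo:md:int.wmo.wis::USBU01LZSO")


def shell_tokens(command):
    """Roughly approximate POSIX shell tokenizing: split on whitespace and
    keep control operators such as '&' as separate tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


unquoted = "pywcmp kpi validate --url " + URL
quoted = "pywcmp kpi validate --url " + shlex.quote(URL)

print(shell_tokens(unquoted))  # URL broken up around '&' tokens
print(shell_tokens(quoted))    # URL survives as a single argument
```

When pywcmp is driven from a script, passing the arguments as a list to subprocess.run (without shell=True) sidesteps shell quoting entirely.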

Having said this, when running these commands I can indeed reproduce the errors.

The document at https://oai.dwd.de/oai/provider?verb=GetRecord&metadataPrefix=iso19139&identifier=urn:x-wmo:md:int.wmo.wis::USBU01LZSO is an OAI-PMH response that wraps a WCMP record, but the root element of a WCMP document must be gmd:MD_Metadata. I've opened https://github.com/wmo-im/pywcmp/issues/35 to better handle such cases, with fixes in https://github.com/wmo-im/pywcmp/pull/36, and will let you know when to update your installation to continue testing.
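Until a pywcmp version with those fixes is installed, one workaround is to unwrap the record before validation. A minimal sketch, standard library only, assuming the standard OAI-PMH 2.0 and ISO 19139 gmd namespace URIs; extract_wcmp_record is hypothetical and is not necessarily what the fix in pywcmp PR #36 does.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for OAI-PMH 2.0 and ISO 19139 gmd.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
GMD_NS = "http://www.isotc211.org/2005/gmd"
MD_METADATA = "{%s}MD_Metadata" % GMD_NS

# Keep the familiar "gmd" prefix when the record is serialized again.
ET.register_namespace("gmd", GMD_NS)


def extract_wcmp_record(xml_text):
    """Hypothetical helper: return the gmd:MD_Metadata element, unwrapping an
    OAI-PMH GetRecord envelope if present. Raises ValueError otherwise."""
    root = ET.fromstring(xml_text)
    if root.tag == MD_METADATA:
        return root  # already a bare WCMP document
    record = root.find(".//{%s}metadata/%s" % (OAI_NS, MD_METADATA))
    if record is None:
        raise ValueError("no gmd:MD_Metadata found under root %s" % root.tag)
    return record


OAI_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"/>
  </metadata></record></GetRecord>
</OAI-PMH>"""

record = extract_wcmp_record(OAI_RESPONSE)
# Serialize the unwrapped record so it can be validated as a standalone file.
print(ET.tostring(record, encoding="unicode"))
```

Raising on a missing record (for example an OAI-PMH error response) keeps a harvesting problem from being reported as a metadata-quality problem.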

Regarding:

My colleague noted that it is not clear if the KPI testing is recommended or just for testing purposes or else. So perhaps adding a sentence for clarification in the pywcmp repo would be helpful.

Good question. We might want to discuss targets for GISCs overall, which would help clarify this; @efucile / @amilan17 thoughts?

tomkralidis commented 3 years ago

@jsieland FYI fixes now applied to master branch for testing/validation.

jsieland commented 3 years ago

Thanks @tomkralidis

My colleague added the KPI validation to their test system. Some error messages aren't displayed properly yet, but here is a first impression of KPI validation on a larger amount of data:

https://oai-test.dwd.de/oaimonitorgui/NEW/validateMD.jsp#tabs-validateMD_kpi

josusky commented 3 years ago

Thanks Julia,

this is a terrific amount of test data. I have briefly skimmed through it, but it definitely deserves a more detailed analysis. For now, one thing caught my eye: link validation also failed for URLs that are correct (they work when I check the metadata XML file on my PC). Perhaps a firewall in your environment prevents the pywcmp tool from making HTTP requests, which, on second thought, would make some sense.
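One way to test the firewall hypothesis is to classify link failures by exception type, so that a blocked outbound connection is reported differently from a link the server actually rejects. A minimal sketch using only urllib; check_link and its "ok" / "http-error" / "unreachable" labels are hypothetical and are not pywcmp's link-checking code.

```python
import urllib.error
import urllib.request


def check_link(url, opener=urllib.request.urlopen, timeout=10):
    """Classify a link as ('ok', status), ('http-error', code) or
    ('unreachable', reason).

    'unreachable' covers DNS failures, refused connections and timeouts,
    i.e. cases where a firewall or proxy on the validating host, rather
    than the link itself, may be to blame.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = getattr(response, "status", None)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return "http-error", err.code
    except OSError as err:
        # URLError, socket timeouts and connection errors all derive from OSError.
        return "unreachable", str(err)
    return "ok", status
```

The opener parameter exists only so the classification can be exercised without network access; if most "failed" links in a run come back as unreachable, the validating host's network access is the more likely culprit.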

JanO