opengeospatial / ets-csw202

Executable Test Suite for CSW 2.0.2

CITE CSW 202 not parsing extended capabilities #2

Closed bermud closed 9 years ago

bermud commented 10 years ago

It seems that the CSW 2.0.2 tests have trouble parsing the following CSW 2.0.2 capabilities document, which is valid according to Altova XMLSpy 2013 and Stylus Studio 2011:

http://inspire-geoportal.ec.europa.eu/GeoportalProxyWebServices/resources/OGCCSW202?service=CSW&version=2.0.2&request=GetCapabilities

I believe this is why the attached logs say that the response body is empty, which it is not.

Log for test s0003/d1e6546_1/d1e6408_1/d1e383_1

Test csw:csw-2.0.2-GetCapabilities-tc1.1 (s0003/d1e6546_1/d1e6408_1/d1e383_1)

Assertion: All OGC web services must implement GetCapabilities using the GET method. The response to a GetCapabilities request without the optional version parameter must include a complete representation of the capabilities document corresponding to the latest supported version.

Request d1e466_1: Method: GET URL: http://inspire-geoportal.ec.europa.eu/GeoportalProxyWebServices/resources/OGCCSW202?service=CSW&request=GetCapabilities

Response from parser p:XMLValidatingParser.CSW:

Messages from parser p:XMLValidatingParser.CSW:

Validation error: cvc-elt.4.2: Cannot resolve 'inspire_common:citationInspireInteroperabilityRegulation_eng' to a type definition for element 'inspire_common:Specification'.

Validation error: cvc-elt.4.2: Cannot resolve 'inspire_common:classificationOfSpatialDataService' to a type definition for element 'inspire_common:MandatoryKeyword'.

Validation error: cvc-elt.4.2: Cannot resolve 'inspire_common:inspireTheme_eng' to a type definition for element 'inspire_common:Keyword'.

3 validation errors detected.

Message d1e490_1: FAILURE: Missing response entity. Result: Failed

I have attached the log files downloaded via your form.

Link: http://inspire-geoportal.ec.europa.eu/GeoportalProxyWebServices/resources/OGCCSW202?service=CSW&version=2.0.2&request=GetCapabilities

Originally reported in the OGC CITE issue tracker: issue #864

Entered by: Quaglia, Angelo

Opened: 2013-09-05 07:58:48 Last Updated: 2013-11-18 12:39:07

bermud commented 10 years ago

The error message is a bit misleading here. It really means that the response entity fails to validate. The capabilities document is using @xsi:type attributes and the parser is configured to validate against the standard CSW schemas. So the referenced types (in the "http://inspire.ec.europa.eu/schemas/common/1.0" namespace) cannot be found.

If the xsi:type attributes are removed it should pass since ows:ExtendedCapabilities is of type xsd:anyType. The second xsi:schemaLocation attribute on the inspire_ds:ExtendedCapabilities element will be ignored.

Note: It may be possible to configure the validator to ignore xsi:type. See ignore-xsi-type-until-elemdecl.

Entered By: Martell, Richard - 2013-09-18 17:44:30
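
The failure mode described in the comment above can be reproduced in isolation. What follows is a minimal sketch, not the suite's actual parser configuration: the inline schema, the `urn:example:*` namespaces and the class name `XsiTypeDemo` are hypothetical stand-ins. It validates an element of type xsd:anyType against a fixed schema with the JDK's built-in JAXP validator, once without and once with an xsi:type attribute naming a type for which no schema has been loaded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class XsiTypeDemo {
    // Hypothetical stand-in for the standard schemas: one element of type xsd:anyType.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:host' elementFormDefault='qualified'>"
      + "<xs:element name='ExtendedCapabilities' type='xs:anyType'/>"
      + "</xs:schema>";

    // Builds an instance whose child element optionally carries an xsi:type attribute.
    static String instance(String xsiTypeAttr) {
        return "<h:ExtendedCapabilities xmlns:h='urn:example:host'"
             + " xmlns:ext='urn:example:ext'"
             + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>"
             + "<ext:Keyword" + xsiTypeAttr + ">Geology</ext:Keyword>"
             + "</h:ExtendedCapabilities>";
    }

    // Validates against the fixed schema only and returns the error messages.
    static List<String> validate(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(instance("")));
        System.out.println(validate(instance(" xsi:type='ext:inspireTheme_eng'")));
    }
}
```

With the JDK's bundled validator, the first call should report no errors, while the second should report a cvc-elt.4.2 error of the same kind as those in the log above, since the type named by xsi:type cannot be resolved from the supplied schema.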

bermud commented 10 years ago

Richard,

I just did not see your email, I am sorry.

The document is considered valid by XMLSpy and StylusStudio and also by the INSPIRE Geoportal Validator which has a Java implementation.

It is valid also for the online validator http://www.validome.org/xml/validate/ as long as I add to the root element:

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd"

I have taken care of keeping all inspire references confined inside the element of type xsd:anyType so that you do not have to be aware of specific xml schemas used inside the Extended Capabilities element by INSPIRE or by any other provider.

You, as the reader of the document, are instead supposed to decide which schema definition to apply for OGC namespaces.

I think you have to fix the issue on your side.

Best regards,

Angelo

Entered By: Quaglia, Angelo - 2013-09-19 09:34:31

bermud commented 10 years ago

Many thanks for looking into this.

I understand that not everybody likes xsi:type, but it has proved quite useful to me and it is a legal construct. Of course, when I decided to introduce it, I tested it with the most popular XML tools (XMLSpy, Stylus Studio, Saxon libraries, JAXB). I have never encountered any issue with those tools, so I hope it will be possible to support it in CITE as well.

By the way, the INSPIRE Geoportal Validator has to deal with similar idiosyncrasies, some of them even worse, such as the ESRI ArcGIS customizations of WMS capabilities. I had to resort to filtering techniques, but as of today I have managed to handle those cases successfully and transparently for service providers.

In addition to what I wrote Friday, I would like to point out that the use of xsi:type in the INSPIRE Geoportal schemas is limited to constraining element content, not document structure.

For example, for English content:

<Conformity>
   <Specification xsi:type="citationInspireInteroperabilityRegulation_eng">
      <Title>Commission Regulation (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services</Title>
      <DateOfPublication>2010-12-08</DateOfPublication>
      <URI>OJ:L:2010:323:0011:0102:EN:PDF</URI>
      <ResourceLocator>
         <URL>http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:323:0011:0102:EN:PDF</URL>
         <MediaType>application/pdf</MediaType>
      </ResourceLocator>
   </Specification>
   <Degree>notEvaluated</Degree>
</Conformity>

While here is the equivalent in French:

<Conformity>
   <Specification xsi:type="citationInspireInteroperabilityRegulation_fre">
      <Title>Règlement (UE) n° 1089/2010 de la Commission du 23 novembre 2010 portant modalités d&apos;application de la directive 2007/2/CE du Parlement européen et du Conseil en ce qui concerne l&apos;interopérabilité des séries et des services de données géographiques</Title>
      <DateOfPublication>2010-12-08</DateOfPublication>
      <URI>OJ:L:2010:323:0011:0102:FR:PDF</URI>
      <ResourceLocator>
         <URL>http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:323:0011:0102:FR:PDF</URL>
         <MediaType>application/pdf</MediaType>
      </ResourceLocator>
   </Specification>
   <Degree>notEvaluated</Degree>
</Conformity>

Entered By: Quaglia, Angelo - 2013-09-23 12:16:45

bermud commented 10 years ago

As Richard notes, the CSW validator used in the suite is configured to validate against the CSW 2.0.2 schema. It ignores any “hints” supplied in the capabilities document, so the @xsi:type references cannot be resolved.

However, as the ows:ExtendedCapabilities element has type xsd:anyType, its processContents attribute defaults to strict, whose semantics are "the XML processor must obtain the schema for the required namespaces and validate the elements".

Hence, I think the parser should be configured to honor the provided xsi:schemaLocation.

However, it should also be enforced that the official schemas be used for known namespaces, such as ows and csw.

I am discussing this issue with the TEAM Engine maintainer.

Entered By: Bigagli, Lorenzo - 2013-10-03 10:21:31

bermud commented 10 years ago

Dear Lorenzo and Richard,

I think that you could, as a first step, filter out the Extended Capabilities, since quite often the schemaLocation hint points to files that are behind firewalls or on local filesystems.

I am not sure that everybody will be willing or in a condition to make their schema files available.

I think it is incorrect behavior that CITE tests on standard OGC elements fail because of whatever failure in the analysis of the Extended Capabilities.

Best regards,

Angelo

Entered By: Quaglia, Angelo - 2013-10-03 10:53:01

bermud commented 10 years ago

I agree with Angelo: just remove the ows:ExtendedCapabilities element before validation occurs.

Entered By: Martell, Richard - 2013-10-03 15:15:41
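
The pruning step suggested above could look roughly like the following. This is only a sketch under stated assumptions: it uses the JDK's DOM API rather than anything from TEAM Engine, it assumes the OWS 1.0.0 namespace `http://www.opengis.net/ows` for the capabilities document, and the class name `ExtendedCapabilitiesPruner`, the sample document and the `urn:example:ext` namespace are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExtendedCapabilitiesPruner {
    // Assumption: the capabilities document uses the OWS 1.0.0 namespace.
    static final String OWS_NS = "http://www.opengis.net/ows";

    // Hypothetical, heavily abbreviated capabilities document.
    static final String SAMPLE =
        "<csw:Capabilities xmlns:csw='http://www.opengis.net/cat/csw/2.0.2'"
      + " xmlns:ows='http://www.opengis.net/ows' version='2.0.2'>"
      + "<ows:OperationsMetadata>"
      + "<ows:ExtendedCapabilities>"
      + "<ext:Keyword xmlns:ext='urn:example:ext'>Geology</ext:Keyword>"
      + "</ows:ExtendedCapabilities>"
      + "</ows:OperationsMetadata>"
      + "</csw:Capabilities>";

    // Parses the XML text into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Removes every ows:ExtendedCapabilities element in place; returns how many were found.
    static int prune(Document doc) {
        NodeList live = doc.getElementsByTagNameNS(OWS_NS, "ExtendedCapabilities");
        // The NodeList is live, so copy the matches before modifying the tree.
        List<Node> targets = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            targets.add(live.item(i));
        }
        for (Node target : targets) {
            Node parent = target.getParentNode();
            if (parent != null) {
                parent.removeChild(target);
            }
        }
        return targets.size();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("removed: " + prune(doc));
    }
}
```

The pruned document can then be passed to a validator configured with the standard schemas, while the removed content is checked, or skipped, in a separate step.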

bermud commented 10 years ago

The assertion and comment in csw:csw-2.0.2-GetCapabilities-tc1.1 are clear:

  All OGC web services must implement GetCapabilities using the GET method.
  The response to a GetCapabilities request without the optional version
  parameter must include a complete representation of the capabilities
  document corresponding to the latest supported version.

  Pass if the response is schema valid and has no missing elements.

Please note OGC 05-008c1, §7.4.2 (p.19):

A service metadata document shall be the normal response to a client from performing the GetCapabilities operation, and shall contain metadata appropriate to the specific server for the specific OWS. [...] That service metadata document shall be encoded in XML, and shall use XML Schemas to specify the correct document contents and organization.

Hence, we must schema-validate the complete Capabilities document.

As the ows:ExtendedCapabilities element has type xsd:anyType, its processContents attribute defaults to strict, whose semantics are "the XML processor must obtain the schema for the required namespaces and validate the elements".

Note that if an organization is not "willing or in a condition to make their schema files available", then the test must fail.

In conclusion, the parser should be configured to honor the xsi:schemaLocation of external schemas (and fail when it is not provided).

On the other hand, it should enforce the official schemas for known namespaces, such as ows and csw.

Entered By: Bigagli, Lorenzo - 2013-10-04 04:49:31

bermud commented 10 years ago

Lorenzo, I think I did not make myself clear enough.

My technical recommendation is to apply current CITE tests on the document where the extended capabilities element has been removed.

You can then analyze the extended capabilities element in a separate step without unduly polluting all the other test results.

Entered By: Quaglia, Angelo - 2013-10-04 05:27:43

bermud commented 10 years ago

Hi,

any news?

Entered By: Quaglia, Angelo - 2013-11-04 08:07:51

bermud commented 10 years ago

We are discussing the fix on the CITE-dev mailing list. You are welcome to subscribe there for direct interaction.

We will report here when we reach a conclusion.

Entered By: Bigagli, Lorenzo - 2013-11-04 08:17:26

bermud commented 10 years ago

Thanks.

According to the mailing list archives, the latest update was Tue Oct 15 12:20:45 EDT 2013

In what ways should I interact?

I have already stated my arguments here.

Is there a planned date for the resolution?

Angelo

Entered By: Quaglia, Angelo - 2013-11-05 09:07:53

bermud commented 10 years ago

Rich and I will convene this Friday late afternoon on Skype to address the issue and possibly fix it.

You are welcome to join us.

Entered By: Bigagli, Lorenzo - 2013-11-05 12:27:21

bermud commented 10 years ago

Getting Xerces (2.11.0) to validate using schema location “hints” in the instance document is straightforward: just create a schema object without specifying a schema source.

Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema();
Validator validator = schema.newValidator();
// Note: "honour-all-schemaLocations" feature is still false but the validator behaves as expected
validator.setFeature(IGNORE_XSI_TYPE, true);
// Doesn't ignore xsi:type="rim:FreeFormText" and reports an invalid type derivation

Note that setting the feature “ignore-xsi-type-until-elemdecl” didn’t cause that attribute to be ignored, presumably because a global element declaration had been found. So I’m not sure how this feature is supposed to work.

Now, if instead I construct a validator using a specific schema, and then set the feature “honour-all-schemaLocations” on the validator to be true, it appears to ignore the hint and only uses the given schema; the result is a complaint about not finding a declaration for the document element. So it appears that with the standard JAXP API it is not possible to both supply a schema and use location hints. Do one or the other. Perhaps this might be possible using an internal Xerces API, the so-called Xerces native interface.

Entered By: Martell, Richard - 2013-11-18 12:39:07
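
The hint-driven mode described in the comment above can be tried in isolation with the JDK's built-in JAXP implementation. The following is a minimal sketch, not the TEAM Engine code: the class name `SchemaHintDemo`, the `urn:example:ext` namespace and the one-element schema are hypothetical, and the schema is written to a temporary file only so that the xsi:schemaLocation hint has something to point at. It assumes the default JAXP settings, under which external schema access is not restricted.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class SchemaHintDemo {
    // Hypothetical extension schema: a single date-valued element.
    static final String EXT_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:ext' elementFormDefault='qualified'>"
      + "<xs:element name='DateOfPublication' type='xs:date'/>"
      + "</xs:schema>";

    // Writes the schema to a temporary file and returns its file: URI.
    static String writeTempSchema() throws Exception {
        Path xsd = Files.createTempFile("ext", ".xsd");
        xsd.toFile().deleteOnExit();
        Files.write(xsd, EXT_XSD.getBytes(StandardCharsets.UTF_8));
        return xsd.toUri().toString();
    }

    // Builds an instance document that carries its own schema location hint.
    static String instance(String schemaUri, String value) {
        return "<e:DateOfPublication xmlns:e='urn:example:ext'"
             + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
             + " xsi:schemaLocation='urn:example:ext " + schemaUri + "'>"
             + value + "</e:DateOfPublication>";
    }

    // No schema source is supplied, so the validator relies on the hints in the instance.
    static List<String> validateWithHints(String xml) throws Exception {
        Validator validator = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema()
            .newValidator();
        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String uri = writeTempSchema();
        System.out.println(validateWithHints(instance(uri, "2010-12-08")));
        System.out.println(validateWithHints(instance(uri, "not-a-date")));
    }
}
```

If the hint is honoured, the first document should validate cleanly and the second should be rejected because its content is not an xs:date. Passing a schema source to newSchema instead switches to the other mode, in which, as noted above, the hints appear to be ignored.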

rjmartell commented 9 years ago

In the script CSW-GetCapabilities-GET.xml, almost every test is using a validating parser pre-configured with a particular set of standard schemas; these parsers are defined in common.xml.

Note that the underlying Java class is com.occamlab.te.parsers.XMLValidatingParser. Perhaps it might be possible to allow an empty schema_links parameter so as to indicate that the validator should attempt to use schema location hints provided in the instance document.

lorebiga commented 9 years ago

Given the above, I would propose this solution for this issue:

1. Prune the content of extendedCapabilities (may be empty) and validate the document with the official schemas.

2. If the content of extendedCapabilities is not empty, validate the whole document using the user-specified location hints.

Fixing issue #64 would support the above strategy.

bermud commented 9 years ago

@lorebiga, I like your strategy.

rjmartell commented 9 years ago

In teamengine-core-4.0.6, XMLValidatingParser will no longer raise an error if no schema references are supplied; in this case it will attempt to use the xsi:schemaLocation "hints" given in the instance document.

lorebiga commented 9 years ago

Release note: the test requires that the capabilities document contains all the schema locations necessary for validation.