pandzel-zz closed 9 years ago
According to Table 6 in OGC 12-176r6, the "text search" KVP query includes the "q" parameter. Let's call this "simple text search". It is described in the spec by a single sentence:
"Comma separated list of search terms that are used to search all text fields in a catalogue record."
And that's it. It does seem to be somewhat underspecified. There are a couple of things to note here:
Note: OGC 10-032r8 (OpenSearch Geo and Time Extensions) recommends restricting search terms to three record fields: dc:title, dc:description, and dc:subject.
Hopefully the next revision of the spec will clarify these matters. Following the principle of least astonishment, the test suite currently assumes that a simple text search is interpreted somewhat like a Google search:
Should the CSW3 spec clarify the expected behavior in the next revision, the corresponding tests will be updated accordingly. Please consider submitting a CR to the SWG--several implementers have been kicking around ideas.
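For concreteness, a simple text search request of the kind discussed here might be assembled as in the sketch below. The endpoint URL and the helper name are hypothetical, and the `typeNames` value is an assumption about a typical request rather than something this thread prescribes; only the comma-separated "q" value comes from the quoted spec sentence.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_text_search_url(endpoint, terms):
    """Build a GetRecords KVP URL whose q parameter carries the search terms.

    `endpoint` is a hypothetical catalogue URL; the terms are joined with
    commas, as the spec sentence describes.
    """
    params = {
        "service": "CSW",
        "version": "3.0.0",
        "request": "GetRecords",
        "typeNames": "csw:Record",  # assumed record type
        "q": ",".join(terms),
    }
    # urlencode percent-encodes the comma and colon for us.
    return endpoint + "?" + urlencode(params)


url = build_text_search_url("https://example.org/csw", ["water", "quality"])
query = parse_qs(urlsplit(url).query)
print(query["q"][0])
```

How the server then interprets the two terms (AND vs. OR, exact vs. stemmed, which fields) is exactly what the spec leaves open.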
The q parameter should probably be reserved for specifying a "simple" text search. The AnyText pseudo-property could be used to express more sophisticated full-text queries. However, the spec is silent about this common queryable, so maybe it's intended as a vendor-specific extension point (the general model doesn't have anything to say about it either).
I think we all agree the spec is open to interpretation regarding the q parameter. We suggest not enforcing one particular interpretation as you describe above. The comparison to Google search does not solve this, as Google does a lot more than the exact match/implicit AND you mention.
Case in point: search Google for hottentotten tenten tentoonstelling and the first results show hottentottententententoonstelling (which happens to be a valid Dutch word that could be made significantly longer if desired). This search would probably fail the test.
In the meantime, I recommend we drop this specific test for the CSW 3.0 specification, since it breaks on catalogs implementing the common search engine behaviors mentioned above. Those catalogs would have to implement specific behaviors that do not benefit users, just to pass this test.
@rjmartell, since the spec is not very clear, can we make the test more general in a way that it makes sense instead of removing it?
As written, the basic text search facility in the candidate CSW3 spec offers no guidance whatsoever to implementers and thus is, strictly speaking, untestable. However, if the intent of the spec authors is to permit implementers to do whatever they want, then I would agree it doesn't belong in a conformance test suite.
In the interest of clarifying the expected behavior of "basic" text search, the test suite imposes two requirements (as noted above). But if implementers cannot agree on how such a query should be processed then it should be expunged from the spec altogether.
An alternative is to just check for a non-empty result set and ignore the actual content of the matching records (whether the answer is correct or not).
Since CSW is an interface, the test in my view should focus on whether the source implements the interface correctly. Whether the answer is correct, or even useful, is beyond the scope of that interface (and hopefully the implementers actually make sure their catalogs return results relevant to the question...).
With GetRecords, what is the structure of the response that is expected? That is something that could be tested fairly unambiguously.
I'm hard pressed to imagine that the content of the response is irrelevant when attempting to verify that an interface has been implemented correctly. However, in this case most implementers seem to agree that the spec is too ambiguous to serve as an authoritative test "oracle".
So, pending clarification in the final spec, the tests for simple keyword searches will ignore the content of the records that are purported to satisfy the query. The result set, however, is expected to be non-empty.
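A minimal sketch of such a relaxed check, in Python rather than the test suite's own code: it only looks at whether the response reports and contains at least one record, and never inspects record content. The namespace URI, the `SearchResults` element, and the `numberOfRecordsMatched` attribute are assumptions about the shape of a CSW 3.0 GetRecords response, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed CSW 3.0 namespace; adjust if the final schema differs.
CSW_NS = "http://www.opengis.net/cat/csw/3.0"


def result_set_is_non_empty(response_xml):
    """Return True if the response reports and contains at least one record.

    Record content is deliberately ignored.
    """
    root = ET.fromstring(response_xml)
    results = root.find(f"{{{CSW_NS}}}SearchResults")
    if results is None:
        return False
    matched = int(results.get("numberOfRecordsMatched", "0"))
    returned = len(list(results))  # child elements are the returned records
    return matched > 0 and returned > 0


sample = f"""
<csw:GetRecordsResponse xmlns:csw="{CSW_NS}">
  <csw:SearchResults numberOfRecordsMatched="1" numberOfRecordsReturned="1">
    <csw:Record/>
  </csw:SearchResults>
</csw:GetRecordsResponse>
"""
print(result_set_is_non_empty(sample))
```

Note that a check like this would reject a valid hits-only response, so it assumes the request asks for actual records.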
There are several tests that belong to a general family of tests which I would call "term search". It could be a single-term search or a multiple-term search, and it may apply to either a CSW GetRecords request with the "q" parameter or an OpenSearch request using the "searchTerm" parameter. Basically, all such tests are built the same way: a search term (or terms) is used to search the catalog, then the term (or terms) is expected to be present in the body of each returned record. There are several reasons why, in my opinion, this expectation is wrong: