opengeospatial / ets-csw202

Executable Test Suite for CSW 2.0.2
Other
2 stars 1 forks source link

nextRecord value when maxRecords="0" #22

Closed tomkralidis closed 6 years ago

tomkralidis commented 6 years ago

As suggested in the OGC CITE thread, opening this issue to address maxRecords=0 behaviour. Associated pycsw ticket at https://github.com/geopython/pycsw/pull/557#issuecomment-351815650 (CITE trace at https://github.com/geopython/pycsw/pull/557#issuecomment-351815650)

dstenger commented 6 years ago

Further information from mailing list:

The issue arises when maxRecords="0", which should force the same
behaviour as if resultType="hits" [v07-006r1 "OpenGIS Catalogue
Services Specification specification v2.0.2" 10.8.4.7 p149].  Looking
at what that behaviour should be [10.8.4.3 p147]:

"[...] the catalogue service shall return a < GetRecordsResponse >
element containing an empty < SearchResults > element that indicates
the estimated size of the result set. Optional attributes may or may
not be set accordingly."

When I was making my fix for the other issue, the value of nextRecord
was discussed on the PyCSW IRC channel (freenode #pycsw).  The answer
to the question, what should the nextRecord attribute be when no
records have been returned, but all records have been reported, was
nextRecord=0.  This also is the value nextRecord would take if the
last of the records had been returned. 

However, it is true that the specification does not dictate what this
value should be.  The question to the CITE forum is: is nextRecord=1
expected for some reason that I haven't found yet, or is this test
being over unnecessarily strict on this?
Dirk: the failing test is indeed 23.1.  I've posted the full trace at [1]
and opened a associated GitHub issue at [2]

Thanks

..Tom

[1] https://github.com/geopython/pycsw/pull/557#issuecomment-351815650
[2] https://github.com/opengeospatial/ets-csw202/issues/22
dstenger commented 6 years ago

I also do not see the reason why "csw:SearchResults/@nextRecord = 1" is expected.

dstenger commented 6 years ago

@lorebiga Can you please take a look at the reported behaviour?

rjaduthie commented 6 years ago

Any further on this? I'm not sure of the lifecycle of an issue in this ecosystem, but it's been 3 months since the last activity.

dstenger commented 6 years ago

@lorebiga What do you think about this?

lorebiga commented 6 years ago

I've finally been able to investigate this issue. The test query defines a result set of 11 records, of which none is returned (since maxRecords=0). Hence, the response should specify:

As clarified at p. 159, "the nextRecord attribute will indicate to the client where to begin the next query" (to iterate the result set), so:

I hope this is clear. I'm closing this issue.

By the way, I've noticed that the documentation of test tc23.1 includes the condition that numberOfRecordsMatched > 10, but the test implementation actually tests for numberOfRecordsMatched > 1. Both could be amended to align with the other. I've decided to amend the test implementation.

I've also noticed a typo in 07-006r1, §10.8.4.7, p. 149: the mention to "subclause 10.8.4.2" should instead refer to "subclause 10.8.4.3". I've reported it to the OGC staff.

rjaduthie commented 6 years ago

If you think that this is the definite answer to the question, I can patch my code to comply. To me, it still doesn't make complete sense that this operation, which returns information on all the records, would then assume that the next record to harvest is the first one in the catalogue; it would probably be safer to report next_record=0, not to then potentially mislead a client into potentially re-harvesting the whole catalogue from the start.

lorebiga commented 6 years ago

I'm not sure I understand. This operation doesn't return information on all the records, the response refers to the result set identified by the query, which may be a subset of the entire catalogue. The meaning of nextRecord=0 is clearly defined by the specification (p. 159): "a value of 0 means all records have been returned". In this case, no record has been returned, so 0 would be wrong.

rjaduthie commented 6 years ago

That's a fair point. So to be precise, this is that information on all records in the query have been returned.

If I've understood correctly, if resultType="hits" (or equivalently "maxRecords"=0) then "nextRecord" is superfluous as the response of the query doesn't 'page'. It's function when resultType="results", is to allow paging of the query. From Table66 of the spec (p159), in the description of "numberOfRecordsReturned": "Subsequent queries may be executed to see more of the result set. The nextRecord attribute will indicate to the client where to begin the next query." Is there a situation where there would be a further repeated resultType="hits" query with nextRecord=X to complete the client's task? I don't think it makes sense, but I'm willing to hear otherwise. Either way, if the response has given information on the last of the records in the query, it should not tell the client which record is next to query to complete its job.

Does this make sense? I think it boils down to: there are no more records to query, so don't set nextRecord to a natural number.

Rg.

On 13 March 2018 at 13:09, Lorenzo Bigagli notifications@github.com wrote:

I'm not sure I understand. This operation doesn't return information on all the records, the response refers to the result set identified by the query, which may be a subset of the entire catalogue. The meaning of nextRecord=0 is clearly defined by the specification (p. 159): "a value of 0 means all records have been returned". In this case, no record has been returned, so 0 would be wrong.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opengeospatial/ets-csw202/issues/22#issuecomment-372659369, or mute the thread https://github.com/notifications/unsubscribe-auth/AAU-IIQb0XV1Jtf8xXk0cQmT-HufUkJVks5td8UTgaJpZM4RCk5g .

lorebiga commented 6 years ago

The nextRecord parameter is not applicable to requests, but to responses. A client may of course repeat a query with a resultType=hits parameter. Every time, it will get no records in the response, and nextRecord=1. In general, it works like this:

This protocol was designed by the authors of the CSW 2.0.2 specification, based on the requirements collected. Afaik, it was retained as is also in CSW 3.0.

rjaduthie commented 6 years ago

Not to lose information, I'll respond in-line.

The nextRecord parameter is not applicable to requests, but to responses.*

This is not being debated. I think I know why you said it, so to be clear I should've said: "Is there a situation where there would be a further repeated resultType="hits" query with startPostion=nextRecord to complete the client's task?".

A client may of course repeat a query with a resultType=hits parameter. *

My question was: Is there a situation where there would be a further repeated resultType="hits" query [...] to complete the client's task? Does startPosition have an effect on resultType="hits"? What about "maxRecords"? If so, but I don't think it does in the PyCSW implementation, in the case where the catalogue size is 100, startPosition=10 and maxRecords=50, then would nextRecord=61 be returned with the response? What about startPosition=90 and maxRecords=50?

Every time, it will get no records in the response, and nextRecord=1. *

Yes, it should get an empty SearchResults, with attributes set to reflect how many 'hits' the query matches. What nextRecord should be is what I'm contending, of course; I believe nextRecord=1 is incorrect in this context.

In general, it works like this: the client sends a request with a query the server computes the query result (a list of 1... n records)*

Could be zero records, if the catalogue is empty for one case.

the server sends a response with some of the records (possibly 0), specifying the index of the next in the list (i.e. the first record which is not returned) in the nextRecord paramete*r

To get 0 records from a resultType="results", you either set "maxRecords" = 0, which is specifically stated to operate as a resultType="hits", or there are no records.

However, I don't see "nextRecord" = "the first record which is not returned" in the spec. I do see the description for numberOfRecordsReturned stating: "Number of records actually returned to client. This may not be the entire result set [...]. Subsequent queries may be executed to see more of the result set. The nextRecord attribute will indicate to the client where to begin the next query."

The suggestion is that, in a response, if nextRecord > 0, there are more records in the result set, from the request made, which are still to be returned and that another request is required. In the case of a resultType='hits' this is not the case; the client should be given a clue that it's finished it's job and it can relax.

This protocol was designed by the authors of the CSW 2.0.2 specification, based on the requirements collected. Afaik, it was retained as is also in CSW 3.0.*

Are you an author? If not, would it be worth contacting the authors? I would argue that this should be discussed for any future iteration of the specification, to make it crystal clear that this is the expected return value of nextRecord for resultType='hits' (or equivalent), because if nextRecord=1 for a non-paging response seems incorrect, but the current spec text doesn't convince me that it's by design. To me, it's clear that the nextRecord attribute is for use where a response does not completely fulfil the request given and used to page the records in the response and prompt further requests for the remainder of a result.

Cheers, Rg.

lorebiga commented 6 years ago

I contributed to CSW 3.0, but I don't remember a specific discussion around this very point. In general, I think that comments on approved specification should be formulated as Change Requests, which are directed to the appropriate Working Group, or may trigger the formation of a new Revision Working Group.

In fact, arguing about the rationale of CSW specification is out of scope, in this context. We are only concerned abut the correctness of existing compliance tests. This said, I will try to answer your questions :-)

- Is there a situation where there would be a further repeated resultType="hits" query [...] to complete the client's task? Perhaps a polling use-case? Anyway, we should not make assumptions about client's tasks, in this context.

- Does startPosition have an effect on resultType="hits"? What about "maxRecords"? If so, but I don't think it does in the PyCSW implementation, in the case where the catalogue size is 100, startPosition=10 and maxRecords=50, then would nextRecord=61 be returned with the response? What about startPosition=90 and maxRecords=50? The specification seems vague on this, and I don't think there's any specific test. However, note that maxRecords is the maximum number of records that should be returned. A server may return less. The nextRecord parameter may be used by the client to continue fetching the result set from the first record it hasn't got in that response.

On the syntax and semantics of nextRecord: it is mandatory, it is a non-negative integer, and the specification is very clear on the meaning of the expression nextRecord=0: "a value of 0 means all records have been returned".

So, in the specific case of the tc23.1 test, how should a server behave? It must not return any of the 11 records in the result set, since the client required maxRecords=0. But it must return a nextRecord with some value >= 0. That cannot be 0, because it would imply that all the 11 records be in the response, which instead must be empty. Could that be 2... 11? The specification does not state it clearly, but it is logical to imply that some records would be in the response, in that case, whereas it must be empty. By setting nextRecord=1, the server indicates to the client that it can continue fetching the result set from the first of the 11 records in the result set.

rjaduthie commented 6 years ago

I think that you're putting too much faith in the wording of the description of nextRecord in Table 66.

So, the wording is: "A value of 0 means all records have been returned". But take, for example, a request with resultType='results' and startPosition > 1, where the catalogue has fewer records than maxRecords. The response will have nextRecord = 0 but not "all records have been returned". The words do not reflect what is happening. I would suggest the intended meaning of the description of nextRecord is (parenthesised text my own, obviously): "A value of 0 means all records [in the result set relating to the request] have been returned." Now take the example of a request with resultType='hits', the result set is defined to be empty - i.e. zero records; so, when all records (i.e. zero records) have been returned, nextRecord should be set accordingly (i.e. nextRecord=0).

Thinking about the impact of changing this test, would it be a huge problem of backwards compatibility? I don't mind doing some of the work on this, if it helps. You advice is welcomed.

All the best, Roger

On Tue, 13 Mar 2018, 19:09 Lorenzo Bigagli, notifications@github.com wrote:

I contributed to CSW 3.0, but I don't remember a specific discussion around this very point. In general, I think that comments on approved specification should be formulated as Change Requests, which are directed to the appropriate Working Group, or may trigger the formation of a new Revision Working Group.

In fact, arguing about the rationale of CSW specification is out of scope, in this context. We are only concerned abut the correctness of existing compliance tests. This said, I will try to answer your questions :-)

- Is there a situation where there would be a further repeated resultType="hits" query [...] to complete the client's task? Perhaps a polling use-case? Anyway, we should not make assumptions about client's tasks, in this context.

- Does startPosition have an effect on resultType="hits"? What about "maxRecords"? If so, but I don't think it does in the PyCSW implementation, in the case where the catalogue size is 100, startPosition=10 and maxRecords=50, then would nextRecord=61 be returned with the response? What about startPosition=90 and maxRecords=50? The specification seems vague on this, and I don't think there's any specific test. However, note that maxRecords is the maximum number of records that should be returned. A server may return less. The nextRecord parameter may be used by the client to continue fetching the result set from the first record it hasn't got in that response.

On the syntax and semantics of nextRecord: it is mandatory, it is a non-negative integer, and the specification is very clear on the meaning of the expression nextRecord=0: "a value of 0 means all records have been returned".

So, in the specific case of the tc23.1 test, how should a server behave? It must not return any of the 11 records in the result set, since the client required maxRecords=0. But it must return a nextRecord with some value >= 0. That cannot be 0, because it would imply that all the 11 records be in the response, which instead must be empty. Could that be 2... 11? The specification does not state it clearly, but it is logical to imply that some records would be in the response, in that case, whereas it must be empty. By setting nextRecord=1, the server indicates to the client that it can continue fetching the result set from the first record in the result set.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opengeospatial/ets-csw202/issues/22#issuecomment-372783954, or mute the thread https://github.com/notifications/unsubscribe-auth/AAU-IKTpRQNHhPM7PA6fQ7P8ruLZ531Qks5teBltgaJpZM4RCk5g .

lorebiga commented 6 years ago

It's not a matter of faith: in this context, we have to stick to the wording of the specification and to the existing scripts. Tc23.1 needs no change, as far as nextRecord is concerned.

Your examples are out of scope, though meaningful and interesting. I agree with you on the meaning of nextRecord. In my opinion, the current wording is clear enough: all the parameters in a response refer to the result set related to the request. However, the change you propose is acceptable. You may submit a formal Change Request.

rjaduthie commented 6 years ago

we have to stick to the wording of the specification Which is why I've discussed the wording of the specification, which does not clearly support what you believe it to support.

Your examples are out of scope What do you mean by this? They discuss requests and responses using GetRecords within the CSW 2.0.2 specification. That's exactly in scope, surely!?

I agree with you on the meaning of nextRecord. Nice to know ;¬)

In my opinion, the current wording is clear enough: all the parameters in a response refer to the result set related to the request. Which agrees with what I said about the wording not being clear, but if you take the meaning of the wording as it is, then nextRecord=0 is the correct response for a resultType='hits' request (as there are no more records to return).

If it makes it easier to swallow, I'll put in a formal Change Request to clarify the wording in the specs to prevent cross-interpretation of the way it's been written.

Do you have any suggestions to an improved wording to suggest? I'll also give it some thought.

Rg.

On 14 March 2018 at 15:12, Lorenzo Bigagli notifications@github.com wrote:

It's not a matter of faith: in this context, we have to stick to the wording of the specification and to the existing scripts. Tc23.1 needs no change, as far as nextRecord is concerned.

Your examples are out of scope, though meaningful and interesting. I agree with you on the meaning of nextRecord. In my opinion, the current wording is clear enough: all the parameters in a response refer to the result set related to the request. However, the change you propose is acceptable. You may submit a formal Change Request.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opengeospatial/ets-csw202/issues/22#issuecomment-373057103, or mute the thread https://github.com/notifications/unsubscribe-auth/AAU-IILdGzWcJsTjv-IwLvolouBWpYPbks5teTN3gaJpZM4RCk5g .

rjaduthie commented 6 years ago

Incidentally, would there have been a faster way to get a change to the test, other than a Change Request against the spec?

On 14 March 2018 at 15:30, Roger Duthie rjaduthie@gmail.com wrote:

we have to stick to the wording of the specification Which is why I've discussed the wording of the specification, which does not clearly support what you believe it to support.

Your examples are out of scope What do you mean by this? They discuss requests and responses using GetRecords within the CSW 2.0.2 specification. That's exactly in scope, surely!?

I agree with you on the meaning of nextRecord. Nice to know ;¬)

In my opinion, the current wording is clear enough: all the parameters in a response refer to the result set related to the request. Which agrees with what I said about the wording not being clear, but if you take the meaning of the wording as it is, then nextRecord=0 is the correct response for a resultType='hits' request (as there are no more records to return).

If it makes it easier to swallow, I'll put in a formal Change Request to clarify the wording in the specs to prevent cross-interpretation of the way it's been written.

Do you have any suggestions to an improved wording to suggest? I'll also give it some thought.

Rg.

On 14 March 2018 at 15:12, Lorenzo Bigagli notifications@github.com wrote:

It's not a matter of faith: in this context, we have to stick to the wording of the specification and to the existing scripts. Tc23.1 needs no change, as far as nextRecord is concerned.

Your examples are out of scope, though meaningful and interesting. I agree with you on the meaning of nextRecord. In my opinion, the current wording is clear enough: all the parameters in a response refer to the result set related to the request. However, the change you propose is acceptable. You may submit a formal Change Request.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opengeospatial/ets-csw202/issues/22#issuecomment-373057103, or mute the thread https://github.com/notifications/unsubscribe-auth/AAU-IILdGzWcJsTjv-IwLvolouBWpYPbks5teTN3gaJpZM4RCk5g .

lorebiga commented 6 years ago

Not that I know of. Change Requests are the OGC mechanism to address cases like this. You can file it here: http://ogc.standardstracker.org/

Sorry I can't help you more on this, I'm already overcommitted with maintaining the tests... :-)

rjaduthie commented 6 years ago

I'm already overcommitted with maintaining the tests

Are you the only person working on this? Do you maintain the whole test suite or just the CSW tests?

Not that I know of.

Well, apart from fixing the test with the current wording! ;¬)

Given that the Change Request is for the benefit of making sure you're happy with what the spec says, can I run this by you. If I were to suggest the simple change to the description on table 66 for nextRecord to read: "nextRecord | Start position of next record, if the response does not contain the complete result set of the request's query; otherwise, nextRecord should be set to 0."

This covers the case where the result set naturally has no records, plus (I hope you agree) the cases where {resultType='results', maxRecords=0} and {resultType='hits'}. If you think it is still not clear enough, please suggest what would be. I have a feeling that this issue will not be as protracted if well thought out solutions are presented up front.

lorebiga commented 6 years ago

I'm the only maintainer of the CSW 2.0.2 ETS (and I maintain only this ETS).

As far as I'm concerned (i.e. these tests), I don't think the spec needs any change: I'm already happy with what it says :-) Go ahead with the Change Request as you prefer, sorry I can't help you more on this (I don't think I will be reviewing the Change Request anyway: I think it's the OAB?)

rjaduthie commented 6 years ago

Hi Lorenzo,

I'm not sure I understand your position on this. You say you agree with the argument that nextRecord is not correct when resultType='hits' (or equivalent), but won't fix the problem because of your interpretation of the currently worded spec; however, also do not think that the spec should be changed.

It also seems that you are the gatekeeper on correcting the test to reflect the correct behaviour. So, at least for me, this is a little more than frustrating.

Can you please clarify your position? Otherwise, there could be a lot of wasted effort in trying to follow your suggestion without any possible gain.

I would like to point out that I have only come across one person who thinks this test (i.e. the expected behaviour of the response) is correct and that person is yourself. Is there anyone else who maintains this test suite, from who we can get an outside opinion?

If you are trying to save some work for yourself, I am willing to put some time in myself to help. Just let me know what needs to be done. However, changing the wording in the spec won't save work, in the long run, with a lot more people-hours being put into it and the likely result that this test will have to be fixed anyway.

I look forward to hearing from you further.

Roger


From: Lorenzo Bigagli notifications@github.com Sent: 15 March 2018 13:20:52 To: opengeospatial/ets-csw202 Cc: Duthie, Roger J.A.; Manual Subject: Re: [opengeospatial/ets-csw202] nextRecord value when maxRecords="0" (#22)

I'm the only maintainer of the CSW 2.0.2 ETS (and I maintain only this ETS).

As far as I'm concerned (i.e. these tests), I don't think the spec needs any change: I'm already happy with what it says :-) Go ahead with the Change Request as you prefer, sorry I can't help you more on this.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/opengeospatial/ets-csw202/issues/22#issuecomment-373373157, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAU-IBQeCPFjYwkLaq0TQ35ImO9AQ1vjks5temq0gaJpZM4RCk5g.


This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.


lorebiga commented 6 years ago

Roger,

I think we are repeating ourselves here.

I am the maintainer of the CSW 2.0.2 ETS. You can report and refer to me on issues related to the CSW 2.0.2 ETS implementation. I am not the right recipient for arguing the rationale and the design of the CSW specification in general, nor to improve the CSW specification.

Please, don't refer to me on Change Requests to the CSW specification.

I think test tc23.1 is correctly implemented. I quote from above: the test query defines a result set of 11 records, of which none is returned (the request specifies maxRecords=0). Hence, the response should specify:

numberOfRecordsMatched="11" numberOfRecordsReturned="0"

As clarified at p. 159, "the nextRecord attribute will indicate to the client where to begin the next query" (to iterate the result set), hence:

nextRecord="1"

The CSW specification has been perused many times, this ETS has been executed hundreds, maybe thousand times, and no one has raised this issue before. I respect your opinion, but I disagree with it: I think that test tc23.1 is correctly implemented wrt nextRecord value, and there is nothing to fix.

I hope you can accept my decision and agree on closing this discussion, for the time being. @dstenger is the manager of the OGC validation tools, you may contact him for his valuable opinion on the matter.

rjaduthie commented 6 years ago

Thank you for you patience while I try to make my point clear. If I submit a Change Request, I will definitely refer to this discussion in my description, as this is the whole reason for the Change Request. I've already written a draft for the request, in fact. It doesn't have your name in it, but refers to 'test maintainers', etc.

That this issue hasn't been raised does not mean the behaviour is correct. The case in hand would have little exposure, I would expect, because in the use cases for GetRecords requests with resultType='hits' or equivalent, you would normally ignore the nextRecord attribute (... because you have all the records you expect to have!!! ;¬) ). So, the only people who are going to be concerned about these things are the implementation devs, who are not numerous. I expect a lot of little issues are ignored because of barriers to resolution, reading the spec, understanding it, etc. It's probably easier to change your code to match the test suite. You should definitely not do things the way they have been done before, if they have been done incorrectly.

Yes, at danger of repeating ourselves:

As clarified at p. 159, "the nextRecord attribute will indicate to the client where to begin the next query" (to iterate the result set), hence: nextRecord="1"

To me, the above statement is contradictory to your point: the response doesn't need to iterate the result set, it has all the records it asked for (i.e 0).

It's maybe too late to convince you, but it's worth noting for the discussion that the [currently worded] spec also states in the "maxRecords attribute" section: If [the maxRecords attribute] value is set to zero, then the behaviour is identical to setting "resultType=hits" as described in subclause 10.8.4.2. i.e. the request is made expecting zero records returned and is only concerned with the "numberOfRecordsMatched" attribute to get the number of records that the query matches.

You don't seem be compromising on this, so the next step would be to create the Change Request. The purpose of the Change Request would be to simply to clarify that in the case where no records were requested (or even that no records are present in the query result) and no records are returned, the nextRecord value doesn't tell the client to start the next query at record number 1. Instead, it should be crystal clear that the nextRecord attribute should be set to 0, to signify that the records included in the response (i.e. 0 records) fulfils the entirety of the request.

I am still certain that this is what the current wording of the spec actually says. Maybe @dstenger has an outside opinion to offer?

rjaduthie commented 6 years ago

I have now filled out a change request for clarification on this issue in the CSW spec: http://ogc.standardstracker.org/show_request.cgi?id=547