Closed iliion closed 10 months ago
I think you want max_items=5. limit comes from the STAC API spec and controls the number of items per page.
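To make the difference concrete, here is a small pure-Python sketch of the two knobs. It does not call pystac_client; fetch_pages and iter_items are hypothetical stand-ins for a paged search, and the only thing taken from the thread is the meaning of limit (page size) and max_items (total cap). With the real client the equivalent change would presumably be catalog.search(collections='cop-dem-glo-30', max_items=5).

```python
from itertools import islice
from typing import Iterator, List, Optional


def fetch_pages(total_items: int, limit: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a paged search: yields pages of at most `limit` items."""
    for start in range(0, total_items, limit):
        yield list(range(start, min(start + limit, total_items)))


def iter_items(total_items: int, limit: int, max_items: Optional[int] = None) -> Iterator[int]:
    """Walk every page in order; stop early only when `max_items` is set."""
    items = (item for page in fetch_pages(total_items, limit) for item in page)
    # islice(..., None) imposes no cap, so the default walks all pages.
    return islice(items, max_items)


# limit alone changes the page size, not the total: all 12 items come back.
all_items = list(iter_items(total_items=12, limit=5))

# max_items caps the total, however many pages that takes.
capped = list(iter_items(total_items=12, limit=5, max_items=5))

print(len(all_items), len(capped))
```

So iterating items() with only limit set is expected to issue one request per page until the server runs out of pages.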
On Tue, Nov 21, 2023 at 9:06 AM iliion @.***> wrote:
pystac_client version: 0.7.5
I am performing the following simple request to get some items from a catalog and this ends up in an infinite loop (?).
from pystac_client import Client
import datetime

def main():
    catalog = Client.open(url='https://earth-search.aws.element84.com/v1/')
    my_search = catalog.search(collections='cop-dem-glo-30', limit=5)
    print(my_search.url_with_parameters())
    # prints out -> https://earth-search.aws.element84.com/v1/search?limit=5&collections=cop-dem-glo-30
    for item in my_search.items():
        print(item)

if __name__ == '__main__':
    main()
In the above example I would just expect the api to return 5 items per page. What I get instead are multiple requests to https://earth-search.aws.element84.com/v1/search?limit=5&collections=cop-dem-glo-30 . In addition, if there are fewer results than the limit imposed, the api keeps returning the same items repeatedly (and not necessarily in the same order).
Tom is correct: if you only want to return five items, use max_items. A couple of other things:
> In the above example I would just expect the api to return 5 items per page.
It should, but to check this you need to:
for page in my_search.pages_as_dicts():
    print(len(page['features']))
In this line:
print(my_search.url_with_parameters())
During paging, the search object is not updated with the paging parameters, so url_with_parameters will not change while paging. See https://github.com/stac-utils/pystac-client/blob/4ea6dac3a4cc817854e8fbcb1a9f041f079655b1/pystac_client/stac_api_io.py#L282-L312 for the relevant code.
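For readers who do not want to open the linked code, the idea can be sketched roughly as below. This is a simplified illustration, not pystac-client's actual implementation: iter_pages, next_link and fake_server are hypothetical names, and the fake server stands in for a real STAC API that pages with a token in the POST body. The point it shows is that the paging state lives in each page's next link, so the original search parameters are never modified.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def next_link(page: Page) -> Optional[Dict[str, Any]]:
    """Return the first link with rel == "next", or None when paging is done."""
    return next((l for l in page.get("links", []) if l.get("rel") == "next"), None)


def iter_pages(
    send: Callable[[str, str, Dict[str, Any]], Page],
    href: str,
    params: Dict[str, Any],
) -> Iterator[Page]:
    """Yield pages, following whatever next link the server returns."""
    page = send("POST", href, params)
    while True:
        yield page
        link = next_link(page)
        if link is None:
            return
        # The request for the following page is built from the link alone.
        page = send(link.get("method", "GET"), link["href"], link.get("body", {}))


def fake_server(method: str, href: str, body: Dict[str, Any]) -> Page:
    """Hypothetical server: three one-item pages, addressed by a token."""
    token = body.get("token", 0)
    page: Page = {"features": [f"item-{token}"], "links": []}
    if token < 2:
        page["links"].append(
            {"rel": "next", "method": "POST", "href": href, "body": {**body, "token": token + 1}}
        )
    return page


search_params = {"limit": 1, "collections": ["test-collection"]}
pages = list(iter_pages(fake_server, "http://localhost:20008/search", search_params))
print(len(pages), search_params)
```

Because search_params is untouched after the loop, anything derived from it (such as a URL built from the original parameters) looks the same on every page.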
OK, I understand that the search request will return all pages, that limit will be the size of each page, and that I get the number of items in each page from print(len(page['features'])).
My problem is that the requests go on infinitely when I run the above example against my catalog. I understand that this is a bug on my part, but I can't work out the reason. Maybe you have a clue why the requests from the client won't stop. Am I missing something in the API specification?
FYI: The api response follows the specs here (https://api.stacspec.org/v1.0.0/item-search/#tag/Item-Search)
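One thing worth checking on the server side: as far as I understand the item-search paging model, a client keeps following the rel=next link until a page arrives without one, so a server that always emits a next link will never let the client stop. Below is a minimal sketch of a link builder that drops the next link on the last page. The function name build_links, the 0-based page parameter and the matched count are assumptions made for illustration; they are not taken from any particular server framework.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_links(
    base_url: str,
    query: Dict[str, Any],
    page: int,
    limit: int,
    matched: int,
) -> List[Dict[str, str]]:
    """Return paging links for a 0-based `page`; no next link on the last page."""
    links: List[Dict[str, str]] = []
    # Items already served once this page is delivered: (page + 1) * limit.
    if (page + 1) * limit < matched:
        next_query = {**query, "limit": limit, "page": page + 1}
        links.append(
            {
                "rel": "next",
                "type": "application/geo+json",
                "method": "GET",
                "href": f"{base_url}?{urlencode(next_query)}",
            }
        )
    return links


first = build_links("http://localhost:20008/search", {"collections": "test-collection"}, 0, 2, 3)
last = build_links("http://localhost:20008/search", {"collections": "test-collection"}, 1, 2, 3)
print(first, last)
```

With three matching items and limit=2, the first page carries a next link pointing at page=1 and the second page carries none, which is what ends the client loop.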
I think I know what is wrong. pystac_client does not support paging implemented with a page=x parameter.
For the following request http://localhost:20008/search?limit=2&collections=test-collection
the rel=next link will have this href -> http://localhost:20008/search?limit=2&collections=test-collection&page=1
Unfortunately the above url is parsed and the output is the following
{
  "rel": "next",
  "type": "application/json",
  "method": "POST",
  "href": "http://localhost:20008/search",
  "body": {
    "limit": 2,
    "collections": [
      "test-collection"
    ],
    "token": 1
  }
}
> Unfortunately the above url is parsed and the output is the following
I don't quite know what you mean by this. The read_text method doesn't make any assumptions about pagination -- it simply uses what the server returns: https://github.com/stac-utils/pystac-client/blob/4ea6dac3a4cc817854e8fbcb1a9f041f079655b1/pystac_client/stac_api_io.py#L128-L172
To continue debugging, can you provide the following:
My guess was read_json()
I will try to be more clear.
http://localhost:20008/search?limit=2&collections=test-collection will output a response where the next link is like this:
{
  "rel": "next",
  "type": "application/json",
  "method": "GET",
  "href": "http://localhost:20008/search?limit=1&collections=test-collection&page=1"
}
If I run the following and print the response then I get something different
catalog = Client.open(url='http://localhost:20008')
my_search = catalog.search(collections='test-collection', limit=1)
for page in my_search.pages_as_dicts():
    print(my_search.url_with_parameters())
    # -> http://localhost:20008/search?limit=1&collections=test-collection
    print(page['links'])
The page['links'] will output a response where the next link is this:
{
  "rel": "next",
  "type": "application/json",
  "method": "POST",
  "href": "http://localhost:20008/search",
  "body": {
    "limit": 2,
    "collections": [
      "test-collection"
    ],
    "token": 1
  }
}
The point is that the loop will not stop
. . .
REQUEST 0
DEBUG:pystac_client.stac_api_io:POST http://localhost:20008/search Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '60', 'Content-Type': 'application/json'} Payload: {"limit": 1, "collections": ["test-collection"], "token": 1}
send: b'POST /search HTTP/1.1\r\nHost: localhost:20008\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 60\r\nContent-Type: application/json\r\n\r\n'
send: b'{"limit": 1, "collections": ["test-collection"], "token": 1}'
reply: 'HTTP/1.1 200 OK\r\n'
header: date: Wed, 22 Nov 2023 16:17:30 GMT
header: server: uvicorn
header: content-length: 1509
header: content-type: application/geo+json
header: content-encoding: br
header: vary: Accept-Encoding
DEBUG:urllib3.connectionpool:http://localhost:20008 "POST /search HTTP/1.1" 200 1509
REQUEST 1
DEBUG:pystac_client.stac_api_io:POST http://localhost:20008/search Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '48', 'Content-Type': 'application/json'} Payload: {"limit": 1, "collections": ["test-collection"]}
send: b'POST /search HTTP/1.1\r\nHost: localhost:20008\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 48\r\nContent-Type: application/json\r\n\r\n'
send: b'{"limit": 1, "collections": ["test-collection"]}'
reply: 'HTTP/1.1 200 OK\r\n'
header: date: Wed, 22 Nov 2023 16:17:33 GMT
header: server: uvicorn
header: content-length: 1509
header: content-type: application/geo+json
header: content-encoding: br
header: vary: Accept-Encoding
DEBUG:urllib3.connectionpool:http://localhost:20008 "POST /search HTTP/1.1" 200 1509
<Item id=test-item-1>
DEBUG:pystac_client.stac_api_io:POST http://localhost:20008/search Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '60', 'Content-Type': 'application/json'} Payload: {"limit": 1, "collections": ["test-collection"], "token": 1}
send: b'POST /search HTTP/1.1\r\nHost: localhost:20008\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 60\r\nContent-Type: application/json\r\n\r\n'
send: b'{"limit": 1, "collections": ["test-collection"], "token": 1}'
reply: 'HTTP/1.1 200 OK\r\n'
header: date: Wed, 22 Nov 2023 16:17:33 GMT
header: server: uvicorn
header: content-length: 1509
header: content-type: application/geo+json
header: content-encoding: br
header: vary: Accept-Encoding
DEBUG:urllib3.connectionpool:http://localhost:20008 "POST /search HTTP/1.1" 200 1509
<Item id=test-item-1>
.. .. .. (infinite loop).. .. ..
This is a problem with your server. pages_as_dicts does not modify the links attribute in any way: https://github.com/stac-utils/pystac-client/blob/4ea6dac3a4cc817854e8fbcb1a9f041f079655b1/pystac_client/item_search.py#L725-L749
Closing as not-an-issue-with-pystac-client, please re-open if you find otherwise.
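For anyone who lands here while debugging a similar server, a client-side guard can at least turn the endless loop into an error. The sketch below is a workaround under stated assumptions, not part of pystac_client: guard_pages and link_key are hypothetical helpers, and looping_pages imitates a server that keeps returning the same next link, as in the log above. The guard remembers every next link it has seen and raises as soon as one repeats.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Set

Page = Dict[str, Any]


def link_key(page: Page) -> Optional[str]:
    """Serialize the page's rel=next link so it can be compared across pages."""
    link = next((l for l in page.get("links", []) if l.get("rel") == "next"), None)
    return None if link is None else json.dumps(link, sort_keys=True)


def guard_pages(pages: Iterable[Page], max_pages: int = 1000) -> Iterator[Page]:
    """Yield pages, but raise if a next link repeats or max_pages is exceeded."""
    seen: Set[str] = set()
    for count, page in enumerate(pages, start=1):
        yield page
        key = link_key(page)
        if key is not None:
            if key in seen:
                raise RuntimeError("server repeated a next link; paging would never end")
            seen.add(key)
        if count >= max_pages:
            raise RuntimeError(f"gave up after {max_pages} pages")


def looping_pages() -> Iterator[Page]:
    """Imitates a misbehaving server: same item and same next link forever."""
    link = {
        "rel": "next",
        "method": "POST",
        "href": "http://localhost:20008/search",
        "body": {"limit": 1, "collections": ["test-collection"], "token": 1},
    }
    while True:
        yield {"features": [{"id": "test-item-1"}], "links": [link]}


collected = []
message = ""
try:
    for page in guard_pages(looping_pages()):
        collected.append(page)
except RuntimeError as err:
    message = str(err)
print(len(collected), message)
```

In real use the wrapped iterable would presumably be my_search.pages_as_dicts(); the underlying fix still belongs on the server, which has to stop advertising a next link once the results are exhausted.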