the-library-code / dspace-rest-python

DSpace REST API Client Library
BSD 3-Clause "New" or "Revised" License
Improve search workflow #17

Open alanorth opened 3 weeks ago

alanorth commented 3 weeks ago

The search_objects function is tricky to use because there is no way to know how many pages of results there are. I came up with this: manage the page variable in a loop and stop once a page returns fewer than 20 results (the default page size):

#!/usr/bin/env python3

from dspace_rest_client.client import DSpaceClient

url = 'https://demo.dspace.org/server/api'
page_size = 20  # default page size of search_objects

d = DSpaceClient(api_endpoint=url)

items = []
page = 0

while True:
    print(f"Fetching page: {page}")

    new_items = d.search_objects(dso_type='item', page=page)
    items.extend(new_items)

    # A short (or empty) page means there are no more results,
    # including the case where the very first page is the only one
    if len(new_items) < page_size:
        break

    page += 1

print(f"> Total: {len(items)}")

Is there a better way to do this with the current state of the client? Thanks!

kshepherd commented 3 weeks ago

Thanks @alanorth, there are definitely improvements to be made here.

The way I wrote that was to return a simple list of the objects themselves. What we should probably do instead is implement the full search query and response as per the REST contract, which would include this page info in the embedded search results object:

      "page": {
        "number": 0,
        "size": 20,
        "totalPages": 35,
        "totalElements": 696
      },

So I will look at making this the "proper" way to search, handling the full query and filter stuff for the POST and returning an object properly modelled on the response.
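
As a rough illustration of what that page info would make possible, here is a minimal sketch of a loop driven by totalPages instead of by the length of the last page. It is not the library's API: iter_pages, fetch_page and fake_fetch are hypothetical names, and the dict shape (an "objects" list next to a "page" dict) is a simplified assumption modelled on the fragment above rather than on the actual response.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(fetch_page: Callable[[int], Dict]) -> Iterator[List]:
    """Yield each page's objects, stopping once totalPages is reached.

    fetch_page is a hypothetical callable: given a zero-based page number,
    it returns a dict with an "objects" list and a "page" dict shaped like
    the fragment above. It is not part of dspace_rest_client.
    """
    number = 0
    while True:
        result = fetch_page(number)
        yield result["objects"]
        number += 1
        # The server tells us how many pages exist, so no guessing is needed
        if number >= result["page"]["totalPages"]:
            break


def fake_fetch(number: int, total: int = 45, size: int = 20) -> Dict:
    """Stand-in for a real search request, used only to exercise the loop."""
    start = number * size
    objects = list(range(start, min(start + size, total)))
    total_pages = (total + size - 1) // size  # ceiling division
    return {
        "objects": objects,
        "page": {
            "number": number,
            "size": size,
            "totalPages": total_pages,
            "totalElements": total,
        },
    }


pages = list(iter_pages(fake_fetch))
all_items = [obj for page in pages for obj in page]
print(f"> Pages: {len(pages)}, total: {len(all_items)}")
```

With the values in the fragment above, 696 elements at 20 per page gives 35 pages, so a loop like this would issue exactly 35 requests and never need an extra one to discover that the results have run out.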

alanorth commented 3 weeks ago

Thanks @kshepherd. This library has already helped me by reducing some boilerplate code when interacting with the DSpace 7 REST API. Your proposed improvements sound good.