the-library-code / dspace-rest-python

DSpace REST API Client Library
BSD 3-Clause "New" or "Revised" License

Missing User-Agent headers breaks Cloudfront (was: requests.exceptions.JSONDecodeError ... (perhaps wrong api_endpoint ?)) #10

Closed abubelinha closed 10 months ago

abubelinha commented 10 months ago

I got an error within the first few lines of example.py.

My code:

def dspace_rest_python_example():
    """Taken from https://github.com/the-library-code/dspace-rest-python/blob/main/example.py"""
    from dspace_rest_client.client import DSpaceClient
    from dspace_rest_client.models import Community, Collection, Item, Bundle, Bitstream

    # Example variables needed for authentication and basic API requests
    # SET THESE TO MATCH YOUR TEST SYSTEM BEFORE RUNNING THE EXAMPLE SCRIPT
    url = 'https://demo.dspace.org/server/api'
    username = 'myusername'
    password = 'mypassword'

    # Instantiate DSpace client
    d = DSpaceClient(api_endpoint=url, username=username, password=password)

    # Authenticate against the DSpace client
    authenticated = d.authenticate()
    if not authenticated:
        print('Error logging in! Giving up.')
        exit(1)
    else:
        print('Login successful !!')

if __name__ == "__main__":
    dspace_rest_python_example()

My output:

C:\Python38\python test_dspace.py

Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\requests\models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Python38\lib\site-packages\simplejson\__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "C:\Python38\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "C:\Python38\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dvd_dspace.py", line 65, in <module>
    dspace_rest_python_example()
  File "dvd_dspace.py", line 56, in dspace_rest_python_example
    authenticated = d.authenticate()
  File "C:\Python38\lib\site-packages\dspace_rest_client\client.py", line 116, in authenti
    r_json = r.json()
  File "C:\Python38\lib\site-packages\requests\models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I tried it on two different machines and got the same error.

I could log in to the demo server (https://demo.dspace.org/) using the same credentials, so they must be correct. In fact, they are provided at the top of that page.

Since the script never reaches the 'if not authenticated' line, maybe the api_endpoint URL I am using is simply wrong?

I worked out the URL from the end of this page, but I may have misunderstood it: https://wiki.lyrasis.org/display/DSDOC7x/REST+API

If that is the case, I would like to know which URL I should use.

Thanks a lot in advance, @abubelinha

kshepherd commented 10 months ago

Hi @abubelinha, thanks for logging this issue. I am looking into it, trying to figure out whether 7.6.1 has some differences or whether it's due to some unexpected setup in the demo site (other servers are working OK). The fact that the 403s I'm seeing are HTML rather than JSON responses is interesting...
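The HTML-instead-of-JSON symptom matches the traceback in the report: the JSON decoder failed at the very first character of the body. A minimal sketch of a defensive decode step follows, assuming the status code, Content-Type header, and body text are already in hand. `decode_json_body` and `UnexpectedResponseError` are hypothetical names for illustration, not part of dspace_rest_client.

```python
import json


class UnexpectedResponseError(Exception):
    """Raised when a response that should be JSON is something else."""


def decode_json_body(status_code, content_type, body):
    """Decode a JSON body, or fail with a message that names the real problem.

    The arguments are plain values, so the helper does not depend on any
    particular HTTP library.
    """
    # A proxy or WAF in front of the API may answer with text/html,
    # so inspect the media type before attempting to decode.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        raise UnexpectedResponseError(
            f"HTTP {status_code}: expected JSON but got "
            f"{media_type or 'no content type'}"
        )
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise UnexpectedResponseError(
            f"HTTP {status_code}: body is not valid JSON ({exc.msg})"
        ) from exc
```

With a check like this, a 403 HTML page would surface as an error mentioning the status code and media type rather than "Expecting value: line 1 column 1 (char 0)".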

I also like the suggestions around endpoint and version validation; I will create some issues to describe these requirements.
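As a rough illustration of what version validation could look like (not the library's implementation), the sketch below parses a version string out of the API root document. It assumes the root document is a dict carrying a `dspaceVersion` string such as "DSpace 7.6.1"; the key name, the `(7, 0)` minimum, and both function names are assumptions for the example.

```python
import re


def parse_dspace_version(root_document):
    """Return the version as a tuple of ints, e.g. (7, 6, 1), or None."""
    # Assumption: the API root document has a "dspaceVersion" string.
    raw = root_document.get("dspaceVersion", "")
    match = re.search(r"(\d+(?:\.\d+)*)", raw)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_supported(root_document, minimum=(7, 0)):
    """True when the reported version is at least `minimum`."""
    version = parse_dspace_version(root_document)
    return version is not None and version >= minimum
```

A client could run such a check once before authenticating and report an unsupported or unrecognised endpoint up front.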

abubelinha commented 10 months ago

Thanks, @kshepherd. From your reply, I understand that the URL I used is the correct one. Is that right?

Do you know of other open demo servers I could use to try this package until this issue is solved?

Unfortunately, my institution's repository runs an old DSpace version (and has no demo or test server anyway). I wish I could use this package to encourage them to upgrade it.

Off topic (this would fit better in the repository discussions): does anyone know of examples of the DSpace API being used (from Python or otherwise) to auto-publish thousands of museum pieces as individual DSpace items? For instance, having a handle.net URL for a PDF that comments on each painting in a museum, or something similar (assuming, of course, that the institution has a database containing all the information needed to produce the documents).

Thanks a lot for your help.

Edit: sorry, I closed the issue by mistake and reopened it.

kshepherd commented 10 months ago

@abubelinha

Yes, your base URL looks good. It just seems that either something is different about that site, or 7.6.1 (the latest version) changed an authentication response.

For the most control over a demonstration server, on whatever version you want to try, I would recommend using the Docker images (see: https://wiki.lyrasis.org/display/DSPACE/Try+out+DSpace+7#TryoutDSpace7-InstallviaDocker)

If I run up any public demo servers myself I will be sure to let you know.

Regarding the other question: there are definitely scenarios like this, though usually for digital derivatives of the pieces themselves. I don't know if I've heard of a handle or DOI for a physical item, but at the very least it could resolve to a catalogue entry with any digitised objects (photos, recordings, etc.). I would recommend asking on the dspace-community mailing list (https://groups.google.com/g/dspace-community) or in the code4lib community (https://code4lib.org/).

kshepherd commented 10 months ago

@abubelinha Good news! @misilot had a suggestion which helped to solve the problem -- there is a Cloudfront layer in front of the official demo server, and it was filtering out requests based on which user agent was set. Please see my new example.py script for usage of the new fake_user_agent argument to the client instantiation.
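To illustrate the mechanism in general terms: an HTTP client library sends its own default User-Agent unless told otherwise, and a filtering layer may reject it. The sketch below uses only the standard library to build (not send) a request with an explicit User-Agent. It is not how dspace_rest_client does it internally; `build_request` and the header value are placeholders chosen for the example.

```python
import urllib.request


def build_request(url, user_agent):
    """Create (but do not send) a GET request with an explicit User-Agent."""
    request = urllib.request.Request(url, method="GET")
    # Without this header, urllib identifies itself as "Python-urllib/<version>",
    # which a WAF or CDN rule may be configured to reject.
    request.add_header("User-Agent", user_agent)
    request.add_header("Accept", "application/json")
    return request


request = build_request(
    "https://demo.dspace.org/server/api",
    "Mozilla/5.0 (compatible; example-client)",  # placeholder value
)
# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```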

Something like this should now work as of version 0.1.9 on PyPI (https://pypi.org/project/dspace-rest-client/0.1.9/) or the git tag here: https://github.com/the-library-code/dspace-rest-python/tree/dspace-rest-client-0.1.9
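A minimal sketch of such an instantiation, reusing the variables from the original report. It assumes `fake_user_agent` is a boolean keyword argument of `DSpaceClient`; the thread names the argument but does not show its type. The `make_client` and `main` helpers are hypothetical, and the credentials are placeholders.

```python
def make_client(url, username, password):
    """Return a DSpaceClient configured to send a browser-like User-Agent."""
    # Imported here so the sketch can be loaded without the package installed.
    from dspace_rest_client.client import DSpaceClient

    # Assumption: fake_user_agent is a boolean flag, available from 0.1.9.
    return DSpaceClient(
        api_endpoint=url,
        username=username,
        password=password,
        fake_user_agent=True,
    )


def main():
    """Authenticate against the demo server, as in the original report."""
    client = make_client(
        "https://demo.dspace.org/server/api", "myusername", "mypassword"
    )
    if not client.authenticate():
        raise SystemExit("Error logging in! Giving up.")
    print("Login successful")
```

Calling `main()` requires dspace-rest-client 0.1.9 or later and network access to the demo server.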

Thanks for reporting this issue; it will definitely help other users in the future who might have API endpoints behind other WAF proxies like Cloudfront.