jjjake / internetarchive

A Python and Command-Line Interface to Archive.org
GNU Affero General Public License v3.0

Unhandled exception if the IA response is an HTML 403 page rather than a JSON response #656

Open msikma opened 3 weeks ago

msikma commented 3 weeks ago

This is a minor issue that I think only occurred because the IA is currently in the process of getting the site back up.

When running ia metadata 'win95-logo.sys' (link: https://archive.org/details/win95-logo.sys), the following unhandled exception occurs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 975, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.9/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/ia", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/internetarchive/cli/ia.py", line 171, in main
    sys.exit(ia_module.main(argv, session))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/internetarchive/cli/ia_metadata.py", line 203, in main
    item = session.get_item(identifier)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/internetarchive/session.py", line 253, in get_item
    item_metadata = self.get_metadata(identifier, request_kwargs) or {}
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/internetarchive/session.py", line 284, in get_metadata
    return resp.json()
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 979, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Logging the response reveals that the site sends an HTML response: https://gist.github.com/msikma/faa97e6509ec88754c325a66eb935650

The page itself states:

Item not available

The item is not available due to issues with the item's content.

This is probably a rare case that will vanish when things are back online properly, but it might be nice to handle the case.
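
For illustration, here is a minimal sketch of one way the case could be handled. It is not the library's actual code: `MetadataResponseError`, `parse_metadata_response`, and `FakeResponse` are hypothetical names made up for this example. The only assumption about `requests` is that the error raised by `resp.json()` is a `ValueError` subclass, as the `JSONDecodeError` in the traceback above is.

```python
import json


class MetadataResponseError(Exception):
    """Hypothetical error for a metadata response that is not valid JSON."""


def parse_metadata_response(resp):
    """Return the decoded JSON body of `resp`, or raise MetadataResponseError.

    `resp` only needs to look like a requests.Response: it must provide
    `status_code`, `headers`, `text`, and a `json()` method.
    """
    try:
        # The JSONDecodeError seen in the traceback is a ValueError subclass.
        return resp.json()
    except ValueError as exc:
        content_type = resp.headers.get("Content-Type", "unknown content type")
        snippet = resp.text[:80].replace("\n", " ")
        raise MetadataResponseError(
            f"expected JSON but got HTTP {resp.status_code} "
            f"({content_type}): {snippet!r}"
        ) from exc


class FakeResponse:
    """Stand-in for requests.Response so the sketch runs without network access."""

    def __init__(self, status_code, text, content_type):
        self.status_code = status_code
        self.text = text
        self.headers = {"Content-Type": content_type}

    def json(self):
        return json.loads(self.text)


if __name__ == "__main__":
    blocked = FakeResponse(403, "<html>Item not available</html>", "text/html")
    try:
        parse_metadata_response(blocked)
    except MetadataResponseError as err:
        print(f"error: {err}")
```

With something like this in place, the CLI could catch the single exception type, print a one-line message, and exit with a non-zero status instead of showing a traceback.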

jjjake commented 2 weeks ago

Thanks @msikma. Yes, this would be nice to handle.