Add status check before parsing the json string

This is related to the issue https://github.com/nlmatics/llmsherpa/issues/64.

The root cause is that the client does not check the response status and tries to parse the non-JSON content returned by the server. Because the status error is never surfaced, users only see the JSON parsing exception.
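A minimal sketch of the check, mirroring the lines visible in the traceback further down. `FakeResponse` and `parse_blocks` are hypothetical names used only for illustration; the real code lives in `LayoutPDFReader.read_pdf`, and the response object is assumed to expose `status` and `data` as it does there.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the HTTP response (only .status and .data)."""
    status: int
    data: bytes


def parse_blocks(parser_response):
    # Fail early and surface the raw server payload when the request failed,
    # instead of letting json.loads choke on an HTML error page.
    if parser_response.status > 200:
        raise ValueError(f"{parser_response.data}")
    response_json = json.loads(parser_response.data.decode("utf-8"))
    return response_json["return_dict"]["result"]["blocks"]


ok = FakeResponse(200, b'{"return_dict": {"result": {"blocks": []}}}')
forbidden = FakeResponse(403, b"<html><head><title>403 Forbidden</title></head></html>")

print(parse_blocks(ok))
try:
    parse_blocks(forbidden)
except ValueError as err:
    print("server error surfaced:", err)
```

The sketch keeps the `> 200` comparison shown in the traceback, so any status above 200 is treated as a failure.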

With this patch, users will see the raw content returned by the server:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File <timed exec>:5

File ~/miniconda3/envs/py11/lib/python3.11/site-packages/llmsherpa/readers/file_reader.py:73, in LayoutPDFReader.read_pdf(self, path_or_url, contents)
     71 parser_response = self._parse_pdf(pdf_file)
     72 if parser_response.status > 200:
---> 73     raise ValueError(f"{parser_response.data}")
     74 response_json = json.loads(parser_response.data.decode("utf-8"))
     75 blocks = response_json['return_dict']['result']['blocks']

ValueError: b'<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n'

Previously, the same failure produced only an opaque JSON decoding error:

--------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File <timed exec>:5

File ~/miniconda3/envs/py11/lib/python3.11/site-packages/llmsherpa/readers/file_reader.py:72, in LayoutPDFReader.read_pdf(self, path_or_url, contents)
     70             pdf_file = (file_name, file_data, 'application/pdf')
     71 parser_response = self._parse_pdf(pdf_file)
---> 72 response_json = json.loads(parser_response.data.decode("utf-8"))
     73 blocks = response_json['return_dict']['result']['blocks']
     74 return Document(blocks)

File ~/miniconda3/envs/py11/lib/python3.11/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:
    348     cls = JSONDecoder

File ~/miniconda3/envs/py11/lib/python3.11/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File ~/miniconda3/envs/py11/lib/python3.11/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Add status check before parsing the json string #74