python / cpython

The Python programming language
https://www.python.org
Other
63.39k stars 30.36k forks source link

JSONDecodeError verbose error mesage #121479

Open ArielRotem1 opened 4 months ago

ArielRotem1 commented 4 months ago

Feature or enhancement

Proposal:

Example is the best way to start:

import json

try:
    raw_json_dict = b"{'a': 1, {'b': 2 'c': 3}}"
    letter_to_index = json.loads(raw_json_dict)
except JSONDecodeError as error:
   print(error.verbose())

# Printing => verbose error:"'b': 2 'c': 3}"

The problem: Currently, when getting a JSONDecodeError you can't fully understand where the issue is because of unindecative error message. The message only gives you the location of the error but not what actually caused the error itself. For example, when you try to json.loads() a response from a server and you don't know what the content will be. If you got a JSONDecodeError and you didn't log or save the raw response fron the server, you won't be able to figure out what caused the error and fix it.

The solution First, I am not here to propose one unique solution, there could be multiple possible solutions. I just want to present the simple one to show its impact.

The solution I proposed in the example makes use of new function inside the JSONDecodeError class. The verbose function prints the actual characters in which the error occurred and print a range of chars around them to get a context of the error. Furthermore, the function will be able to get a parameter which will defines the range of characters around the error to get back.

I would very like to hear your opinions of this feature. Thanks!

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

gaogaotiantian commented 4 months ago

I believe this is something that should be posted in https://discuss.python.org/c/ideas/6 first. There are a lot of things to be discussed:

The idea should be mature and concrete (preferably discussed) before presented in issues :)

nineteendo commented 4 months ago

What a coincidence! I've implemented this a few days ago in my fork of the json library:

def _get_err_context(doc: str, pos: int) -> tuple[int, str]:
    line_start: int = doc.rfind("\n", 0, pos) + 1
    if (line_end := doc.find("\n", pos)) == -1:
        line_end = len(doc)

    max_chars: int = get_terminal_size().columns - 4  # leading spaces
    min_start: int = min(line_end - max_chars, pos - max_chars // 2)
    max_end: int = max(line_start + max_chars, pos + max_chars // 2)
    text_start: int = max(min_start, line_start)
    text_end: int = min(line_end, max_end)
    text: str = doc[text_start:text_end]
    if text_start > line_start:
        text = "..." + text[3:]

    if text_end < line_end:
        text = text[:-3] + "..."

    offset: int = pos - text_start + 1
    return offset, text

class JSONDecodeError(SyntaxError):
    """JSON decode error."""

    def __init__(self: Self, msg: str, doc: str, pos: int, filename: str = "<string>") -> None:
        """Create new JSON decode error."""
        lineno: int = doc.count("\n", 0, pos) + 1
        self.colno: int = pos - doc.rfind("\n", 0, pos)
        offset, text = _get_err_context(doc, pos)
        super().__init__(msg, (filename, lineno, offset, text))

    def __str__(self: Self) -> str:
        return f"{self.msg} (file {self.filename}, line {self.lineno:d}, column {self.colno: d})"

Result:

>>> import jsonc
>>> jsonc.loads("[,]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/storage/emulated/0/Download/pyvz2-alpha/pyvz2/jsonc/__init__.py", line 124, in loads
    return JSONDecoder(allow=allow).decode(s, filename=filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/storage/emulated/0/Download/pyvz2-alpha/pyvz2/jsonc/decoder.py", line 324, in decode
    raise JSONSyntaxError(msg, filename, s, err.value) from None
  File "<string>", line 1
    [,]
     ^
jsonc.scanner.JSONSyntaxError: Expecting value

Ideally the column number should be included in the traceback (and the truncation should happen there):

https://github.com/python/cpython/blob/8ad6067bd4556afddc86004f8e350aa672fda217/Lib/traceback.py#L1269

https://github.com/python/cpython/blob/8ad6067bd4556afddc86004f8e350aa672fda217/Lib/traceback.py#L1278

Then it would look even better:

$ python pyvz2 json
[,]

  File '<stdin>', line 1, column 2
    [,]
     ^
jsonc.scanner.JSONSyntaxError: Expecting value

I plan to discuss this in more detail in September.

ArielRotem1 commented 4 months ago

@gaogaotiantian Thank you for letting me know! It makes more sense to go through reviews of the idea and solidify it.

ArielRotem1 commented 4 months ago

@nineteendo Wow! Nice to hear! It helps to prove that the idea has a meaningful usage. I would create a post to discuss the idea with the community. I would love to hear there your support and what you did and why.

nineteendo commented 4 months ago

Sure, but I can't post on Discourse... Can I summarize here?

ArielRotem1 commented 4 months ago

Sure, you can post here. I can send a link there to your comment. But of course it would be better if you could post there.

nineteendo commented 4 months ago

(I'm banned)

During a rewrite, I ran into an encoding issue that was impossible to figure out with the current error message:

>>> import json
>>> "{}".encode("utf_16") # redirect stdout
b'\xff\xfe{\x00}\x00'
>>> b'\xff\xfe{\x00}\x00'.decode("cp1252") # redirect stdin
'ÿþ{\x00}\x00'
>>> json.loads('ÿþ{\x00}\x00')
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I had to resort to print() for debugging. After fixing it, I had the idea to display it like syntax errors:

jsonc.scanner.JSONSyntaxError: Expecting value
  File "C:\Users\wanne\empty.json", line 1, column 1
    ÿþ{
    ^

You can even open the file at the location of the error by hovering over the filename in VS Code! It was relatively easy to derive the filename from the open file and pass it around:

def load(fp):
    return loads(fp.read(), filename=getattr(fp, "name", "<string>"))

def loads(s, *, filename="<string>"):
    if not filename.startswith("<") and not filename.endswith(">"):
        filename = realpath(filename)

Unlike python files, lines in a JSON file can get very long, making the error hard to read:

jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}

                                                                      ^

So I tried only displaying 40 characters before and after:

jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
  File '<stdin>', line 1, column 371
    ...SeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                            ^

But that left a lot of unused space, so I made it dynamic:

jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
  File '<stdin>', line 1, column 371
    ...s such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                                                       ^

It didn't look quite right in tracebacks, so I inherited from SyntaxError:

File "<string>", line 1

    ^
jsonc.scanner.JSONSyntaxError: Expecting value

Sadly, we lose the column information now, so traceback should be modified. But other than that, it looks great.

nineteendo commented 3 months ago

You can install jsonyx for testing purposes: https://pypi.org/project/jsonyx It can be run from the command line with python -m jsonyx.tool.

IMPORTANT: Don't use it in production code, the API is unstable and is insufficiently tested.

nineteendo commented 3 months ago

As stated previously, you get the information for free in tracebacks:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/jsonyx/__init__.py", line 247, in loads
    return JSONDecoder(allow=allow).loads(s, filename=filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jsonyx/__init__.py", line 121, in loads
    return self._scanner(filename, s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1
    [,]
     ^
jsonyx._decoder.JSONSyntaxError: Expecting value

And you can format the exception like this:

>>> import jsonyx, traceback
>>> exc = jsonyx.JSONSyntaxError("Expecting value", "<string>", "[,]", 1)
>>> print(*traceback.format_exception_only(exc), sep="", end="")
  File "<string>", line 1
    [,]
     ^
jsonyx._decoder.JSONSyntaxError: Expecting value
nineteendo commented 3 months ago

@elisbyberi, what do you think of my approach? Imagine if syntax errors looked like this:

>>> [,]
SyntaxError: invalid syntax: line 1 column 2 (char 2)

Instead of this:

>>> [,]
  File "<stdin>", line 1
    [,]
     ^
SyntaxError: invalid syntax

Ideally, answer on the Discourse thread, I can't post there.

nineteendo commented 3 months ago

Could you show Chris what the simple approach could look like for indented JSON in a terminal?

  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}

                                                                      ^
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
nineteendo commented 3 months ago

Also, it's worth noting that including the filename in the error message can be extremely useful.

nineteendo commented 3 months ago

I'm going to try to underline more than one character if applicable.

ArielRotem1 commented 3 months ago

Could you show Chris what the simple approach could look like for indented JSON in a terminal?

  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}

                                                                      ^
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter

Why the caret signaling where the error is so far from the data itself? Additionally, the caret seems to change location according to the screen size of the device you're using.

nineteendo commented 3 months ago

There are 374 spaces before the caret:

  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "GlossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": {"para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                                                                                                                                                                                                                                                                                                                                                                      ^
jsonyx._decoder.JSONSyntaxError: Expecting ':' delimiter

Which get wrapped in the terminal (which is dependent of the screen size):

Screenshot 2024-07-18 at 18 12 30

Instead of truncating the text:

Screenshot 2024-07-18 at 18 13 28
nineteendo commented 3 months ago

Could you mention jsonyx in the thread? I'm getting a little annoyed about problems being mentioned that are already handled by it. Especially the "Expecting value" is quite undescriptive.

ArielRotem1 commented 3 months ago

Yes, I will do it, but it is pretty inconveniet to handle a conversation like this. So if you can do anything to join the fourm again it would be better.

nineteendo commented 3 months ago

Using my second GitHub account is a very bad idea, and the chance of getting unbanned early is low. The library should suffice for the discussion.

nineteendo commented 3 months ago

I can now underline multiple characters:

Screenshot 2024-07-19 at 08 32 19
nineteendo commented 3 months ago

Here's the code to get the error context, supporting multiple characters (which provides useful information in error messages):

def _get_err_context(doc: str, start: int, end: int) -> tuple[int, str, int]:
    line_start: int = doc.rfind("\n", 0, start) + 1
    if (line_end := doc.find("\n", start)) == -1:
        line_end = len(doc)

    end = min(max(start + 1, line_end), end)
    max_chars: int = get_terminal_size().columns - 4  # leading spaces
    text_start: int = max(min(
        line_end - max_chars, end - max_chars // 2 - 1, start - max_chars // 3,
    ), line_start)
    text_end: int = min(max(
        line_start + max_chars, start + max_chars // 2 + 1,
        end + max_chars // 3,
    ), line_end)
    text: str = doc[text_start:text_end].expandtabs(1)
    if text_start > line_start:
        text = "..." + text[3:]

    if len(text) > max_chars:
        end -= len(text) - max_chars
        text = text[:max_chars // 2 - 1] + "..." + text[-max_chars // 2 + 2:]

    if text_end < line_end:
        text = text[:-3] + "..."

    return start - text_start + 1, text, end - text_start + 1

This isn't trivial at all. (not sure if it's even correct)

nineteendo commented 3 months ago

I can now also mention the end line and end column number:

[
    1,
    2,
    3
  File '<stdin>', line 1-4, column 1-6
    [
    ^
jsonyx._decoder.JSONSyntaxError: Unterminated array

Summary of all extra information in error messages:

Ideally we should include this in the standard library, but a third-party library like jsonyx works just as well for me. It's just a bit annoying that I'll have to include this for every future project where I want to use JSON.

But we should definitely take our time to evaluate this.

nineteendo commented 3 months ago

Yeah, it wasn't correct, I forget to count the newline when it was selected. I also added universal newline support:

 def _get_err_context(doc: str, start: int, end: int) -> tuple[int, str, int]:
-   line_start: int = doc.rfind("\n", 0, start) + 1
+   line_start: int = max(doc.rfind("\n", 0, start), doc.rfind("\r", 0, start)) + 1
-   if (line_end := doc.find("\n", start)) == -1:
-       line_end = len(doc)
+   if (match := _match_line_end(doc, start)):
+       line_end: int = match.end()
+   else:
+       line_end = start

     end = min(max(start + 1, line_end), end)
     max_chars: int = get_terminal_size().columns - 4  # leading spaces
+    if end == line_end + 1:  # newline
+        max_chars -= 1

     text_start: int = max(min(
         line_end - max_chars, end - max_chars // 2 - 1, start - max_chars // 3,
     ), line_start)
     text_end: int = min(max(
         line_start + max_chars, start + max_chars // 2 + 1,
         end + max_chars // 3,
     ), line_end)
     text: str = doc[text_start:text_end].expandtabs(1)
     if text_start > line_start:
         text = "..." + text[3:]

     if len(text) > max_chars:
         end -= len(text) - max_chars
         text = text[:max_chars // 2 - 1] + "..." + text[-max_chars // 2 + 2:]

     if text_end < line_end:
         text = text[:-3] + "..."

     return start - text_start + 1, text, end - text_start + 1

I have 22 tests for this function!

@pytest.mark.parametrize(
    ("columns", "doc", "start", "end", "offset", "text", "end_offset"), {
        # Only current line
        (7, "current", 0, 7, 1, "current", 8),
        #    ^^^^^^^             ^^^^^^^
        (12, "current\nnext", 0, 7, 1, "current", 8),
        #     ^^^^^^^                   ^^^^^^^
        (12, "current\rnext", 0, 7, 1, "current", 8),
        #     ^^^^^^^                   ^^^^^^^
        (12, "current\r\nnext", 0, 7, 1, "current", 8),
        #     ^^^^^^^                     ^^^^^^^
        (16, "previous\ncurrent", 9, 16, 1, "current", 8),
        #               ^^^^^^^              ^^^^^^^
        (16, "previous\rcurrent", 9, 16, 1, "current", 8),
        #               ^^^^^^^              ^^^^^^^
        (16, "previous\r\ncurrent", 10, 17, 1, "current", 8),
        #                 ^^^^^^^               ^^^^^^^

        # No newline
        (17, "start-middle-end", 0, 5, 1, "start-middle-end", 6),
        #     ^^^^^                        ^^^^^
        (8, "current\nnext", 0, 12, 1, "current", 8),
        #    ^^^^^^^^^^^^^              ^^^^^^^
        (8, "current\rnext", 0, 12, 1, "current", 8),
        #    ^^^^^^^^^^^^^              ^^^^^^^
        (8, "current\r\nnext", 0, 13, 1, "current", 8),
        #    ^^^^^^^^^^^^^                ^^^^^^^

        # Newline
        (8, "current", 7, 8, 8, "current", 9),
        #           ^                   ^
        (8, "current\nnext", 7, 12, 8, "current", 9),
        #           ^^^^^^                     ^
        (8, "current\rnext", 7, 12, 8, "current", 9),
        #           ^^^^^^                     ^
        (8, "current\r\nnext", 7, 13, 8, "current", 9),
        #           ^^^^^^                      ^

        # Expand tabs
        (8, "\tcurrent", 1, 8, 2, " current", 9),
        #      ^^^^^^^              ^^^^^^^

        # Truncate start
        (7, "start-middle-end", 5, 6, 4, "...-...", 5),  # end
        #         ^                          ^
        (12, "start-middle-end", 7, 11, 5, "...middle...", 9),  # start
        #            ^^^^                       ^^^^
        (6, "start-middle-end", 13, 16, 4, "...end", 7),  # line_end
        #                 ^^^                  ^^^
        (7, "start-middle-end", 16, 17, 7, "...end", 8),  # newline
        #                    ^                    ^

        # Truncate middle
        (13, "start-middle-end", 0, 16, 1, "start...e-end", 14),
        #     ^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^

        # Truncate end
        (8, "start-middle-end", 0, 5, 1, "start...", 6),  # line_start
        #    ^^^^^                        ^^^^^
        (7, "start-middle-end", 5, 6, 4, "...-...", 5),  # start
        #         ^                          ^
        (12, "start-middle-end", 7, 11, 5, "...middle...", 9),  # end
        #            ^^^^                       ^^^^
    },
)
# pylint: disable-next=R0913
def test_err_context(  # noqa: PLR0913, PLR0917
    monkeypatch: pytest.MonkeyPatch, columns: int, doc: str, start: int,
    end: int, offset: int, text: str, end_offset: int,
) -> None:
    """Test error context."""
    monkeypatch.setenv("COLUMNS", str(4 + columns))  # leading spaces
    exc: JSONSyntaxError = JSONSyntaxError("", "", doc, start, end)
    assert exc.offset == offset
    assert exc.text == text
    assert exc.end_offset == end_offset

I don't think many other people would spent so much time on nice error messages...

nineteendo commented 3 months ago

The library is finally ready for production: https://jsonyx.readthedocs.io.

ArielRotem1 commented 2 months ago

Hi, as @gaogaotiantian said, the python community has discussed the suggestion in the following link: https://discuss.python.org/t/jsondecodeerror-exceptions-message-verbose/58472 How we advanced from here toward implementing the feature?

nineteendo commented 2 months ago

There's no consensus on Discourse.

This doesn't need to be added to Python, a third party-library can provide this functionality just as well. If you want to make use of this right now you can pip install jsonyx. What DOES need to be implemented by Python is displaying the end line number and start and end column number in the traceback, so that would be worth discussing.

If you do want this feature to be added to Python, I suggest creating an issue (and mentioning me) on https://github.com/simplejson/simplejson and see how that is received.