Open ArielRotem1 opened 4 months ago
I believe this is something that should be posted in https://discuss.python.org/c/ideas/6 first. There are a lot of things to be discussed:
The idea should be mature and concrete (preferably discussed) before presented in issues :)
What a coincidence! I've implemented this a few days ago in my fork of the json library:
def _get_err_context(doc: str, pos: int) -> tuple[int, str]:
line_start: int = doc.rfind("\n", 0, pos) + 1
if (line_end := doc.find("\n", pos)) == -1:
line_end = len(doc)
max_chars: int = get_terminal_size().columns - 4 # leading spaces
min_start: int = min(line_end - max_chars, pos - max_chars // 2)
max_end: int = max(line_start + max_chars, pos + max_chars // 2)
text_start: int = max(min_start, line_start)
text_end: int = min(line_end, max_end)
text: str = doc[text_start:text_end]
if text_start > line_start:
text = "..." + text[3:]
if text_end < line_end:
text = text[:-3] + "..."
offset: int = pos - text_start + 1
return offset, text
class JSONDecodeError(SyntaxError):
"""JSON decode error."""
def __init__(self: Self, msg: str, doc: str, pos: int, filename: str = "<string>") -> None:
"""Create new JSON decode error."""
lineno: int = doc.count("\n", 0, pos) + 1
self.colno: int = pos - doc.rfind("\n", 0, pos)
offset, text = _get_err_context(doc, pos)
super().__init__(msg, (filename, lineno, offset, text))
def __str__(self: Self) -> str:
return f"{self.msg} (file {self.filename}, line {self.lineno:d}, column {self.colno: d})"
Result:
>>> import jsonc
>>> jsonc.loads("[,]")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/storage/emulated/0/Download/pyvz2-alpha/pyvz2/jsonc/__init__.py", line 124, in loads
return JSONDecoder(allow=allow).decode(s, filename=filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/storage/emulated/0/Download/pyvz2-alpha/pyvz2/jsonc/decoder.py", line 324, in decode
raise JSONSyntaxError(msg, filename, s, err.value) from None
File "<string>", line 1
[,]
^
jsonc.scanner.JSONSyntaxError: Expecting value
Ideally the column number should be included in the traceback (and the truncation should happen there):
Then it would look even better:
$ python pyvz2 json
[,]
File '<stdin>', line 1, column 2
[,]
^
jsonc.scanner.JSONSyntaxError: Expecting value
I plan to discuss this in more detail in September.
@gaogaotiantian Thank you for letting me know! It makes more sense to go through reviews of the idea and solidify it.
@nineteendo Wow! Nice to hear! It helps to prove that the idea has a meaningful usage. I would create a post to discuss the idea with the community. I would love to hear there your support and what you did and why.
Sure, but I can't post on Discourse... Can I summarize here?
Sure, you can post here. I can send a link there to your comment. But of course it would be better if you could post there.
(I'm banned)
During a rewrite, I ran into an encoding issue that was impossible to figure out with the current error message:
>>> import json
>>> "{}".encode("utf_16") # redirect stdout
b'\xff\xfe{\x00}\x00'
>>> b'\xff\xfe{\x00}\x00'.decode("cp1252") # redirect stdin
'ÿþ{\x00}\x00'
>>> json.loads('ÿþ{\x00}\x00')
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I had to resort to print()
for debugging. After fixing it, I had the idea to display it like syntax errors:
jsonc.scanner.JSONSyntaxError: Expecting value
File "C:\Users\wanne\empty.json", line 1, column 1
ÿþ{
^
You can even open the file at the location of the error by hovering over the filename in VS Code! It was relatively easy to derive the filename from the open file and pass it around:
def load(fp):
return loads(fp.read(), filename=getattr(fp, "name", "<string>"))
def loads(s, *, filename="<string>"):
if not filename.startswith("<") and not filename.endswith(">"):
filename = realpath(filename)
Unlike python files, lines in a JSON file can get very long, making the error hard to read:
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
File '<stdin>', line 1, column 371
{"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
^
So I tried only displaying 40 characters before and after:
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
File '<stdin>', line 1, column 371
...SeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
^
But that left a lot of unused space, so I made it dynamic:
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
File '<stdin>', line 1, column 371
...s such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
^
It didn't look quite right in tracebacks, so I inherited from SyntaxError
:
File "<string>", line 1
^
jsonc.scanner.JSONSyntaxError: Expecting value
Sadly, we lose the column information now, so traceback
should be modified. But other than that, it looks great.
You can install jsonyx for testing purposes: https://pypi.org/project/jsonyx
It can be run from the command line with python -m jsonyx.tool
.
IMPORTANT: Don't use it in production code, the API is unstable and is insufficiently tested.
As stated previously, you get the information for free in tracebacks:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.12/site-packages/jsonyx/__init__.py", line 247, in loads
return JSONDecoder(allow=allow).loads(s, filename=filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/jsonyx/__init__.py", line 121, in loads
return self._scanner(filename, s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1
[,]
^
jsonyx._decoder.JSONSyntaxError: Expecting value
And you can format the exception like this:
>>> import jsonyx, traceback
>>> exc = jsonyx.JSONSyntaxError("Expecting value", "<string>", "[,]", 1)
>>> print(*traceback.format_exception_only(exc), sep="", end="")
File "<string>", line 1
[,]
^
jsonyx._decoder.JSONSyntaxError: Expecting value
@elisbyberi, what do you think of my approach? Imagine if syntax errors looked like this:
>>> [,]
SyntaxError: invalid syntax: line 1 column 2 (char 2)
Instead of this:
>>> [,]
File "<stdin>", line 1
[,]
^
SyntaxError: invalid syntax
Ideally, answer on the Discourse thread, I can't post there.
Could you show Chris what the simple approach could look like for indented JSON in a terminal?
File '<stdin>', line 1, column 371
{"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
^
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
Also, it's worth noting that including the filename in the error message can be extremely useful.
I'm going to try to underline more than one character if applicable.
Could you show Chris what the simple approach could look like for indented JSON in a terminal?
File '<stdin>', line 1, column 371 {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1 986", "GlossDef": {"para": "A meta-markup language, used to create markup la nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}} ^ jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
Why the caret signaling where the error is so far from the data itself? Additionally, the caret seems to change location according to the screen size of the device you're using.
There are 374 spaces before the caret:
File '<stdin>', line 1, column 371
{"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "GlossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": {"para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
^
jsonyx._decoder.JSONSyntaxError: Expecting ':' delimiter
Which get wrapped in the terminal (which is dependent of the screen size):
Instead of truncating the text:
Could you mention jsonyx
in the thread? I'm getting a little annoyed about problems being mentioned that are already handled by it. Especially the "Expecting value" is quite undescriptive.
Yes, I will do it, but it is pretty inconveniet to handle a conversation like this. So if you can do anything to join the fourm again it would be better.
Using my second GitHub account is a very bad idea, and the chance of getting unbanned early is low. The library should suffice for the discussion.
I can now underline multiple characters:
Here's the code to get the error context, supporting multiple characters (which provides useful information in error messages):
def _get_err_context(doc: str, start: int, end: int) -> tuple[int, str, int]:
line_start: int = doc.rfind("\n", 0, start) + 1
if (line_end := doc.find("\n", start)) == -1:
line_end = len(doc)
end = min(max(start + 1, line_end), end)
max_chars: int = get_terminal_size().columns - 4 # leading spaces
text_start: int = max(min(
line_end - max_chars, end - max_chars // 2 - 1, start - max_chars // 3,
), line_start)
text_end: int = min(max(
line_start + max_chars, start + max_chars // 2 + 1,
end + max_chars // 3,
), line_end)
text: str = doc[text_start:text_end].expandtabs(1)
if text_start > line_start:
text = "..." + text[3:]
if len(text) > max_chars:
end -= len(text) - max_chars
text = text[:max_chars // 2 - 1] + "..." + text[-max_chars // 2 + 2:]
if text_end < line_end:
text = text[:-3] + "..."
return start - text_start + 1, text, end - text_start + 1
This isn't trivial at all. (not sure if it's even correct)
I can now also mention the end line and end column number:
[
1,
2,
3
File '<stdin>', line 1-4, column 1-6
[
^
jsonyx._decoder.JSONSyntaxError: Unterminated array
Summary of all extra information in error messages:
Ideally we should include this in the standard library, but a third-party library like jsonyx
works just as well for me.
It's just a bit annoying that I'll have to include this for every future project where I want to use JSON.
But we should definitely take our time to evaluate this.
Yeah, it wasn't correct, I forget to count the newline when it was selected. I also added universal newline support:
def _get_err_context(doc: str, start: int, end: int) -> tuple[int, str, int]:
- line_start: int = doc.rfind("\n", 0, start) + 1
+ line_start: int = max(doc.rfind("\n", 0, start), doc.rfind("\r", 0, start)) + 1
- if (line_end := doc.find("\n", start)) == -1:
- line_end = len(doc)
+ if (match := _match_line_end(doc, start)):
+ line_end: int = match.end()
+ else:
+ line_end = start
end = min(max(start + 1, line_end), end)
max_chars: int = get_terminal_size().columns - 4 # leading spaces
+ if end == line_end + 1: # newline
+ max_chars -= 1
text_start: int = max(min(
line_end - max_chars, end - max_chars // 2 - 1, start - max_chars // 3,
), line_start)
text_end: int = min(max(
line_start + max_chars, start + max_chars // 2 + 1,
end + max_chars // 3,
), line_end)
text: str = doc[text_start:text_end].expandtabs(1)
if text_start > line_start:
text = "..." + text[3:]
if len(text) > max_chars:
end -= len(text) - max_chars
text = text[:max_chars // 2 - 1] + "..." + text[-max_chars // 2 + 2:]
if text_end < line_end:
text = text[:-3] + "..."
return start - text_start + 1, text, end - text_start + 1
I have 22 tests for this function!
@pytest.mark.parametrize(
("columns", "doc", "start", "end", "offset", "text", "end_offset"), {
# Only current line
(7, "current", 0, 7, 1, "current", 8),
# ^^^^^^^ ^^^^^^^
(12, "current\nnext", 0, 7, 1, "current", 8),
# ^^^^^^^ ^^^^^^^
(12, "current\rnext", 0, 7, 1, "current", 8),
# ^^^^^^^ ^^^^^^^
(12, "current\r\nnext", 0, 7, 1, "current", 8),
# ^^^^^^^ ^^^^^^^
(16, "previous\ncurrent", 9, 16, 1, "current", 8),
# ^^^^^^^ ^^^^^^^
(16, "previous\rcurrent", 9, 16, 1, "current", 8),
# ^^^^^^^ ^^^^^^^
(16, "previous\r\ncurrent", 10, 17, 1, "current", 8),
# ^^^^^^^ ^^^^^^^
# No newline
(17, "start-middle-end", 0, 5, 1, "start-middle-end", 6),
# ^^^^^ ^^^^^
(8, "current\nnext", 0, 12, 1, "current", 8),
# ^^^^^^^^^^^^^ ^^^^^^^
(8, "current\rnext", 0, 12, 1, "current", 8),
# ^^^^^^^^^^^^^ ^^^^^^^
(8, "current\r\nnext", 0, 13, 1, "current", 8),
# ^^^^^^^^^^^^^ ^^^^^^^
# Newline
(8, "current", 7, 8, 8, "current", 9),
# ^ ^
(8, "current\nnext", 7, 12, 8, "current", 9),
# ^^^^^^ ^
(8, "current\rnext", 7, 12, 8, "current", 9),
# ^^^^^^ ^
(8, "current\r\nnext", 7, 13, 8, "current", 9),
# ^^^^^^ ^
# Expand tabs
(8, "\tcurrent", 1, 8, 2, " current", 9),
# ^^^^^^^ ^^^^^^^
# Truncate start
(7, "start-middle-end", 5, 6, 4, "...-...", 5), # end
# ^ ^
(12, "start-middle-end", 7, 11, 5, "...middle...", 9), # start
# ^^^^ ^^^^
(6, "start-middle-end", 13, 16, 4, "...end", 7), # line_end
# ^^^ ^^^
(7, "start-middle-end", 16, 17, 7, "...end", 8), # newline
# ^ ^
# Truncate middle
(13, "start-middle-end", 0, 16, 1, "start...e-end", 14),
# ^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^
# Truncate end
(8, "start-middle-end", 0, 5, 1, "start...", 6), # line_start
# ^^^^^ ^^^^^
(7, "start-middle-end", 5, 6, 4, "...-...", 5), # start
# ^ ^
(12, "start-middle-end", 7, 11, 5, "...middle...", 9), # end
# ^^^^ ^^^^
},
)
# pylint: disable-next=R0913
def test_err_context( # noqa: PLR0913, PLR0917
monkeypatch: pytest.MonkeyPatch, columns: int, doc: str, start: int,
end: int, offset: int, text: str, end_offset: int,
) -> None:
"""Test error context."""
monkeypatch.setenv("COLUMNS", str(4 + columns)) # leading spaces
exc: JSONSyntaxError = JSONSyntaxError("", "", doc, start, end)
assert exc.offset == offset
assert exc.text == text
assert exc.end_offset == end_offset
I don't think many other people would spent so much time on nice error messages...
The library is finally ready for production: https://jsonyx.readthedocs.io.
Hi, as @gaogaotiantian said, the python community has discussed the suggestion in the following link: https://discuss.python.org/t/jsondecodeerror-exceptions-message-verbose/58472 How we advanced from here toward implementing the feature?
There's no consensus on Discourse.
This doesn't need to be added to Python, a third party-library can provide this functionality just as well.
If you want to make use of this right now you can pip install jsonyx
. What DOES need to be implemented by Python is displaying the end line number and start and end column number in the traceback, so that would be worth discussing.
If you do want this feature to be added to Python, I suggest creating an issue (and mentioning me) on https://github.com/simplejson/simplejson and see how that is received.
Feature or enhancement
Proposal:
Example is the best way to start:
The problem: Currently, when getting a JSONDecodeError you can't fully understand where the issue is because of unindecative error message. The message only gives you the location of the error but not what actually caused the error itself. For example, when you try to
json.loads()
a response from a server and you don't know what the content will be. If you got aJSONDecodeError
and you didn't log or save the raw response fron the server, you won't be able to figure out what caused the error and fix it.The solution First, I am not here to propose one unique solution, there could be multiple possible solutions. I just want to present the simple one to show its impact.
The solution I proposed in the example makes use of new function inside the JSONDecodeError class. The
verbose
function prints the actual characters in which the error occurred and print a range of chars around them to get a context of the error. Furthermore, the function will be able to get a parameter which will defines the range of characters around the error to get back.I would very like to hear your opinions of this feature. Thanks!
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response